1
Styk J, Pös Z, Pös O, Radvanszky J, Turnova EH, Buglyó G, Klimova D, Budis J, Repiska V, Nagy B, Szemes T. Microsatellite instability assessment is instrumental for Predictive, Preventive and Personalised Medicine: status quo and outlook. EPMA J 2023; 14:143-165. [PMID: 36866160 PMCID: PMC9971410 DOI: 10.1007/s13167-023-00312-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 01/06/2023] [Indexed: 01/26/2023]
Abstract
A form of genomic alteration called microsatellite instability (MSI) occurs in a class of tandem repeats (TRs) called microsatellites (MSs) or short tandem repeats (STRs) due to the failure of a post-replicative DNA mismatch repair (MMR) system. Traditionally, the strategies for determining MSI events have been low-throughput procedures that typically require assessment of tumours as well as healthy samples. On the other hand, recent large-scale pan-tumour studies have consistently highlighted the potential of massively parallel sequencing (MPS) on the MSI scale. As a result of recent innovations, minimally invasive methods show a high potential to be integrated into the clinical routine and delivery of adapted medical care to all patients. Along with advances in sequencing technologies and their ever-increasing cost-effectiveness, they may bring about a new era of Predictive, Preventive and Personalised Medicine (3PM). In this paper, we offered a comprehensive analysis of high-throughput strategies and computational tools for the calling and assessment of MSI events, including whole-genome, whole-exome and targeted sequencing approaches. We also discussed in detail the detection of MSI status by current MPS blood-based methods and we hypothesised how they may contribute to the shift from conventional medicine to predictive diagnosis, targeted prevention and personalised medical services. Increasing the efficacy of patient stratification based on MSI status is crucial for tailored decision-making. Contextually, this paper highlights drawbacks both at the technical level and those embedded deeper in cellular/molecular processes and future applications in routine clinical testing.
Affiliation(s)
- Jakub Styk
- Institute of Medical Biology, Genetics and Clinical Genetics, Faculty of Medicine, Comenius University, 811 08 Bratislava, Slovakia; Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia
- Zuzana Pös
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia; Institute of Clinical and Translational Research, Biomedical Research Centre, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia
- Ondrej Pös
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia
- Jan Radvanszky
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Institute of Clinical and Translational Research, Biomedical Research Centre, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
- Evelina Hrckova Turnova
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Slovgen Ltd, 841 04 Bratislava, Slovakia
- Gergely Buglyó
- Department of Human Genetics, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary
- Daniela Klimova
- Institute of Medical Biology, Genetics and Clinical Genetics, Faculty of Medicine, Comenius University, 811 08 Bratislava, Slovakia
- Jaroslav Budis
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia; Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Vanda Repiska
- Institute of Medical Biology, Genetics and Clinical Genetics, Faculty of Medicine, Comenius University, 811 08 Bratislava, Slovakia; Medirex Group Academy, NPO, 949 05 Nitra, Slovakia
- Bálint Nagy
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Department of Human Genetics, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary
- Tomas Szemes
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
2
Xiao C, Chen Z, Chen W, Padilla C, Colgan M, Wu W, Fang LT, Liu T, Yang Y, Schneider V, Wang C, Xiao W. Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples. Genome Biol 2022; 23:237. [PMID: 36352452 PMCID: PMC9648002 DOI: 10.1186/s13059-022-02803-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/05/2021] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND The use of a personalized haplotype-specific genome assembly, rather than an unrelated, mosaic genome like GRCh38, as a reference for detecting the full spectrum of somatic events from cancers has long been advocated but has never been explored in tumor-normal paired samples. Here, we provide the first demonstrated use of de novo assembled personalized genome as a reference for cancer mutation detection and quantifying the effects of the reference genomes on the accuracy of somatic mutation detection. RESULTS We generate de novo assemblies of the first tumor-normal paired genomes, both nuclear and mitochondrial, derived from the same individual with triple negative breast cancer. The personalized genome was chromosomal scale, haplotype phased, and annotated. We demonstrate that it provides individual specific haplotypes for complex regions and medically relevant genes. We illustrate that the personalized genome reference not only improves read alignments for both short-read and long-read sequencing data but also ameliorates the detection accuracy of somatic SNVs and SVs. We identify the equivalent somatic mutation calls between two genome references and uncover novel somatic mutations only when personalized genome assembly is used as a reference. CONCLUSIONS Our findings demonstrate that use of a personalized genome with individual-specific haplotypes is essential for accurate detection of the full spectrum of somatic mutations in the paired tumor-normal samples. The unique resource and methodology established in this study will be beneficial to the development of precision oncology medicine not only for breast cancer, but also for other cancers.
Affiliation(s)
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
- Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Cory Padilla
- Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066 USA
- Michael Colgan
- The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
- Wenjun Wu
- Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
- Li-Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Road, Belmont, CA 94002 USA
- Tiantian Liu
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Yibin Yang
- Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
- Valerie Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Wenming Xiao
- The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
3
Moradi N, Ohadian Moghadam S, Heidarzadeh S. Application of next-generation sequencing in the diagnosis of gastric cancer. Scand J Gastroenterol 2022; 57:842-855. [PMID: 35293278 DOI: 10.1080/00365521.2022.2041717] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/04/2023]
Abstract
Objectives: Gastric cancer (GC) is a disease with high mortality, poor prognosis and numerous risk factors. GC is asymptomatic in the early stages of the disease, which complicates timely diagnosis by conventional approaches, namely pathological examinations and imaging tests. Recently, molecular profiling of GC using next-generation sequencing (NGS) has opened new doors to efficient prognostic, diagnostic, and therapeutic strategies. The current review aims to thoroughly discuss and compare the current NGS techniques and commercial platforms utilized for GC diagnosis and treatment, highlighting the most recent NGS-based GC studies. Furthermore, this review addresses the challenges of clinical implementation of NGS in GC. Materials and methods: This review was conducted according to the eligible studies identified via search of Web of Science, PubMed, Scopus, Embase and the Cochrane Library. In the present study, data on gastric cancer patients and NGS methods used to diagnose the disease were reviewed. Conclusion: Given the ever-rising advancements in NGS technologies, bioinformatics, healthcare guidelines and refined classifications, it is hoped that these technologies can actualize their advantages and optimize GC patients' experience.
Affiliation(s)
- Narges Moradi
- Department of Life Technologies, University of Turku, Turku, Finland
- Siamak Heidarzadeh
- Department of Microbiology and Virology, School of Medicine, Zanjan University of Medical Sciences, Zanjan, Iran
4
Souvorov A, Agarwala R. SAUTE: sequence assembly using target enrichment. BMC Bioinformatics 2021; 22:375. [PMID: 34289805 PMCID: PMC8293564 DOI: 10.1186/s12859-021-04174-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/25/2021] [Accepted: 05/05/2021] [Indexed: 01/25/2023]
Abstract
Background Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well supported variants for assembled regions. Results To facilitate assembly of repeat regions and to report multiple well supported variants when a user can provide target sequences to assist the assembly, we propose SAUTE and SAUTE_PROT assemblers. Both assemblers use de Bruijn graph on reads. Targets can be transcripts or proteins for RNA-seq reads and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide and protein sequences for SAUTE and SAUTE_PROT, respectively. Conclusions For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04174-9.
Affiliation(s)
- Richa Agarwala
- NCBI/NLM/NIH/DHHS, 8600 Rockville Pike, Bethesda, MD, 20894, USA.
5
Ji Z, Guo W, Sakkiah S, Liu J, Patterson TA, Hong H. Nanomaterial Databases: Data Sources for Promoting Design and Risk Assessment of Nanomaterials. NANOMATERIALS 2021; 11:nano11061599. [PMID: 34207026 PMCID: PMC8234318 DOI: 10.3390/nano11061599] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/20/2021] [Revised: 06/11/2021] [Accepted: 06/14/2021] [Indexed: 12/19/2022]
Abstract
Nanomaterials have drawn increasing attention due to their tunable and enhanced physicochemical and biological performance compared to their conventional bulk materials. Owing to the rapid expansion of the nano-industry, large amounts of data regarding the synthesis, physicochemical properties, and bioactivities of nanomaterials have been generated. These data are a great asset to the scientific community. However, the data are on diverse aspects of nanomaterials and in different sources and formats. To help utilize these data, various databases on specific information of nanomaterials such as physicochemical characterization, biomedicine, and nano-safety have been developed and made available online. Understanding the structure, function, and available data in these databases is needed for scientists to select appropriate databases and retrieve specific information for research on nanomaterials. However, to our knowledge, there is no study to systematically compare these databases to facilitate their utilization in the field of nanomaterials. Therefore, we reviewed and compared eight widely used databases of nanomaterials, aiming to provide the nanoscience community with valuable information about the specific content and function of these databases. We also discuss the pros and cons of these databases, thus enabling more efficient and convenient utilization.
6
Garcia-Garcia S, Cortese MF, Rodríguez-Algarra F, Tabernero D, Rando-Segura A, Quer J, Buti M, Rodríguez-Frías F. Next-generation sequencing for the diagnosis of hepatitis B: current status and future prospects. Expert Rev Mol Diagn 2021; 21:381-396. [PMID: 33880971 DOI: 10.1080/14737159.2021.1913055] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
INTRODUCTION Hepatitis B virus (HBV) causes a complex and persistent infection with a major impact on patients' health. Viral-genome sequencing can provide valuable information for characterizing virus genotype, infection dynamics and drug and vaccine resistance. AREAS COVERED This article reviews the current literature to describe the next-generation sequencing progress that facilitated a more comprehensive study of HBV quasispecies in diagnosis and clinical monitoring. EXPERT OPINION HBV variability plays a key role in liver disease progression and treatment efficacy. Second-generation sequencing improved the sensitivity for detecting and quantifying mutations, mixed genotypes and viral recombination. Third-generation sequencing enables the analysis of the entire HBV genome, although the high error rate limits its use in clinical practice.
Affiliation(s)
- Selene Garcia-Garcia
- Liver Pathology Unit, Departments of Biochemistry and Microbiology, Hospital Universitari Vall d'Hebron, Universitat Autònoma De Barcelona, Barcelona, Spain
- Clinical Biochemistry Research Group, Vall d'Hebron Institut Recerca (VHIR), Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona, Spain
- Maria Francesca Cortese
- Liver Pathology Unit, Departments of Biochemistry and Microbiology, Hospital Universitari Vall d'Hebron, Universitat Autònoma De Barcelona, Barcelona, Spain
- Clinical Biochemistry Research Group, Vall d'Hebron Institut Recerca (VHIR), Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona, Spain
- Francisco Rodríguez-Algarra
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- David Tabernero
- Centro De Investigación Biomédica En Red De Enfermedades Hepáticas Y Digestivas, Instituto De Salud Carlos III, Madrid, Spain
- Ariadna Rando-Segura
- Liver Pathology Unit, Departments of Biochemistry and Microbiology, Hospital Universitari Vall d'Hebron, Universitat Autònoma De Barcelona, Barcelona, Spain
- Josep Quer
- Centro De Investigación Biomédica En Red De Enfermedades Hepáticas Y Digestivas, Instituto De Salud Carlos III, Madrid, Spain
- Liver Unit, Liver Disease Laboratory-Viral Hepatitis, Vall d'Hebron Institut Recerca-Hospital Universitari Vall d'Hebron, Universitat Autònoma De Barcelona, Barcelona, Spain
- Maria Buti
- Centro De Investigación Biomédica En Red De Enfermedades Hepáticas Y Digestivas, Instituto De Salud Carlos III, Madrid, Spain
- Liver Unit, Department of Internal Medicine, Hospital Universitari Vall d'Hebron, Universitat Autònoma De Barcelona, Barcelona, Spain
- Francisco Rodríguez-Frías
- Liver Pathology Unit, Departments of Biochemistry and Microbiology, Hospital Universitari Vall d'Hebron, Universitat Autònoma De Barcelona, Barcelona, Spain
- Clinical Biochemistry Research Group, Vall d'Hebron Institut Recerca (VHIR), Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro De Investigación Biomédica En Red De Enfermedades Hepáticas Y Digestivas, Instituto De Salud Carlos III, Madrid, Spain
7
dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies. BMC Genomics 2019; 20:706. [PMID: 31510940 PMCID: PMC6737619 DOI: 10.1186/s12864-019-6070-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/06/2019] [Accepted: 08/29/2019] [Indexed: 01/27/2023]
Abstract
Background Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. 
Conclusions dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. dnAQET can help researchers identify the most suitable assembly tools and select the high-quality assemblies they generate. Electronic supplementary material The online version of this article (10.1186/s12864-019-6070-x) contains supplementary material, which is available to authorized users.
8
Souvorov A, Agarwala R, Lipman DJ. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biol 2018; 19:153. [PMID: 30286803 PMCID: PMC6172800 DOI: 10.1186/s13059-018-1540-z] [Citation(s) in RCA: 360] [Impact Index Per Article: 60.0] [Received: 05/08/2018] [Accepted: 09/12/2018] [Indexed: 01/20/2023]
Abstract
SKESA is a DeBruijn graph-based de-novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources. SKESA has been used for assembling over 272,000 read sets in the Sequence Read Archive at NCBI and for real-time pathogen detection. Source code for SKESA is freely available at https://github.com/ncbi/SKESA/releases .
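For readers unfamiliar with the de Bruijn graph idea that this abstract refers to, the following is a minimal, generic sketch of building such a graph from reads and extending a contig along unambiguous edges. It is not SKESA's algorithm: the function names `build_de_bruijn` and `extend_unambiguous`, the toy reads, and the choice of k are assumptions made for illustration, and real assemblers additionally handle reverse complements, k-mer counts, and sequencing errors.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix -> its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while the current node has exactly one successor.

    Stops at a fork, a dead end, or when a node would be revisited.
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]  # the successor adds exactly one new base
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical toy reads that overlap to spell a single sequence.
reads = ["ACGTT", "CGTTA", "GTTAC"]
graph = build_de_bruijn(reads, k=4)
contig = extend_unambiguous(graph, "ACG")
print(contig)
```

Stopping at forks rather than guessing a path is a deliberately conservative choice for the sketch; it trades contiguity for not inventing joins that the reads do not support.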
Affiliation(s)
- Richa Agarwala
- NCBI/NLM/NIH/DHHS, 8600 Rockville Pike, Bethesda, 20894 MD USA
- David J. Lipman
- NCBI/NLM/NIH/DHHS, 8600 Rockville Pike, Bethesda, 20894 MD USA
- Impossible Foods, impossiblefoods.com, Redwood City, 94063 CA USA
9
Rajoriya N, Combet C, Zoulim F, Janssen HLA. How viral genetic variants and genotypes influence disease and treatment outcome of chronic hepatitis B. Time for an individualised approach? J Hepatol 2017; 67:1281-1297. [PMID: 28736138 DOI: 10.1016/j.jhep.2017.07.011] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 01/03/2017] [Revised: 06/27/2017] [Accepted: 07/12/2017] [Indexed: 12/12/2022]
Abstract
Chronic hepatitis B virus (HBV) infection remains a global problem. Several HBV genotypes exist with different biology and geographical prevalence. Whilst the future aim of HBV treatment remains viral eradication, current treatment strategies aim to suppress the virus and prevent the progression of liver disease. Current strategies also involve identification of patients for treatment, namely those at risk of progressive liver disease. Identification of HBV genotype, HBV mutants and other predictive factors allows for tailored treatments and risk-surveillance pathways, such as hepatocellular cancer screening. In the future, these factors may enable stratification not only of treatment decisions, but also of patients at risk of higher relapse rates when current therapies are discontinued. Newer technologies, such as next-generation sequencing, to assess drug-resistant or immune escape variants and quasi-species heterogeneity in patients, may allow for more information-based treatment decisions between the clinician and the patient. This article serves to discuss how HBV genotypes and genetic variants impact not only upon the disease course and outcomes, but also current treatment strategies. Adopting a personalised genotypic approach may play a role in future strategies to combat the disease. Herein, we discuss new technologies that may allow more informed decision-making for response-guided therapy in the battle against HBV.
Affiliation(s)
- Neil Rajoriya
- Toronto Centre for Liver Diseases, Toronto General Hospital, 200 Elizabeth Street, Toronto, Ontario M5G 2C4, Canada
- Christophe Combet
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM 1052, CNRS 5286, Centre Léon Bérard, Centre de recherche en cancérologie de Lyon, Lyon 69XXX, France
- Fabien Zoulim
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM 1052, CNRS 5286, Centre Léon Bérard, Centre de recherche en cancérologie de Lyon, Lyon 69XXX, France; Department of Hepatology, Groupement Hospitalier Nord, Hospices Civils de Lyon, Lyon, France
- Harry L A Janssen
- Toronto Centre for Liver Diseases, Toronto General Hospital, 200 Elizabeth Street, Toronto, Ontario M5G 2C4, Canada.
10
Wu L, Yavas G, Hong H, Tong W, Xiao W. Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches. Sci Rep 2017; 7:10963. [PMID: 28887485 PMCID: PMC5591230 DOI: 10.1038/s41598-017-10826-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/01/2016] [Accepted: 08/15/2017] [Indexed: 12/30/2022]
Abstract
Complementary to reference-based variant detection, recent studies revealed that many novel variants could be detected with de novo assembled genomes. To evaluate the effect of read coverage and the accuracy of assembly-based variant calling, we simulated short reads containing more than 3 million single nucleotide variants (SNVs) from the whole human genome and compared the efficiency of SNV calling between the assembly-based and alignment-based calling approaches. We assessed the quality of the assembled contigs and found that a minimum of 30X coverage of short reads was needed to ensure reliable SNV calling and to generate assembled contigs with good coverage of the genome and genes. In addition, we observed that the assembly-based approach had a much lower recall rate and precision than the alignment-based approach, which recovered 99% of imputed SNVs. We observed similar results with experimental reads for NA24385, an individual whose germline variants were well characterized. Although it offers additional value for SNV detection, the assembly-based approach carries a high risk of false discovery of novel SNVs. Further improvement of de novo assembly algorithms is needed to ensure good genome completeness with resolved haplotypes and high fidelity of assembled sequences.
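The recall and precision figures discussed in this abstract follow the standard definitions, which can be sketched as below for a set of called SNVs compared against a truth set. This is only an illustration: the function name `precision_recall`, the tuple layout (chromosome, position, reference allele, alternate allele), and the toy variants are assumptions and are not taken from the study.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for called variants against a truth set.

    precision = true positives / all calls
    recall    = true positives / all true variants
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: (chromosome, position, ref allele, alt allele).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),  # true positive
    ("chr2", 40, "G", "A"),   # true positive
    ("chr2", 75, "T", "C"),   # false positive: not in the truth set
}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Matching variants as exact tuples is the simplest possible comparison; it ignores representation differences between callers, which dedicated benchmarking tools normalise before comparing.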
Affiliation(s)
- Leihong Wu
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
- Gokhan Yavas
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
- Huixiao Hong
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
- Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
- Wenming Xiao
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA.
11
Snake Genome Sequencing: Results and Future Prospects. Toxins (Basel) 2016; 8:toxins8120360. [PMID: 27916957 PMCID: PMC5198554 DOI: 10.3390/toxins8120360] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/02/2016] [Revised: 11/23/2016] [Accepted: 11/25/2016] [Indexed: 12/16/2022]
Abstract
Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.