1. Unlocking the potential of Molecular Tumor Boards: from cutting-edge data interpretation to innovative clinical pathways. Crit Rev Oncol Hematol 2024; 199:104379. [PMID: 38718940 DOI: 10.1016/j.critrevonc.2024.104379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/02/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
The emerging era of precision medicine is characterized by an increasing availability of targeted anticancer therapies and by the parallel development of techniques to obtain more refined molecular data, whose interpretation may not always be straightforward. Molecular tumor boards gather various professional figures, in order to leverage the analysis of molecular data and provide prognostic and predictive insights for clinicians. In addition to healthcare development, they could also become a tool to promote knowledge and research spreading. A growing body of evidence on the application of molecular tumor boards to clinical practice is forming and positive signals are emerging, although a certain degree of heterogeneity exists. This work analyzes molecular tumor boards' potential workflows, figures involved, data sources, sample matrices and eligible patients, as well as available evidence and learning examples. The emerging concept of multi-institutional, disease-specific molecular tumor boards is also considered by presenting two ongoing nationwide experiences.

2. COSAP: Comparative Sequencing Analysis Platform. BMC Bioinformatics 2024; 25:130. [PMID: 38532317 DOI: 10.1186/s12859-024-05756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/20/2024] [Indexed: 03/28/2024] Open
Abstract
BACKGROUND Recent improvements in sequencing technologies enabled detailed profiling of genomic features. These technologies mostly rely on short reads which are merged and compared to reference genome for variant identification. These operations should be done with computers due to the size and complexity of the data. The need for analysis software resulted in many programs for mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code which makes access and verification very difficult or open-access programs that are mostly based on command-line operations without user interfaces and extensive documentation. Moreover, a high level of disagreement is observed among popular mapping and variant calling algorithms in multiple studies, which makes relying on a single algorithm unreliable. User-friendly open-source software tools that offer comparative analysis are an important need considering the growth of sequencing technologies. RESULTS Here, we propose Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis and their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the frontend and backend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are packed as Docker containers as well. 
Pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding through modular structure. CONCLUSIONS COSAP simplifies and speeds up the process of DNA sequencing analyses providing commonly used algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis as well as their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make comparisons much easier to assess the impact of alternative pipelines which is crucial in establishing reproducibility of sequencing analyses.

3. DVA: predicting the functional impact of single nucleotide missense variants. BMC Bioinformatics 2024; 25:100. [PMID: 38448823 PMCID: PMC10916336 DOI: 10.1186/s12859-024-05709-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 02/16/2024] [Indexed: 03/08/2024] Open
Abstract
BACKGROUND In the past decade, single nucleotide variants (SNVs) have been identified as having a significant relationship with the development and treatment of diseases. Among them, prioritizing missense variants for further functional impact investigation is an essential challenge in the study of common disease and cancer. Although several computational methods have been developed to predict the functional impacts of variants, the predictive ability of these methods is still insufficient in the Mendelian and cancer missense variants. RESULTS We present a novel prediction method called the disease-related variant annotation (DVA) method that predicts the effect of missense variants based on a comprehensive feature set of variants, notably, the allele frequency and protein-protein interaction network feature based on graph embedding. Benchmarked against datasets of single nucleotide missense variants, the DVA method outperforms the state-of-the-art methods by up to 0.473 in the area under receiver operating characteristic curve. The results demonstrate that the proposed method can accurately predict the functional impact of single nucleotide missense variants and substantially outperforms existing methods. CONCLUSIONS DVA is an effective framework for identifying the functional impact of disease missense variants based on a comprehensive feature set. Based on different datasets, DVA shows its generalization ability and robustness, and it also provides innovative ideas for the study of the functional mechanism and impact of SNVs.

4. Implementation of Exome Sequencing to Identify Rare Genetic Diseases. Methods Mol Biol 2024; 2719:79-98. [PMID: 37803113 DOI: 10.1007/978-1-0716-3461-5_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
Modern high-throughput genomic testing using next-generation sequencing (NGS) has led to a significant increase in the successful diagnosis of rare genetic disorders. Recent advances in NGS tools and techniques have led to accurate and timely diagnosis of a large proportion of genetic diseases by finding sequence variations in clinical samples. One of the NGS techniques, exome sequencing (ES), is considered as a powerful and easily approachable method for genetic disorders in terms of rapid and cost-effective diagnostic yields. In this chapter, we describe an overview of whole exome sequencing (ES) in the context of experimental and analytical methodologies. Approaches to ES include sequencing capture technique, quality control processes at various stages of sequencing analysis, exome data filtering strategy that incorporates both primary and secondary filtering, and prioritization of candidate variants in diagnosing genetic diseases.

5. Detection of germline variants with pathogenic potential in 48 patients with familial colorectal cancer by using whole exome sequencing. BMC Med Genomics 2023; 16:126. [PMID: 37296477 PMCID: PMC10257304 DOI: 10.1186/s12920-023-01562-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 05/30/2023] [Indexed: 06/12/2023] Open
Abstract
BACKGROUND Hereditary genetic mutations causing predisposition to colorectal cancer are accountable for approximately 30% of all colorectal cancer cases. However, only a small fraction of these are high penetrant mutations occurring in DNA mismatch repair genes, causing one of several types of familial colorectal cancer (CRC) syndromes. Most of the mutations are low-penetrant variants, contributing to an increased risk of familial colorectal cancer, and they are often found in additional genes and pathways not previously associated with CRC. The aim of this study was to identify such variants, both high-penetrant and low-penetrant ones. METHODS We performed whole exome sequencing on constitutional DNA extracted from blood of 48 patients suspected of familial colorectal cancer and used multiple in silico prediction tools and available literature-based evidence to detect and investigate genetic variants. RESULTS We identified several causative and some potentially causative germline variants in genes known for their association with colorectal cancer. In addition, we identified several variants in genes not typically included in relevant gene panels for colorectal cancer, including CFTR, PABPC1 and TYRO3, which may be associated with an increased risk for cancer. CONCLUSIONS Identification of variants in additional genes that potentially can be associated with familial colorectal cancer indicates a larger genetic spectrum of this disease, not limited only to mismatch repair genes. Usage of multiple in silico tools based on different methods and combined through a consensus approach increases the sensitivity of predictions and narrows down a large list of variants to the ones that are most likely to be significant.
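The consensus idea mentioned in the conclusions can be illustrated with a minimal sketch. The abstract does not say which tools were used or how their outputs were combined, so the tool names, the labels, and the majority-vote rule below are all assumptions made for illustration only.

```python
# Hypothetical majority-vote consensus over several in silico predictors.
# Tool names, labels and the voting rule are illustrative assumptions,
# not the method used in the study.

def consensus_call(predictions, min_fraction=0.5):
    """Label a variant 'deleterious' when more than min_fraction of the
    tools that returned a call agree that it is deleterious."""
    calls = [c for c in predictions.values() if c is not None]
    if not calls:
        return "no_call"
    n_deleterious = sum(1 for c in calls if c == "deleterious")
    if n_deleterious / len(calls) > min_fraction:
        return "deleterious"
    return "tolerated"


# Made-up example input: one dict of per-tool calls per variant.
variants = {
    "variant_A": {"tool1": "deleterious", "tool2": "deleterious", "tool3": "tolerated"},
    "variant_B": {"tool1": "tolerated", "tool2": None, "tool3": "tolerated"},
}

# Keep only the variants that the consensus flags as deleterious.
shortlist = [name for name, calls in variants.items()
             if consensus_call(calls) == "deleterious"]
print(shortlist)
```

A stricter or looser threshold (`min_fraction`) trades sensitivity against the length of the shortlist; the right setting depends on the tools and the clinical question.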

6. Genomic Strategies in Mitochondrial Diagnostics. Methods Mol Biol 2023; 2615:397-425. [PMID: 36807806 DOI: 10.1007/978-1-0716-2922-2_27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Pathogenic variants in both mitochondrial and nuclear genes contribute to the clinical and genetic heterogeneity of mitochondrial diseases. There are now pathogenic variants in over 300 nuclear genes linked to human mitochondrial diseases. Nonetheless, diagnosing mitochondrial disease with a genetic outcome remains challenging. However, there are now many strategies that help us to pinpoint causative variants in patients with mitochondrial disease. This chapter describes some of the approaches and recent advancements in gene/variant prioritization using whole-exome sequencing (WES).

7. Utilizing Large Functional and Population Genomics Resources for CRISPR/Cas Perturbation Experiment Design. Methods Mol Biol 2023; 2637:63-73. [PMID: 36773138 DOI: 10.1007/978-1-0716-3016-7_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Genome sequencing technologies have rapidly evolved in the past decades, enabling us to interpret the human genome through multiple perspectives, ranging from cross-species comparisons, naturally occurring variation in health and disease state to regulatory mechanisms. Although such perspectives are all informative to narrow down the list of genes or variants for perturbation experiments based on specific biological aims, utilizing multiple sources of information is often challenging in practice. In this chapter, we provide an overview of major large-scale functional and population genomics resources, followed by a practical example of selecting target variants for genetic perturbation experiments involving genome engineering techniques such as CRISPR/Cas.

8. SVAT: Secure outsourcing of variant annotation and genotype aggregation. BMC Bioinformatics 2022; 23:409. [PMID: 36182914 PMCID: PMC9526274 DOI: 10.1186/s12859-022-04959-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Accepted: 09/20/2022] [Indexed: 11/10/2022] Open
Abstract
Background Sequencing of thousands of samples provides genetic variants with allele frequencies spanning a very large spectrum and gives invaluable insight into genetic determinants of diseases. Protecting the genetic privacy of participants is challenging as only a few rare variants can easily re-identify an individual among millions. In certain cases, there are policy barriers against sharing genetic data from indigenous populations and stigmatizing conditions. Results We present SVAT, a method for secure outsourcing of variant annotation and aggregation, which are two basic steps in variant interpretation and detection of causal variants. SVAT uses homomorphic encryption to encrypt the data at the client-side. The data always stays encrypted while it is stored, in-transit, and most importantly while it is analyzed. SVAT makes use of a vectorized data representation to convert annotation and aggregation into efficient vectorized operations in a single framework. Also, SVAT utilizes a secure re-encryption approach so that multiple disparate genotype datasets can be combined for federated aggregation and secure computation of allele frequencies on the aggregated dataset. Conclusions Overall, SVAT provides a secure, flexible, and practical framework for privacy-aware outsourcing of annotation, filtering, and aggregation of genetic variants. SVAT is publicly available for download from https://github.com/harmancilab/SVAT. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04959-6.

9. Variant Annotation and Functional Prediction: SnpEff. Methods Mol Biol 2022; 2493:289-314. [PMID: 35751823 DOI: 10.1007/978-1-0716-2293-3_19] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Indexed: 11/28/2022]
Abstract
Variant annotations, in general, refer to the process of information enrichment of genomic variants from a sequencing experiment. Typically these annotations include functional predictions, such as predicting the amino acid sequence changes from the DNA variant, predicting whether the variant will induce a splice anomaly, or predicting nonsense mediated decay. But other annotations also include combining with genomic databases, adding conservation scores, or comparing to allele frequencies from large population databases. Finally, all these annotations are combined to prioritize and filter variants into a reduced set of highly relevant variants for the study or clinical assay.
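The final step described above, combining annotations to filter and prioritize variants, might look roughly like the following sketch. The record fields, the allele-frequency cut-off and the ranking rule are assumptions chosen for illustration; the impact labels follow the HIGH/MODERATE/LOW/MODIFIER convention, but nothing here reproduces SnpEff's own output format or API.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedVariant:
    """Hypothetical, simplified annotation record (not a SnpEff data structure)."""
    variant_id: str
    impact: str           # "HIGH", "MODERATE", "LOW" or "MODIFIER"
    population_af: float  # allele frequency from a population database
    conservation: float   # conservation score, higher means more conserved


IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def prioritize(variants, max_af=0.01, keep_impacts=("HIGH", "MODERATE")):
    """Keep rare variants with a strong predicted impact, then sort by
    impact first and by conservation (descending) second."""
    kept = [v for v in variants
            if v.population_af <= max_af and v.impact in keep_impacts]
    return sorted(kept, key=lambda v: (IMPACT_RANK[v.impact], -v.conservation))


# Made-up example records.
candidates = [
    AnnotatedVariant("v1", "HIGH", 0.0001, 0.90),
    AnnotatedVariant("v2", "MODERATE", 0.2000, 0.80),  # too common, dropped
    AnnotatedVariant("v3", "MODERATE", 0.0010, 0.95),
    AnnotatedVariant("v4", "LOW", 0.0005, 0.99),       # impact too weak, dropped
]

ranked_ids = [v.variant_id for v in prioritize(candidates)]
print(ranked_ids)
```

The thresholds are study-specific choices; a rare-disease analysis and a clinical assay would typically set them differently.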

10. Complete CFTR gene sequencing in 5,058 individuals with cystic fibrosis informs variant-specific treatment. J Cyst Fibros 2021; 21:463-470. [PMID: 34782259 DOI: 10.1016/j.jcf.2021.10.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/23/2021] [Revised: 10/27/2021] [Accepted: 10/29/2021] [Indexed: 01/28/2023]
Abstract
BACKGROUND Cystic fibrosis (CF) is a recessive condition caused by variants in each CF transmembrane conductance regulator (CFTR) allele. Clinically affected individuals without two identified causal variants typically have no further interrogation of CFTR beyond examination of coding regions, but the development of variant-specific CFTR-targeted treatments necessitates complete understanding of CFTR genotype. METHODS Whole genome sequences were analyzed on 5,058 individuals with CF. We focused on the full CFTR gene sequence and identified disease-causing variants in three phases: screening for known and structural variants; discovery of novel loss-of-function variants; and investigation of remaining variants. RESULTS All variants identified in the first two phases and coding region variants found in the third phase were interpreted according to CFTR2 or ACMG criteria (n = 371; 16 [4.3%] previously unreported). Full gene sequencing enabled delineation of 18 structural variants (large insertions or deletions), of which two were novel. Additional CFTR variants of uncertain effect were found in 76 F508del homozygotes and in 21 individuals with other combinations of CF-causing variants. Both causative variants were identified in 98.1% (n = 4,960) of subjects, an increase of 2.3 percentage points from the 95.8% (n = 4,847) who had a registry- or chart-reported disease-causing CFTR genotype. Of the remaining 98 individuals, 78 carried one variant that has been associated with CF (CF-causing [n = 70] or resulting in varying clinical consequences [n = 8]). CONCLUSIONS Complete CFTR gene sequencing in 5,058 individuals with CF identified at least one DNA variant in 99.6% of the cohort that is targetable by current molecular or emerging gene-based therapeutic technologies.
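The percentages in the results can be re-derived from the counts quoted in the abstract. The short check below uses only those counts; the variable names are illustrative.

```python
# Counts quoted in the abstract above.
cohort = 5058
both_identified_after = 4960   # both causative variants found after full sequencing
both_identified_before = 4847  # registry- or chart-reported genotype


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


pct_after = pct(both_identified_after, cohort)
pct_before = pct(both_identified_before, cohort)

# Gain computed two ways: from the rounded percentages, and from the raw counts.
gain_from_rounded = round(pct_after - pct_before, 1)
gain_from_counts = pct(both_identified_after - both_identified_before, cohort)

remaining = cohort - both_identified_after
novel_share = pct(16, 371)

print(pct_after, pct_before, gain_from_rounded, gain_from_counts, remaining, novel_share)
```

The reported 2.3-point increase is the difference between the two rounded percentages (98.1 and 95.8); computed directly from the counts (113 of 5,058), the gain rounds to 2.2 points. The remaining 98 individuals and the 4.3% share of previously unreported variants follow from the counts as stated.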

11. Transcript annotation tool (TransAT): an R package for retrieving annotations for transcript-specific genetic variants. BMC Bioinformatics 2021; 22:350. [PMID: 34182919 PMCID: PMC8240296 DOI: 10.1186/s12859-021-04243-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2021] [Accepted: 06/07/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An individual's genetics play a role in how RNA transcripts are generated from DNA and consequently in their translation into protein. Transcriptional and translational profiling of patients furnishes the information that a specific marker is present; however, it fails to provide evidence whether the marker correlates with response to a therapeutic agent. A comparative analysis of the frequency of genetic variants, such as single nucleotide polymorphisms (SNPs), in diseased and general populations can identify pathogenic variants in individual patients. This is in part because SNPs have considerable effects on protein function and gene expression when they occur in coding regions and regulatory sequences, respectively. Therefore, a tool that can help users to obtain the allele frequency for a corresponding transcript is the need of the day. Several annotation tools such as SNPnexus and VariED are publicly available; however, none of them can use transcript IDs as input and provide the corresponding genomic positions of variants. RESULTS In this study, we developed an R package, called transcript annotation tool (TransAT), that provides (i) SNP ID and genomic position for a user-provided transcript ID from patients, and (ii) allele frequencies for the SNPs from publicly available global populations. All data elements are extracted, collected, and displayed in an easily downloadable format in two simple command lines. TransAT is available on Windows/Linux/MacOS and is operative for R version 4.0.4 or later. It is available at https://github.com/ShihChingYu/TransAT and can be downloaded and installed using devtools::install_github("ShihChingYu/TransAT", force=T) on the R execution page. Thereafter, all functions can be executed by loading the package into R with library(TransAT). CONCLUSIONS TransAT is a novel tool that seamlessly provides genetic annotations for queried transcripts. 
Such easily obtainable information would be greatly advantageous for physicians, assisting them to make individualized decisions about specific drug treatments. Moreover, allele frequencies from user-chosen global ethnic populations will highlight the importance of ethnicity and its effect on patient pathogenicity.

12.
Abstract
The clinical bioinformatics system is well established for diagnosing genetic disease based on next-generation sequencing, but requires special considerations when being adapted for the next-generation sequencing-based genetic diagnosis of mitochondrial diseases. Challenges are caused by the involvement of the mitochondrial DNA genome in disease etiology. Heteroplasmy and haplogroup are key factors in interpreting mitochondrial DNA variant effects. Data resources and tools for analyzing variant and sequencing data are available at MSeqDR, MitoMap, and HmtDB. Revised specifications of the American College of Medical Genetics/Association of Molecular Pathology standards and guidelines for mitochondrial DNA variant interpretation are proposed by the MSeqDR Consortium and community experts.
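Heteroplasmy is commonly estimated as the fraction of sequencing reads at a mitochondrial position that carry the variant allele. The sketch below shows that calculation only; the function names and both thresholds are hypothetical and are not taken from MSeqDR, MitoMap, HmtDB or any guideline.

```python
def heteroplasmy_level(alt_reads, total_reads):
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def describe(level, low=0.05, high=0.95):
    """Coarse label for a heteroplasmy level.
    The cut-offs 'low' and 'high' are illustrative assumptions."""
    if level < low:
        return "below reporting threshold"
    if level >= high:
        return "near-homoplasmic"
    return "heteroplasmic"


# Made-up read counts for one variant position.
level = heteroplasmy_level(alt_reads=150, total_reads=1000)
print(level, describe(level))
```

In practice the reliable lower limit depends on sequencing depth and error rate, so any cut-off would need to be validated for the assay in use.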

13. Cruxome: a powerful tool for annotating, interpreting and reporting genetic variants. BMC Genomics 2021; 22:407. [PMID: 34082700 PMCID: PMC8173893 DOI: 10.1186/s12864-021-07728-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 05/20/2021] [Indexed: 01/23/2023] Open
Abstract
Background Next-generation sequencing (NGS) is an efficient tool used for identifying pathogenic variants that cause Mendelian disorders. However, the lack of bioinformatics training of researchers makes the interpretation of identified variants a challenge in terms of precision and efficiency. In addition, the non-standardized phenotypic description of human diseases also makes it difficult to establish an integrated analysis pathway for variant annotation and interpretation. Solutions to these bottlenecks are urgently needed. Results We develop a tool named “Cruxome” to automatically annotate and interpret single nucleotide variants (SNVs) and small insertions and deletions (InDels). Our approach greatly simplifies the current burdensome task of clinical geneticists and scientists to identify the causative pathogenic variants and build personal knowledge reference bases. The integrated architecture of Cruxome offers key advantages such as an interactive and user-friendly interface and the assimilation of electronic health records of the patient. By combining a natural language processing algorithm, Cruxome can efficiently process the clinical description of diseases to HPO standardized vocabularies. By using machine learning, in silico predictive algorithms, integrated multiple databases and supplementary tools, Cruxome can automatically process SNVs and InDels variants (trio-family or proband-only cases) and clinical diagnosis records, then annotate, score, identify and interpret pathogenic variants to finally generate a standardized clinical report following American College of Medical Genetics and Genomics/ Association for Molecular Pathology (ACMG/AMP) guidelines. Cruxome also provides supplementary tools to examine and visualize the genes or variations in historical cases, which can help to better understand the genetic basis of the disease. 
Conclusions Cruxome is an efficient tool for annotation and interpretation of variations and dramatically reduces the workload for clinical geneticists and researchers to interpret NGS results, simplifying their decision-making processes. We present an online version of Cruxome, which is freely available to academics and clinical researchers. The site is accessible at http://114.251.61.49:10024/cruxome/. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07728-6.

14. Decoding the genome of superior chapatti quality Indian wheat variety 'C 306' unravelled novel genomic variants for chapatti and nutrition quality related genes. Genomics 2021; 113:1919-1929. [PMID: 33823224 DOI: 10.1016/j.ygeno.2021.03.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/22/2020] [Revised: 03/18/2021] [Accepted: 03/29/2021] [Indexed: 11/26/2022]
Abstract
An Indian wheat variety, 'C 306', has good chapatti quality, which is controlled by multiple genes that have not been explored. We report the high-quality de novo assembled genome of 'C 306', obtained by combining short- and long-read sequencing data. The hybrid assembly covered 93% of gene space and identified about 142 K coding genes, 34% repetitive DNA and ~501 K SSR motifs. The phylogenetic analysis of about 83 K orthologous protein groups suggested the closest relationship with T. turgidum, T. aestivum and Ae. tauschii. Genome-wide analysis annotated 69,217,536 genomic variants. Of these, 1423 missense and 117 deleterious variants were identified in processing, nutrition, and chapatti quality related genes such as those encoding alpha- and beta-gliadin, SSI, SSIII, SUT1, SBEI, CHS, YSL, DMAS, and NAS proteins. These variants may affect quality genes. The genomic data will be potential genomic resources in wheat breeding programs for quality improvement.

15. Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing. Methods Mol Biol 2021; 2243:1-25. [PMID: 33606250 DOI: 10.1007/978-1-0716-1103-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Increasingly affordable sequencing technologies are revolutionizing the field of genomic medicine. It is now feasible to interrogate all major classes of variation in an individual across the entire genome for less than $1000 USD. While the generation of patient sequence information using these technologies has become routine, the analysis and interpretation of this data remains the greatest obstacle to widespread clinical implementation. This chapter summarizes the steps to identify, annotate, and prioritize variant information required for clinical report generation. We discuss methods to detect each variant class and describe strategies to increase the likelihood of detecting causal variant(s) in Mendelian disease. Lastly, we describe a sample workflow for synthesizing large amount of genetic information into concise clinical reports.

16. Pitfalls in variant annotation for hereditary cancer diagnostics: The example of Illumina® VariantStudio®. Genomics 2020; 113:748-754. [PMID: 33053411 DOI: 10.1016/j.ygeno.2020.10.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2020] [Revised: 08/04/2020] [Accepted: 10/08/2020] [Indexed: 11/27/2022]
Abstract
Next Generation Sequencing (NGS), and specifically targeted panel sequencing is the state-of-the-art in clinical genetic diagnosis of Mendelian diseases. However, the bioinformatics analysis and interpretation of the generated data can be challenging. A spotlight on the default transcript selection of a user-friendly, commercially available software that is widely used by genetics professionals, i.e. Illumina® VariantStudio®, is presented. For the sake of comparison, we employed Ensembl VEP, an open-source command-line tool, as it provides flexibility regarding transcript selection. The analysis of NGS data deriving from sequencing of 857 germline DNA samples of cancer patients indicated a concordance of 82.82% between the two software programs. Significantly, using the default transcript configuration of VariantStudio®, we failed to annotate correctly 11.45% of the identified loss-of-function variants. Our results underline the importance of cautious software and transcript selection and the need for reliable, white-box data analysis, along with bioinformatics expertise in clinical diagnostics.

17. AMLVaran: a software approach to implement variant analysis of targeted NGS sequencing data in an oncological care setting. BMC Med Genomics 2020; 13:17. [PMID: 32019565 PMCID: PMC7001226 DOI: 10.1186/s12920-020-0668-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/19/2019] [Accepted: 01/21/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Next-Generation Sequencing (NGS) enables large-scale and cost-effective sequencing of genetic samples in order to detect genetic variants. After successful use in research-oriented projects, NGS is now entering clinical practice. Consequently, variant analysis is increasingly important to facilitate a better understanding of disease entities and prognoses. Furthermore, variant calling makes it possible to adapt and optimize specific treatments of individual patients, and thus is an integral part of personalized medicine. However, the analysis of NGS data typically requires a number of complex bioinformatics processing steps. A flexible and reliable software that combines the variant analysis process with a simple, user-friendly interface is therefore highly desirable, but still lacking. RESULTS With AMLVaran (AML Variant Analyzer), we present web-based software that covers the complete variant analysis workflow of targeted NGS samples. The software provides a generic pipeline that allows free choice of variant calling tools and a flexible language (SSDL) for filtering variant lists. AMLVaran's interactive website presents comprehensive annotation data and includes curated information on relevant hotspot regions and driver mutations. A concise clinical report with rule-based diagnostic recommendations is generated. An AMLVaran configuration with eight variant calling tools and a complex scoring scheme, based on the somatic variant calling pipeline appreci8, was used to analyze three datasets from AML and MDS studies with 402 samples in total. Maximum sensitivity and positive predictive values were 1.0 and 0.96, respectively. The tool's usability was found to be satisfactory by medical professionals. CONCLUSION Coverage analysis, reproducible variant filtering and software usability are important for clinical assessment of variants. AMLVaran performs reliable NGS variant analyses and generates reports fulfilling the requirements of a clinical setting. 
Due to its generic design, the software can easily be adapted for use with different targeted panels for other tumor entities, or even for whole-exome data. AMLVaran has been deployed to a public web server and is distributed with Docker scripts for local use.
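For readers less familiar with the two metrics reported in the results, sensitivity and positive predictive value (PPV) are computed from true positives, false positives and false negatives as sketched below. The counts in the example are invented so that the outputs happen to match the reported maxima; they are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Share of real variants that the pipeline called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Share of called variants that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only.
tp, fp, fn = 96, 4, 0

sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(sens, ppv)
```

A sensitivity of 1.0 means no known variant was missed in that dataset, while a PPV of 0.96 means that about 4 in 100 reported calls were false positives.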

18. PGG.SNV: understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations. Genome Biol 2019; 20:215. [PMID: 31640808 PMCID: PMC6805450 DOI: 10.1186/s13059-019-1838-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/25/2019] [Accepted: 09/26/2019] [Indexed: 12/23/2022] Open
Abstract
Despite the tremendous growth of DNA sequencing data over the last decade, our understanding of the human genome is still in its infancy. To understand the implications of genetic variants in the light of population genetics and molecular evolution, we developed a database, PGG.SNV ( https://www.pggsnv.org ), which gives much higher weight to previously under-investigated indigenous populations in Asia. PGG.SNV archives 265 million SNVs across 220,147 present-day genomes and 1018 ancient genomes, including 1009 newly sequenced genomes, representing 977 global populations. Moreover, estimates of population genetic diversity and evolutionary parameters are available in PGG.SNV, a unique feature compared with other databases.
|
19
|
Abstract
Tumor genomic profiling involves analyzing many data types to produce a molecular profile of a tumor. Many of these analyses result in a prioritized list of genes or variants for further study. Interpretation of these lists relies upon annotating and extracting biological meaning through literature and manually curated knowledge bases. This chapter will describe several of these approaches including gene annotation, variant annotation, clinical annotation, functional enrichment analyses, and network analyses. Taken together or individually, these analyses will result in a biological understanding of complex genomic data to improve clinical decision making.
|
20
|
Abstract
This chapter contains a step-by-step protocol for identifying somatic SNPs and small indels from next-generation sequencing data of tumor samples and matching normal samples. The workflow presented here is largely based on the Broad Institute's "Best Practices" guidelines and makes use of their Genome Analysis Toolkit (GATK) platform. Variants are annotated with population allele frequencies, curated resources such as gnomAD and ClinVar, and curated effect predictions from dbNSFP, using VCFtools, SnpEff, and SnpSift.
|
21
|
Calcium interactions with Cx26 hemmichannel: Spatial association between MD simulations biding sites and variant pathogenicity. Comput Biol Chem 2018; 77:331-342. [PMID: 30466042 DOI: 10.1016/j.compbiolchem.2018.11.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/21/2018] [Revised: 07/08/2018] [Accepted: 11/08/2018] [Indexed: 01/23/2023]
Abstract
Connexinopathies are a group of diseases related to connexin channels and hemichannels. In particular, many Cx26 alterations are strongly associated with human deafness. Calcium plays an important role in the regulation of these structures. Here, using calcium as a probe, extensive atomistic Molecular Dynamics simulations were performed on the Cx26 hemichannel embedded in a lipid bilayer. Exploring different initial conditions and calcium concentrations, the simulations reached ∼4 μs. Several analyses were carried out to reveal the calcium distribution and localization, such as electron density profiles, density maps and distance time evolution, which is directly associated with the interaction energy. Specific amino acid interactions with calcium and their stability were captured within this context. A few of these sites, such as GLU42, GLU47, GLY45 and ASP50, were already suggested in the literature. In addition, we identified novel calcium binding sites: ASP2, ASP117, ASP159, GLU114, GLU119, GLU120 and VAL226. To the best of our knowledge, this is the first time that these sites are reported within this context. Furthermore, since various pathologies involving the Cx26 hemichannel are associated with pathogenic variants in the corresponding GJB2 gene, using ClinVar, we were able to spatially associate the 3D positions of the calcium binding sites identified in this work with reported pathogenic variants in the GJB2 gene. This study presents a first step toward finding associations between molecular features and pathological variants of the Cx26 hemichannel.
|
22
|
Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse. BMC Bioinformatics 2018; 19:373. [PMID: 30314430 PMCID: PMC6186050 DOI: 10.1186/s12859-018-2337-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/11/2018] [Accepted: 08/23/2018] [Indexed: 12/30/2022] Open
Abstract
Background Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question of whether it can be done with less data for non-human species. Here, we investigate the prerequisites to construct a CADD-based model for a non-human species. Results Performance of the mouse model is competitive with that of the human CADD model and better than established methods like PhastCons conservation scores and SIFT. As in the human case, performance varies for different genomic regions and is best for coding regions. We also show the benefits of generating a species-specific model over lifting variants to a different species or applying a generic model. With fewer genomic annotations, performance on the test set as well as on the three validation sets is still good. Conclusions It is feasible to construct species-specific CADD models even when annotations such as epigenetic markers are not available. The minimal requirement for these models is the availability of a set of genomes of closely related species that can be used to infer an ancestor genome and substitution rates for the data generation.
|
23
|
Systematic identification and annotation of multiple-variant compound effects at transcription factor binding sites in human genome. J Genet Genomics 2018; 45:373-379. [PMID: 30054217 DOI: 10.1016/j.jgg.2018.05.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/31/2017] [Revised: 05/03/2018] [Accepted: 05/25/2018] [Indexed: 12/21/2022]
Abstract
Understanding the functional effects of genetic variants is crucial in modern genomics and genetics. Transcription factor binding sites (TFBSs) are one of the most important cis-regulatory elements. While multiple tools have been developed to assess functional effects of genetic variants at TFBSs, they usually assume that each variant works in isolation and neglect the potential "interference" among multiple variants within the same TFBS. In this study, we present COPE-TFBS (Context-Oriented Predictor for variant Effect on Transcription Factor Binding Site), a novel method that considers sequence context to accurately predict variant effects on TFBSs. We systematically re-analyzed the sequencing data from both the 1000 Genomes Project and the Genotype-Tissue Expression (GTEx) Project via COPE-TFBS, and identified a number of novel TFBSs, transformed TFBSs and discordantly annotated TFBSs resulting from multiple variants, further highlighting the necessity of sequence context in accurately annotating genetic variants. COPE-TFBS is freely available for academic use at http://cope.cbi.pku.edu.cn/.
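To illustrate why evaluating variants in isolation can mislead at a binding site, the following toy Python sketch scores a short sequence against a position weight matrix (PWM). This is not the COPE-TFBS algorithm; the matrix values, the threshold of 5, and the helper names are all hypothetical and chosen only to show the effect.

```python
# Toy log-odds PWM for a 4-bp motif (values are invented for illustration).
PWM = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": 0, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": 0},
    {"A": -1, "C": -1, "G": -1, "T": 2},
]
THRESHOLD = 5  # hypothetical cutoff above which the site counts as bound


def pwm_score(seq, pwm=PWM):
    """Sum the per-position weights of seq under the PWM."""
    return sum(column[base] for base, column in zip(seq, pwm))


def apply_variants(seq, variants):
    """Apply (0-based position, alternate base) substitutions to seq."""
    bases = list(seq)
    for pos, alt in variants:
        bases[pos] = alt
    return "".join(bases)


ref = "ACGT"
v1, v2 = (1, "G"), (2, "T")

score_ref = pwm_score(ref)                             # 8
score_v1 = pwm_score(apply_variants(ref, [v1]))        # 6, site retained
score_v2 = pwm_score(apply_variants(ref, [v2]))        # 6, site retained
score_both = pwm_score(apply_variants(ref, [v1, v2]))  # 4, site lost
```

Each variant on its own leaves the score above the cutoff, so a per-variant analysis would report no effect, while the haplotype carrying both variants falls below it.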
|
24
|
Abstract
Background The advent and ongoing development of next generation sequencing (NGS) technologies has led to a rapid increase in the rate at which human genome resequencing data are produced, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing, underlining the need for accurate and time-effective bioinformatics systems for genotyping, a crucial prerequisite for identification of candidate causal mutations in diagnostic screens. Results Here we present CoVaCS, a fully automated, highly accurate system with a web-based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark data set (the NA12878 Illumina platinum genome) confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command-line-based approaches, and far more accurate than call-sets from any individual tool. Importantly, our system exhibits better sensitivity and higher specificity than equivalent commercial software. Conclusions CoVaCS offers optimized pipelines integrating state-of-the-art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca, and offers the speed of an HPC computing facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all the analyses are performed automatically, allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs. Electronic supplementary material The online version of this article (10.1186/s12864-018-4508-1) contains supplementary material, which is available to authorized users.
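The abstract above refers to a consensus strategy across variant callers without spelling out the rule. A common form of such a strategy is majority voting, sketched below in Python under the assumption that a variant is kept when at least `min_support` callers report it; the caller names, the threshold of 2, and the toy variants are hypothetical and are not taken from CoVaCS.

```python
from collections import Counter


def consensus_calls(callsets, min_support=2):
    """Keep variants reported by at least min_support of the given callers.

    callsets maps a caller name to an iterable of hashable variant keys.
    """
    counts = Counter()
    for calls in callsets.values():
        counts.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, n in counts.items() if n >= min_support}


# Hypothetical call sets from three callers, keyed by (chrom, pos, ref, alt).
callsets = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr1", 250, "G", "A")],
    "caller_b": [("chr1", 100, "A", "T"), ("chr2", 500, "C", "T")],
    "caller_c": [("chr1", 100, "A", "T"), ("chr1", 250, "G", "A")],
}
consensus = consensus_calls(callsets, min_support=2)
```

Here the variant seen by only one caller is dropped, and the two variants with support from two or more callers are retained.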
|
25
|
Discovery of Variants Underlying Host Susceptibility to Virus Infection Using Whole-Exome Sequencing. Methods Mol Biol 2017; 1656:209-227. [PMID: 28808973 PMCID: PMC7120756 DOI: 10.1007/978-1-4939-7237-1_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
The clinical course of any viral infection differs greatly between individuals. This variation results from various viral, host, and environmental factors. The identification of host genetic factors influencing inter-individual variation in susceptibility to several pathogenic viruses has tremendously increased our understanding of the mechanisms and pathways required for immunity. Next-generation sequencing of whole exomes represents a powerful tool in biomedical research. In this chapter, we briefly introduce whole-exome sequencing in the context of genetic approaches to identify host susceptibility genes to viral infections. We then describe general aspects of the workflow for whole-exome sequence analysis together with the tools and online resources that can be used to identify and annotate variant calls, and then prioritize them for their potential association with phenotypes of interest.
|
26
|
India Allele Finder: a web-based annotation tool for identifying common alleles in next-generation sequencing data of Indian origin. BMC Res Notes 2017; 10:233. [PMID: 28655339 PMCID: PMC5488357 DOI: 10.1186/s13104-017-2556-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/07/2017] [Accepted: 06/19/2017] [Indexed: 11/27/2022] Open
Abstract
Objective We built India Allele Finder, an online searchable database and command line tool, that gives researchers access to variant frequencies of Indian Telugu individuals, using publicly available fastq data from the 1000 Genomes Project. Access to appropriate population-based genomic variant annotation can accelerate the interpretation of genomic sequencing data. In particular, exome analysis of individuals of Indian descent will identify population variants not reflected in European exomes, complicating genomic analysis for such individuals. Results India Allele Finder offers improved ease-of-use to investigators seeking to identify and annotate sequencing data from Indian populations. We describe the use of India Allele Finder to identify common population variants in a disease quartet whole exome dataset, reducing the number of candidate single nucleotide variants from 84 to 7. India Allele Finder is freely available to investigators to annotate genomic sequencing data from Indian populations. Use of India Allele Finder allows efficient identification of population variants in genomic sequencing data, and is an example of a population-specific annotation tool that simplifies analysis and encourages international collaboration in genomics research.
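The candidate reduction described above rests on a simple idea: variants that are common in a matched population are unlikely to cause a rare disease and can be set aside. A minimal Python sketch of that filtering step follows; the 1% cutoff, the function name, and the frequency table are assumptions for illustration and do not reflect India Allele Finder's actual interface or data.

```python
def filter_common_variants(candidates, population_af, max_af=0.01):
    """Keep candidates whose population allele frequency is at most max_af.

    Variants missing from the frequency table are treated as unobserved
    (frequency 0.0) and are therefore kept.
    """
    return [v for v in candidates if population_af.get(v, 0.0) <= max_af]


# Hypothetical candidates and population frequencies, keyed by (chrom, pos, ref, alt).
candidates = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
]
population_af = {
    ("chr1", 1000, "A", "G"): 0.12,   # common in the population: filtered out
    ("chr2", 2000, "C", "T"): 0.002,  # rare: kept
}
remaining = filter_common_variants(candidates, population_af)
```

Treating absent variants as rare is a deliberate choice in this sketch; a stricter pipeline might flag them separately when the reference panel is small.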
|
27
|
Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls. BMC Genomics 2017; 18:458. [PMID: 28606096 PMCID: PMC5467262 DOI: 10.1186/s12864-017-3770-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 03/14/2017] [Accepted: 05/07/2017] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Cancer research to date has largely focused on somatically acquired genetic aberrations. In contrast, the degree to which germline, or inherited, variation contributes to tumorigenesis remains unclear, possibly due to a lack of accessible germline variant data. Here we called germline variants on 9618 cases from The Cancer Genome Atlas (TCGA) database representing 31 cancer types. RESULTS We identified batch effects affecting loss of function (LOF) variant calls that can be traced back to differences in the way the sequence data were generated both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF Single Nucleotide Variant (SNV) calls. In particular, whole genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. The samples affected by these technical artifacts include all acute myeloid leukemia and practically all ovarian cancer samples. CONCLUSIONS We demonstrate how technical artifacts induced by whole genome amplification of DNA can lead to false positive germline-tumor type associations and suggest TCGA whole genome amplified samples be used with caution. This study draws attention to the need to be sensitive to problems associated with a lack of uniformity in data generation in TCGA data.
|
28
|
QueryOR: a comprehensive web platform for genetic variant analysis and prioritization. BMC Bioinformatics 2017; 18:225. [PMID: 28454514 PMCID: PMC5410040 DOI: 10.1186/s12859-017-1654-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/12/2017] [Accepted: 04/26/2017] [Indexed: 11/21/2022] Open
Abstract
Background Whole genome and exome sequencing are contributing to the extraordinary progress in the study of human genetic variants. In this fast-developing field, appropriate and easily accessible tools are required to facilitate data analysis. Results Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Instead of being designed on specific datasets, it works on a general XML schema specifying formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features. Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy to use, and its intuitive interface makes it possible to handle different kinds of inheritance as well as features related to sharing variants in different patients. QueryOR is suitable for investigating single patients, families or cohorts. Conclusions QueryOR is a comprehensive and flexible web platform suited to easy, user-driven variant prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/.
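The contrast drawn above, between discarding variants one criterion at a time and ranking them by a global positive selection, can be sketched in a few lines of Python. This is only an illustration of the general idea, not QueryOR's scoring scheme; the field names, the three criteria, and the toy variants are hypothetical.

```python
def rank_by_criteria(variants, criteria):
    """Rank variants by how many criteria they satisfy (most first).

    Nothing is discarded: a variant failing one criterion can still rank
    highly if it satisfies the others. Ties are broken by variant id.
    """
    scored = [
        (sum(1 for test in criteria.values() if test(v)), v["id"])
        for v in variants
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical criteria and variants.
criteria = {
    "rare": lambda v: v["af"] < 0.01,
    "damaging": lambda v: v["impact"] == "HIGH",
    "conserved": lambda v: v["conserved"],
}
variants = [
    {"id": "v1", "af": 0.0005, "impact": "HIGH", "conserved": True},
    {"id": "v2", "af": 0.02, "impact": "HIGH", "conserved": True},
    {"id": "v3", "af": 0.0001, "impact": "LOW", "conserved": False},
]
ranked = rank_by_criteria(variants, criteria)

# For comparison, a sequential filter that applies "rare" first drops v2 entirely.
sequential = [v["id"] for v in variants if all(test(v) for test in criteria.values())]
```

In the ranking, `v2` stays visible in second place despite failing the frequency criterion, whereas the sequential filter leaves only `v1`.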
|
29
|
DIVAN: accurate identification of non-coding disease-specific risk variants using multi-omics profiles. Genome Biol 2016; 17:252. [PMID: 27923386 PMCID: PMC5139035 DOI: 10.1186/s13059-016-1112-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 10/10/2016] [Accepted: 11/18/2016] [Indexed: 12/22/2022] Open
Abstract
Understanding the link between non-coding sequence variants, identified in genome-wide association studies, and the pathophysiology of complex diseases remains challenging due to a lack of annotations in non-coding regions. To overcome this, we developed DIVAN, a novel feature selection and ensemble learning framework, which identifies disease-specific risk variants by leveraging a comprehensive collection of genome-wide epigenomic profiles across cell types and factors, along with other static genomic features. DIVAN accurately and robustly recognizes non-coding disease-specific risk variants under multiple testing scenarios; among all the features, histone marks, especially those marks associated with repressed chromatin, are often more informative than others.
|
30
|
Abstract
The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
|
32
|
Standardized decision support in next generation sequencing reports of somatic cancer variants. Mol Oncol 2014; 8:859-73. [PMID: 24768039 DOI: 10.1016/j.molonc.2014.03.021] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 02/09/2014] [Revised: 03/18/2014] [Accepted: 03/26/2014] [Indexed: 12/31/2022] Open
Abstract
Of the hundreds to thousands of somatic mutations that exist in each cancer genome, a large number are unique and non-recurrent variants. Prioritizing genetic variants identified via next generation sequencing technologies remains a major challenge. Many such variants occur in tumor genes that have well-established biological and clinical relevance and are putative targets of molecular therapy; however, most variants are still of unknown significance. With large amounts of data being generated as high throughput sequencing assays enter the clinical realm, there is a growing need to better communicate relevant findings in a timely manner while remaining cognizant of the potential consequences of misuse or overinterpretation of genomic information. Herein we describe a systematic framework for variant annotation and prioritization, and we propose a structured molecular pathology report using standardized terminology in order to best inform oncology clinical practice. We hope that our experience developing a comprehensive knowledge database of emerging predictive markers matched to targeted therapies will help other institutions implement similar programs.
|