1
Sarkar A, Yang Y, Vihinen M. Variation benchmark datasets: update, criteria, quality and applications. Database: The Journal of Biological Databases and Curation 2020; 2020:5710862. [PMID: 32016318 PMCID: PMC6997940 DOI: 10.1093/database/baz117] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/12/2019] [Revised: 06/03/2019] [Accepted: 07/01/2019] [Indexed: 02/07/2023]
Abstract
Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. 
We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to a lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench
Affiliation(s)
- Anasua Sarkar
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
- Yang Yang
  - School of Computer Science and Technology, Soochow University, No1. Shizi Street, Suzhou, 215006 Jiangsu, China; Provincial Key Laboratory for Computer Information Processing Technology, No1. Shizi Street, Soochow University, Suzhou, 215006 Jiangsu, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
2
Yao Y, Ramsey SA. CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs. Pacific Symposium on Biocomputing 2020; 25:535-546. [PMID: 31797625 PMCID: PMC6897322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
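The locus-size feature described in this abstract can be illustrated with a small sketch. The abstract does not specify the locus partitioning procedure, so the gap-based grouping below (SNPs on the same chromosome join a locus when they lie within `max_gap` base pairs of the previous SNP) is an assumption, as are the function name `locus_sizes` and the default `max_gap` value. It is not the CERENKOV3 implementation.

```python
from collections import defaultdict


def locus_sizes(snps, max_gap=10_000):
    """Map each SNP id to the number of SNPs in its locus.

    snps: iterable of (snp_id, chrom, pos) tuples.
    max_gap: hypothetical threshold; consecutive SNPs on one chromosome
             that are at most this far apart share a locus.
    """
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))

    sizes = {}
    for items in by_chrom.values():
        items.sort()  # order SNPs by position along the chromosome
        locus = [items[0]]
        for item in items[1:]:
            if item[0] - locus[-1][0] <= max_gap:
                locus.append(item)  # still inside the current locus
            else:
                # close the current locus and record its size for each member
                for _, sid in locus:
                    sizes[sid] = len(locus)
                locus = [item]
        for _, sid in locus:  # record the last locus on this chromosome
            sizes[sid] = len(locus)
    return sizes
```

Each SNP then carries its locus size as one extra numeric feature alongside its annotation features.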
Affiliation(s)
- Yao Yao
  - School of Electrical Engineering and Computer Science, Oregon State University
- Stephen A. Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University; Department of Biomedical Sciences, Oregon State University, Corvallis, OR 97330, USA
3
Carinci F, Romanos GE, Scapoli L. Molecular tools for preventing and improving diagnosis of peri-implant diseases. Periodontol 2000 2019; 81:41-47. [PMID: 31407432 DOI: 10.1111/prd.12281] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/13/2022]
Abstract
Peri-implantitis is an inflammatory disease of tissues surrounding osseointegrated dental implants. Inflammation affecting soft and hard peri-implant tissues can cause alveolar bone resorption and subsequent implant loss. Clinical surveillance and early diagnosis are of paramount importance to reduce clinical failures and improve implant survival. Current diagnosis of implants is based on clinical and radiological signs. Molecular tests are an emerging diagnostic methodology, which potentially can help to detect and prevent early peri-implantitis and monitor the efficacy of therapy as well. A plethora of potential biomarkers are potentially available to support the clinical diagnosis of peri-implantitis. However, conflicting diagnostic conclusions have been reached, probably related to weak statistical results due to limited sample size or disease heterogeneity. The present paper reviews candidate diagnostic biomarkers for peri-implantitis, including infective agents, genetic susceptibility factors, and key proteins related to inflammation and tissue remodeling.
Affiliation(s)
- Francesco Carinci
  - Department of Morphology, Surgery and Experimental Medicine, University of Ferrara, Ferrara, Italy
- Georgios E Romanos
  - Department of Periodontology, School of Dental Medicine, Stony Brook University, Stony Brook, NY, USA
- Luca Scapoli
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy
4
Farmohammadi A, Tavangar A, Ehteram M, Karimian M. Association of A-197G polymorphism in interleukin-17 gene with chronic periodontitis: Evidence from six case-control studies with a computational biology approach. J Investig Clin Dent 2019; 10:e12424. [PMID: 31231967 DOI: 10.1111/jicd.12424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/14/2019] [Revised: 03/01/2019] [Accepted: 04/26/2019] [Indexed: 12/19/2022]
Abstract
AIM The aim of the present study was to evaluate the association of interleukin-17 (IL-17) A-197G gene polymorphism with chronic periodontitis (CP) in a case-control study, a meta-analysis, and an in silico approach. METHODS In the case-control study, 122 cases with CP and 126 healthy controls were recruited; IL-17 A-197G genotyping was performed by polymerase chain reaction-restriction fragment length polymorphism. In the meta-analysis, comprehensive literature retrieval was performed on valid databases to identify relevant studies. Bioinformatics tools were employed to investigate the effects of A-197G transition on the promoter region of IL-17. RESULTS Our case-control study revealed a significant association between IL-17 A-197G transition and CP. The overall meta-analysis revealed significant associations between the IL-17 A-197G polymorphism and CP risk in homozygote co-dominant and recessive models. The stratified analysis also showed a statistically significant association between the mentioned transition and CP risk in the Caucasian population. The in silico analysis revealed that the A-197G polymorphism could make changes in protein-binding sites of the IL-17 promoter region. CONCLUSIONS Our study supports that IL-17 A-197G transition could be a genetic risk factor for CP. However, further studies with a larger sample size among different ethnicities are required to obtain a more accurate conclusion.
Affiliation(s)
- Amir Farmohammadi
  - Department of Oral and Maxillofacial Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Atefeh Tavangar
  - Department of Oral and Maxillofacial Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Ehteram
  - Anatomical Sciences Research Center, Kashan University of Medical Sciences, Kashan, Iran
- Mohammad Karimian
  - Anatomical Sciences Research Center, Kashan University of Medical Sciences, Kashan, Iran
5
Yao Y, Liu Z, Wei Q, Ramsey SA. CERENKOV2: improved detection of functional noncoding SNPs using data-space geometric features. BMC Bioinformatics 2019; 20:63. [PMID: 30727967 PMCID: PMC6364436 DOI: 10.1186/s12859-019-2637-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/05/2018] [Accepted: 01/18/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND We previously reported on CERENKOV, an approach for identifying regulatory single nucleotide polymorphisms (rSNPs) that is based on 246 annotation features. CERENKOV uses the xgboost classifier and is designed to be used to find causal noncoding SNPs in loci identified by genome-wide association studies (GWAS). We reported that CERENKOV has state-of-the-art performance (by two traditional measures and a novel GWAS-oriented measure, AVGRANK) in a comparison to nine other tools for identifying functional noncoding SNPs, using a comprehensive reference SNP set (OSU17, 15,331 SNPs). Given that SNPs are grouped within loci in the reference SNP set and given the importance of the data-space manifold geometry for machine-learning model selection, we hypothesized that within-locus inter-SNP distances would have class-based distributional biases that could be exploited to improve rSNP recognition accuracy. We thus defined an intralocus SNP "radius" as the average data-space distance from a SNP to the other intralocus neighbors, and explored radius likelihoods for five distance measures. RESULTS We expanded the set of reference SNPs to 39,083 (the OSU18 set) and extracted CERENKOV SNP feature data. We computed radius empirical likelihoods and likelihood densities for rSNPs and control SNPs, and found significant likelihood differences between rSNPs and control SNPs. We fit parametric models of likelihood distributions for five different distance measures to obtain ten log-likelihood features that we combined with the 248-dimensional CERENKOV feature matrix. On the OSU18 SNP set, we measured the classification accuracy of CERENKOV with and without the new distance-based features, and found that the addition of distance-based features significantly improves rSNP recognition performance as measured by AUPVR, AUROC, and AVGRANK. 
Along with feature data for the OSU18 set, the software code for extracting the base feature matrix, estimating ten distance-based likelihood ratio features, and scoring candidate causal SNPs, are released as open-source software CERENKOV2. CONCLUSIONS Accounting for the locus-specific geometry of SNPs in data-space significantly improved the accuracy with which noncoding rSNPs can be computationally identified.
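The intralocus "radius" defined in this abstract (the average data-space distance from a SNP to its other intralocus neighbors) can be sketched as follows. The abstract mentions five distance measures without naming them here, so Euclidean distance is used purely as an illustrative assumption. The function name `intralocus_radii` and the input layout are hypothetical, and the likelihood-fitting step is not shown.

```python
import math


def intralocus_radii(features, locus_of):
    """Average distance from each SNP to the other SNPs in its locus.

    features: dict mapping snp_id -> sequence of numeric feature values.
    locus_of: dict mapping snp_id -> locus identifier.
    Returns a dict snp_id -> radius, or None for a SNP alone in its locus.
    """
    # group SNP ids by locus
    loci = {}
    for snp, locus in locus_of.items():
        loci.setdefault(locus, []).append(snp)

    radii = {}
    for members in loci.values():
        for snp in members:
            others = [m for m in members if m != snp]
            if not others:
                radii[snp] = None  # radius is undefined without neighbors
                continue
            # Euclidean distance is an assumed stand-in for one distance measure
            total = sum(math.dist(features[snp], features[o]) for o in others)
            radii[snp] = total / len(others)
    return radii
```

Radii computed separately for rSNPs and control SNPs could then be compared, which is the kind of class-based distributional difference the abstract reports exploiting.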
Affiliation(s)
- Yao Yao
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
  - Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, OR 97330, USA
- Zheng Liu
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
  - Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, OR 97330, USA
- Qi Wei
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
  - Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, OR 97330, USA
- Stephen A. Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
  - Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, OR 97330, USA
6
Bryzgalov LO, Korbolina EE, Brusentsov II, Leberfarb EY, Bondar NP, Merkulova TI. Novel functional variants at the GWAS-implicated loci might confer risk to major depressive disorder, bipolar affective disorder and schizophrenia. BMC Neurosci 2018; 19:22. [PMID: 29745862 PMCID: PMC5998904 DOI: 10.1186/s12868-018-0414-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND A challenge of understanding the mechanisms underlying cognition including neurodevelopmental and neuropsychiatric disorders is mainly given by the potential severity of cognitive disorders for the quality of life and their prevalence. However, the field has been focused predominantly on protein coding variation until recently. Given the importance of tightly controlled gene expression for normal brain function, the goal of the study was to assess the functional variation, including non-coding variation, in the human genome that is likely to play an important role in cognitive functions. To this end, we organized and utilized available genome-wide datasets from genomic, transcriptomic and association studies into a comprehensive data corpus. We focused on genomic regions that are enriched in regulatory activity (overlapping transcription factor binding regions) and repurposed our data collection especially for identification of the regulatory SNPs (rSNPs) that showed associations both with allele-specific binding and allele-specific expression. We matched these rSNPs to the nearby and distant targeted genes and then selected the variants that could be implicated in the etiology of cognitive disorders according to Genome-Wide Association Studies (GWAS). Next, we used the DeSeq 2.0 package to test the differences in the expression of certain targeted genes between the controls and the patients that were diagnosed with bipolar affective disorder and schizophrenia. Finally, we assessed the potential biological role for identified drivers of cognition using DAVID and GeneMANIA. RESULTS We selected fourteen regulatory SNPs located within loci implicated by GWAS in cognitive disorders, six of which were previously unreported. 
Grouping of the targeted genes according to biological functions revealed the involvement of processes such as 'posttranscriptional regulation of gene expression', 'neuron differentiation', 'neuron projection development', 'regulation of cell cycle process' and 'protein catabolic processes'. We identified four rSNP-targeted genes that showed differential expression between patient and control groups depending on brain region: NRAS in the schizophrenia cohort, and CDC25B, DDX21 and NUCKS1 in the bipolar disorder cohort. CONCLUSIONS Overall, our findings are likely to provide the keys for unraveling the mechanisms that underlie cognitive functions including major depressive disorder, bipolar disorder and schizophrenia etiopathogenesis.
Affiliation(s)
- Leonid O. Bryzgalov
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, Novosibirsk, Russian Federation 630090
- Elena E. Korbolina
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, Novosibirsk, Russian Federation 630090
  - The Novosibirsk State University, 1 Pirogova st., Novosibirsk, Russian Federation 630090
- Ilja I. Brusentsov
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, Novosibirsk, Russian Federation 630090
- Elena Y. Leberfarb
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, Novosibirsk, Russian Federation 630090
- Natalia P. Bondar
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, Novosibirsk, Russian Federation 630090
  - The Novosibirsk State University, 1 Pirogova st., Novosibirsk, Russian Federation 630090
- Tatiana I. Merkulova
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, Novosibirsk, Russian Federation 630090
  - The Novosibirsk State University, 1 Pirogova st., Novosibirsk, Russian Federation 630090
7
Abstract
BACKGROUND Germline mutations in the coding sequence of the tumour suppressor APC gene give rise to familial adenomatous polyposis (which leads to colorectal cancer) and are associated with many other oncopathologies. The loss of APC function because of deletion of putative promoter 1A or 1B also results in the development of colorectal cancer. Since the regions of promoters 1A and 1B contain many single nucleotide polymorphisms (SNPs), the aim of this study was to perform functional analysis of some of these SNPs by means of an electrophoretic mobility shift assay (EMSA) and a luciferase reporter assay. RESULTS First, it was shown that both putative promoters of APC (1A and 1B) drive transcription in an in vitro reporter experiment. From eleven randomly selected SNPs of promoter 1A and four SNPs of promoter 1B, nine and two respectively showed differential patterns of binding of nuclear proteins to oligonucleotide probes corresponding to alternative alleles. The luciferase reporter assay showed that among the six SNPs tested, the rs75612255 C allele and rs113017087 C allele in promoter 1A as well as the rs138386816 T allele and rs115658307 T allele in promoter 1B significantly increased luciferase activity in the human erythromyeloblastoid leukaemia cell line K562. In human colorectal cancer HCT-116 cells, none of the substitutions under study had any effect, with the exception of minor allele G of rs79896135 in promoter 1B. This allele significantly decreased the luciferase reporter’s activity. CONCLUSION Our results indicate that many SNPs in APC promoters 1A and 1B are functionally relevant and that allele G of rs79896135 may be associated with the predisposition to colorectal cancer. Electronic supplementary material: the online version of this article (doi:10.1186/s12863-016-0460-8) contains supplementary material, which is available to authorized users.
8
Peterson TA, Mort M, Cooper DN, Radivojac P, Kann MG, Mooney SD. Regulatory Single-Nucleotide Variant Predictor Increases Predictive Performance of Functional Regulatory Variants. Hum Mutat 2016; 37:1137-1143. [PMID: 27406314 DOI: 10.1002/humu.23049] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/04/2016] [Accepted: 06/28/2016] [Indexed: 12/20/2022]
Abstract
In silico methods for detecting functionally relevant genetic variants are important for identifying genetic markers of human inherited disease. Much research has focused on protein-coding variants since coding regions have well-defined physicochemical and functional properties. However, many bioinformatics tools are not applicable to variants outside coding regions. Here, we increase the classification performance of our regulatory single-nucleotide variant predictor (RSVP) for variants that cause regulatory abnormalities from an AUC of 0.90 to 0.97 by incorporating genomic regions identified by the ENCODE project into RSVP. RSVP is comparable to a recently published tool, Genome-Wide Annotation of Variants (GWAVA); both RSVP and GWAVA perform better on regulatory variants than a traditional variant predictor, combined annotation-dependent depletion (CADD). However, our method (AUC: 0.96) outperforms GWAVA (AUC: 0.71) on variants located at similar distances to the transcription start site as the positive set. Much of this disparity is due to RSVP's incorporation of features pertaining to the nearest gene (expression, GO terms, etc.), which are not included in GWAVA. Our findings hold out the promise of a framework for the assessment of all functional regulatory variants, providing a means to predict which rare or de novo variants are of pathogenic significance.
Affiliation(s)
- Thomas A Peterson
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, United Kingdom
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, United Kingdom
- Predrag Radivojac
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana
- Maricel G Kann
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
9
Tang H, Thomas PD. Tools for Predicting the Functional Impact of Nonsynonymous Genetic Variation. Genetics 2016; 203:635-47. [PMID: 27270698 PMCID: PMC4896183 DOI: 10.1534/genetics.116.190033] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 09/04/2015] [Accepted: 04/01/2016] [Indexed: 01/09/2023] Open
Abstract
As personal genome sequencing becomes a reality, understanding the effects of genetic variants on phenotype, particularly the impact of germline variants on disease risk and the impact of somatic variants on cancer development and treatment, continues to increase in importance. Because of their clear potential for affecting phenotype, nonsynonymous genetic variants (variants that cause a change in the amino acid sequence of a protein encoded by a gene) have long been the target of efforts to predict the effects of genetic variation. Whole-genome sequencing is identifying large numbers of nonsynonymous variants in each genome, intensifying the need for computational methods that accurately predict which of these are likely to impact disease phenotypes. This review focuses on nonsynonymous variant prediction with two aims in mind: (1) to review the prioritization methods that have been developed to date and the principles on which they are based and (2) to discuss the challenges to further improving these methods.
Affiliation(s)
- Haiming Tang
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033
10
Levitsky VG, Oshchepkov DY, Klimova NV, Ignatieva EV, Vasiliev GV, Merkulov VM, Merkulova TI. Hidden heterogeneity of transcription factor binding sites: A case study of SF-1. Comput Biol Chem 2016; 64:19-32. [PMID: 27235721 DOI: 10.1016/j.compbiolchem.2016.04.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/19/2015] [Revised: 04/19/2016] [Accepted: 04/19/2016] [Indexed: 01/15/2023]
Abstract
Steroidogenic factor 1 (SF-1) belongs to a small group of the transcription factors that bind DNA only as a monomer. Three different approaches (Sitecon, SiteGA, and oPWM), constructed using the same training sample of experimentally confirmed SF-1 binding sites, have been used to recognize these sites. The appropriate prediction thresholds for recognition models have been selected. Namely, the thresholds concordant by false positive or negative rates for various methods were used to optimize the discrimination of steroidogenic gene promoters from the datasets of non-specific promoters. After experimental verification, the models were used to analyze the ChIP-seq data for SF-1. It has been shown that the sets of sites recognized by different models overlap only partially and that an integration of these models allows for identification of SF-1 sites in up to 80% of the ChIP-seq loci. The structures of the sites detected using the three recognition models in the ChIP-seq peaks falling within the [-5000, +5000] region relative to the transcription start sites (TSS) extracted from the FANTOM5 project have been analyzed. MATLIGN classified the frequency matrices for the sites predicted by oPWM, Sitecon, and SiteGA into two groups. The first group is described by oPWM/Sitecon and the second by SiteGA. Gene ontology (GO) analysis has been used to clarify the differences between the sets of genes carrying different variants of SF-1 binding sites. Although this analysis in general revealed a considerable overlap in GO terms for the genes carrying the binding sites predicted by oPWM, Sitecon, or SiteGA, only the last method showed a notable trend toward terms related to negative regulation and apoptosis. The results suggest that the SF-1 binding sites differ in both their structure and the functional annotation of their sets of target genes, in line with the predictions by oPWM+Sitecon and SiteGA. 
Further application of Homer software for de novo identification of enriched motifs in the SF-1 ChIP-seq dataset gave results similar to oPWM+Sitecon.
Affiliation(s)
- V G Levitsky
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- D Yu Oshchepkov
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- N V Klimova
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- E V Ignatieva
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- G V Vasiliev
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- V M Merkulov
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- T I Merkulova
  - Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
11
Mooney SD. Progress towards the integration of pharmacogenomics in practice. Hum Genet 2014; 134:459-65. [PMID: 25238897 DOI: 10.1007/s00439-014-1484-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/15/2014] [Accepted: 08/20/2014] [Indexed: 12/12/2022]
Abstract
Understanding the role genes and genetic variants play in clinical treatment response continues to be an active area of research with the goal of common clinical use. This goal has developed into today's industry of pharmacogenomics, where new drug-gene relationships are discovered and further characterized, published and then curated into national and international resources for use by researchers and clinicians. These efforts have given us insight into what a pharmacogenomic variant is, and how it differs from human disease variants and common polymorphisms. While publications continue to reveal pharmacogenomic relationships between genes and specific classes of drugs, many challenges remain toward the goal of widespread use clinically. First, the clinical guidelines for pharmacogenomic testing are still in their infancy. Second, sequencing technologies are changing rapidly making it somewhat unclear what genetic data will be available to the clinician at the time of care. Finally, what and when to return data to a patient is an area under constant debate. New innovations such as PheWAS approaches and whole genome sequencing studies are enabling a tsunami of new findings. In this review, pharmacogenomic variants, pharmacogenomic resources, interpretation clinical guidelines and challenges, such as WGS approaches, and the impact of pharmacogenomics on drug development and regulatory approval are reviewed.
Affiliation(s)
- Sean D Mooney
  - Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA, 94945, USA
12
Polimanti R, Di Girolamo M, Manfellotto D, Fuciarelli M. In silico analysis of TTR gene (coding and non-coding regions, and interactive network) and its implications in transthyretin-related amyloidosis. Amyloid 2014; 21:154-62. [PMID: 24779883 DOI: 10.3109/13506129.2014.900487] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
Abstract
INTRODUCTION Transthyretin (TTR)-related amyloidosis is a life-threatening disease. Currently, several questions about the pathogenic mechanisms of TTR-related amyloidosis remain unanswered. METHODS We have investigated various TTR-related issues using different in silico approaches. RESULTS Using an amino acid similarity-based analysis, we have indicated the most relevant TTR secondary structures in determining mutation impact. Our amyloidogenic propensity analysis of TTR missense substitutions has highlighted a similar pattern for wild-type and mutated TTR amino acid sequences. However, some mutations present differences with respect to the general distribution. We have identified non-coding variants in cis-regulatory elements of the TTR gene, and our analysis on V122I-related haplotypes has indicated differences in non-coding regulatory variants, suggesting differences among V122I carriers. The analysis of methylation status indicated CpG sites that may affect TTR expression. Finally, our interactive network analysis revealed functional partners of TTR that may play a modifier role in the pathogenesis of TTR-related amyloidosis. DISCUSSION AND CONCLUSION Our data provided new insights into the pathogenesis of TTR-related amyloidosis that, if they were to be confirmed through experimental investigations, could significantly improve our understanding of the disease.
Affiliation(s)
- Renato Polimanti
- Department of Biology, University of Rome "Tor Vergata" , Rome , Italy and
|
13
|
Bryzgalov LO, Antontseva EV, Matveeva MY, Shilov AG, Kashina EV, Mordvinov VA, Merkulova TI. Detection of regulatory SNPs in human genome using ChIP-seq ENCODE data. PLoS One 2013; 8:e78833. [PMID: 24205329 PMCID: PMC3812152 DOI: 10.1371/journal.pone.0078833] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 12/24/2012] [Accepted: 09/17/2013] [Indexed: 11/18/2022] Open
Abstract
A vast number of the SNPs derived from genome-wide association studies are non-coding, which heightens the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging since the regulatory part of the human genome is annotated much more poorly than the coding regions. Here we describe an approach aggregating the whole set of ENCODE ChIP-seq data in order to search for rSNPs, and provide experimental evidence of its efficiency. Its algorithm is based on the assumption that the enrichment of a genomic region with transcription factor binding loci (ChIP-seq peaks) indicates its regulatory function, and thereby SNPs located in this region are more likely to influence transcription regulation. To ensure that the approach preferentially selects functionally meaningful SNPs, we performed enrichment analysis of several human SNP datasets associated with phenotypic manifestations. It was shown that all samples are significantly enriched with SNPs falling into the regions of multiple ChIP-seq peaks as compared with randomly selected SNPs. For experimental verification, 40 SNPs falling into overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of SNPs on the binding of the DNA fragments containing them to the nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) has been tested by EMSA. A radical change in the binding pattern has been observed for 29 SNPs, and 6 more SNPs demonstrated less pronounced changes. Taken together, the results demonstrate an effective way to search for potential rSNPs with the aid of ChIP-seq data provided by the ENCODE project.
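The selection rule described in this abstract (keep SNPs that fall inside overlapping binding regions of at least 7 transcription factors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-chromosome simplification, the half-open interval convention, and the choice to count distinct transcription factors rather than raw peaks are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical peak record: (start, end, transcription factor name),
# using half-open coordinates [start, end) on a single chromosome.
Peak = Tuple[int, int, str]


def count_tf_overlaps(snp_pos: int, peaks: List[Peak]) -> int:
    """Count distinct transcription factors with a peak covering snp_pos."""
    return len({tf for start, end, tf in peaks if start <= snp_pos < end})


def candidate_rsnps(
    snps: Dict[str, int], peaks: List[Peak], min_tfs: int = 7
) -> List[str]:
    """Return the names of SNPs covered by peaks of at least min_tfs factors."""
    return sorted(
        name
        for name, pos in snps.items()
        if count_tf_overlaps(pos, peaks) >= min_tfs
    )


if __name__ == "__main__":
    # Toy data only: seven factors share one region, one factor has a second peak.
    toy_peaks = [(100, 200, f"TF{i}") for i in range(7)] + [(500, 600, "TF0")]
    toy_snps = {"rsA": 150, "rsB": 550, "rsC": 200}
    print(candidate_rsnps(toy_snps, toy_peaks))
```

The linear scan over peaks is adequate for a toy example; a genome-scale analysis would more plausibly use an interval index or a dedicated interval-overlap tool.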
Affiliation(s)
- Elena V. Antontseva
- Institute of Cytology and Genetics SD RAS, Novosibirsk, Russian Federation
- Elena V. Kashina
- Institute of Cytology and Genetics SD RAS, Novosibirsk, Russian Federation
- Tatyana I. Merkulova
- Institute of Cytology and Genetics SD RAS, Novosibirsk, Russian Federation
- Novosibirsk State University, Novosibirsk, Russian Federation
|
14
|
Effect of genetic regions on the correlation between single point mutation variability and morbidity. Comput Biol Med 2013; 43:594-9. [DOI: 10.1016/j.compbiomed.2013.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2011] [Revised: 07/27/2012] [Accepted: 01/19/2013] [Indexed: 11/19/2022]
|
15
|
Ackerman C, Locke A, Feingold E, Reshey B, Espana K, Thusberg J, Mooney S, Bean L, Dooley K, Cua C, Reeves R, Sherman S, Maslen C. An excess of deleterious variants in VEGF-A pathway genes in Down-syndrome-associated atrioventricular septal defects. Am J Hum Genet 2012; 91:646-59. [PMID: 23040494 PMCID: PMC3484504 DOI: 10.1016/j.ajhg.2012.08.017] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.0] [Received: 04/09/2012] [Revised: 06/12/2012] [Accepted: 08/17/2012] [Indexed: 12/20/2022] Open
Abstract
About half of people with trisomy 21 have a congenital heart defect (CHD), whereas the remainder have a structurally normal heart, demonstrating that trisomy 21 is a significant risk factor but is not causal for abnormal heart development. Atrioventricular septal defects (AVSD) are the most commonly occurring heart defects in Down syndrome (DS), and ∼65% of all AVSD is associated with DS. We used a candidate-gene approach among individuals with DS and complete AVSD (cases = 141) and DS with no CHD (controls = 141) to determine whether rare genetic variants in genes involved in atrioventricular valvuloseptal morphogenesis contribute to AVSD in this sensitized population. We found a significant excess (p < 0.0001) of variants predicted to be deleterious in cases compared to controls. At the most stringent level of filtering, we found potentially damaging variants in nearly 20% of cases but fewer than 3% of controls. The variants with the highest probability of being damaging in cases only were found in six genes: COL6A1, COL6A2, CRELD1, FBLN2, FRZB, and GATA5. Several of the case-specific variants were recurrent in unrelated individuals, occurring in 10% of cases studied. No variants with an equal probability of being damaging were found in controls, demonstrating a highly specific association with AVSD. Of note, all of these genes are in the VEGF-A pathway, even though the candidate genes analyzed in this study represented numerous biochemical and developmental pathways, suggesting that rare variants in the VEGF-A pathway might contribute to the genetic underpinnings of AVSD in humans.
Affiliation(s)
- Christine Ackerman
- Division of Cardiovascular Medicine and the Heart Research Center, Oregon Health & Science University, Portland, OR 97239, USA
- Adam E. Locke
- Department of Human Genetics, Emory University, Atlanta, GA 30033, USA
- Eleanor Feingold
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Benjamin Reshey
- Division of Cardiovascular Medicine and the Heart Research Center, Oregon Health & Science University, Portland, OR 97239, USA
- Karina Espana
- Division of Cardiovascular Medicine and the Heart Research Center, Oregon Health & Science University, Portland, OR 97239, USA
- Sean Mooney
- Buck Institute for Research on Aging, Novato, CA 94945, USA
- Lora J.H. Bean
- Department of Human Genetics, Emory University, Atlanta, GA 30033, USA
- Kenneth J. Dooley
- Sibley Heart Center Cardiology and Division of Pediatric Cardiology, Children’s Healthcare of Atlanta, Department of Pediatrics, Emory University, Atlanta, GA 30033, USA
- Clifford L. Cua
- Heart Center, Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Roger H. Reeves
- Department of Physiology and the Institute for Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Cheryl L. Maslen
- Division of Cardiovascular Medicine and the Heart Research Center, Oregon Health & Science University, Portland, OR 97239, USA
|
16
|
Lehmann KV, Chen T. Exploring functional variant discovery in non-coding regions with SInBaD. Nucleic Acids Res 2012; 41:e7. [PMID: 22941663 PMCID: PMC3592431 DOI: 10.1093/nar/gks800] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/09/2023] Open
Abstract
The 1000 Genomes Project and many similar ongoing large-scale sequencing efforts require new methods to predict functional variants in both coding and non-coding regions in order to understand phenotype and genotype relationships. We report the design of a new model, SInBaD (Sequence-Information-Based-Decision-model), which relies on nucleotide conservation information to evaluate any annotated human variant in all known exons, introns, splice junctions and promoter regions. SInBaD builds separate mathematical models for promoters, exons and introns, using the human disease mutations annotated in the Human Gene Mutation Database as the training dataset for functional variants. Ten-fold cross-validation shows high prediction accuracy. Validations on test datasets demonstrate that variants predicted as functional have a significantly higher occurrence in cancer patients. We also applied our model to variants found in four different individual human genomes to identify a set of functional variants, which might be of interest for further studies. Scores for any possible variants for all annotated genes are available under http://tingchenlab.cmb.usc.edu/sinbad/. SInBaD supports the current standard format of genotyping, the variant call format (VCF 4.0), making it easy to integrate into any existing next-generation sequencing pipeline. The accuracy of SNP detection poses the only limitation to the use of SInBaD.
Affiliation(s)
- Kjong-Van Lehmann
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
17
|
Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012; 13:495-512. [PMID: 22247263 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 01/02/2023] Open
Abstract
An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of these data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field: the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.
Affiliation(s)
- Emidio Capriotti
- Department of Mathematics and Computer Science, University of Balearic Islands, ctra. de Valldemossa Km 7.5, Palma de Mallorca, 07122 Spain.
|