1
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
| | - Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
| | - Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
| |
|
2
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
| | - Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
| | - Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
| |
|
3
|
Baumgarten N, Rumpf L, Kessler T, Schulz MH. A statistical approach for identifying single nucleotide variants that affect transcription factor binding. iScience 2024; 27:109765. [PMID: 38736546 PMCID: PMC11088338 DOI: 10.1016/j.isci.2024.109765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 01/30/2024] [Accepted: 04/15/2024] [Indexed: 05/14/2024]
Abstract
Non-coding variants located within regulatory elements may alter gene expression by modifying transcription factor (TF) binding sites, thereby leading to functional consequences. Different TF models are being used to assess the effect of DNA sequence variants, such as single nucleotide variants (SNVs). Existing methods are often slow and do not assess the statistical significance of their results. We investigated the distribution of absolute maximal differential TF binding scores for general computational models that affect TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo datasets showed that our approach improves upon an existing method in terms of performance and speed. Applications to eQTLs and to a genome-wide association study illustrate the usefulness of our statistics by highlighting cell type-specific regulators and target genes. An implementation of our approach is freely available on GitHub and as a Bioconda package.
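As a rough illustration of how a Laplace model can turn a differential binding score into a significance value, the sketch below fits a plain zero-centered Laplace distribution and evaluates its two-sided tail. This is a simplification for illustration only: the abstract describes a modified Laplace distribution whose exact form is not given here, and the function names and example numbers are hypothetical.

```python
import math


def laplace_scale_mle(diffs):
    """Maximum-likelihood scale b of a Laplace(0, b) distribution.

    With the location fixed at 0, the estimate is the mean absolute value.
    """
    return sum(abs(d) for d in diffs) / len(diffs)


def abs_tail_pvalue(d, scale):
    """P(|D| >= |d|) for D ~ Laplace(0, scale), i.e. exp(-|d| / scale)."""
    return math.exp(-abs(d) / scale)


# Hypothetical differential binding scores observed for background SNVs.
background = [-2.0, 1.0, 3.0, -2.0]
b = laplace_scale_mle(background)

# Significance of a new SNV whose differential score is 4.0 (illustrative).
p = abs_tail_pvalue(4.0, b)
```

A larger absolute score gives a smaller tail probability, which is the property a significance test of this kind relies on.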
Affiliation(s)
- Nina Baumgarten
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computational Genomic Medicine, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computer Science, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590 Frankfurt am Main, Germany
| | - Laura Rumpf
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computational Genomic Medicine, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computer Science, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590 Frankfurt am Main, Germany
| | - Thorsten Kessler
- German Heart Centre Munich, Department of Cardiology, School of Medicine and Health, Technical University of Munich, 80636 Munich, Germany
- German Centre for Cardiovascular Research, Partner Site Munich Heart Alliance, 80636 Munich, Germany
| | - Marcel H. Schulz
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computational Genomic Medicine, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computer Science, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590 Frankfurt am Main, Germany
| |
|
4
|
Yang M, Ali O, Bjørås M, Wang J. Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience 2023; 26:107266. [PMID: 37520692 PMCID: PMC10371843 DOI: 10.1016/j.isci.2023.107266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 04/05/2023] [Accepted: 06/28/2023] [Indexed: 08/01/2023]
Abstract
Millions of single nucleotide variants (SNVs) exist in the human genome; however, it remains challenging to identify functional SNVs associated with diseases. We propose bpb3 (BayesPI-BAR version 3), an analysis tool for non-coding SNVs that aims to identify functional mutation blocks (FMBs) by integrating genome sequencing and transcriptome data. The identified FMBs display high-frequency SNVs and significant changes in transcription factor (TF) binding affinity, and lie near the regulatory regions of differentially expressed genes. A two-level Bayesian approach with a biophysical model for protein-DNA interactions is implemented to compute TF-DNA binding affinity changes based on clustered position weight matrices (PWMs) from over 1700 TF motifs. Epigenetic data, such as the DNA methylome, can also be integrated to scan for FMBs. Tested on datasets from follicular lymphoma and melanoma, bpb3 automatically and robustly identifies FMBs, demonstrating that it can provide insight into patho-mechanisms and therapeutic targets from transcriptomic and genomic data.
Affiliation(s)
- Mingyi Yang
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Medical Biochemistry, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Omer Ali
- Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
- Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Magnar Bjørås
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
| | - Junbai Wang
- Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway
| |
|
5
|
Zhu C, Baumgarten N, Wu M, Wang Y, Das AP, Kaur J, Ardakani FB, Duong TT, Pham MD, Duda M, Dimmeler S, Yuan T, Schulz MH, Krishnan J. CVD-associated SNPs with regulatory potential reveal novel non-coding disease genes. Hum Genomics 2023; 17:69. [PMID: 37491351 PMCID: PMC10369730 DOI: 10.1186/s40246-023-00513-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 07/12/2023] [Indexed: 07/27/2023]
Abstract
BACKGROUND Cardiovascular diseases (CVDs) are the leading cause of death worldwide. Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) appearing in non-coding genomic regions in CVDs. The SNPs may alter gene expression by modifying transcription factor (TF) binding sites and lead to functional consequences in cardiovascular traits or diseases. To understand the underlying molecular mechanisms, it is crucial to identify which variations are involved and how they affect TF binding. METHODS The SNEEP (SNP exploration and analysis using epigenomics data) pipeline was used to identify regulatory SNPs, which alter the binding behavior of TFs, and to link GWAS SNPs to their potential target genes for six CVDs. Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), monoculture cardiac organoids (MCOs) and self-organized cardiac organoids (SCOs) were used in the study. Gene expression, cardiomyocyte size and cardiac contractility were assessed. RESULTS By using our integrative computational pipeline, we identified 1905 regulatory SNPs in CVD GWAS data. These were associated with hundreds of genes, half of them non-coding RNAs (ncRNAs), suggesting novel CVD genes. We experimentally tested 40 CVD-associated non-coding RNAs, among them RP11-98F14.11, RPL23AP92, IGBP1P1, and CTD-2383I20.1, which were upregulated in hiPSC-CMs, MCOs and SCOs under hypoxic conditions. Further experiments showed that IGBP1P1 depletion rescued expression of hypertrophic marker genes, reduced hypoxia-induced cardiomyocyte size and improved hypoxia-reduced cardiac contractility in hiPSC-CMs and MCOs. CONCLUSIONS IGBP1P1 is a novel ncRNA with key regulatory functions in modulating cardiomyocyte size and cardiac function in our disease models. Our data suggest ncRNA IGBP1P1 as a potential therapeutic target to improve cardiac function in CVDs.
Affiliation(s)
- Chaonan Zhu
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Nina Baumgarten
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Meiqian Wu
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
| | - Yue Wang
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
| | - Arka Provo Das
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Jaskiran Kaur
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
| | - Fatemeh Behjati Ardakani
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Thanh Thuy Duong
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
| | - Minh Duc Pham
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
| | - Maria Duda
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
| | - Stefanie Dimmeler
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Ting Yuan
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany.
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany.
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany.
| | - Marcel H Schulz
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany.
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany.
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany.
| | - Jaya Krishnan
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany.
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany.
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany.
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany.
| |
|
6
|
Jia Y, Qi X, Ma M, Cheng S, Cheng B, Liang C, Guo X, Zhang F. Integrating genome-wide association study with regulatory SNP annotations identified novel candidate genes for osteoporosis. Bone Joint Res 2023; 12:147-154. [PMID: 37051837 PMCID: PMC10003063 DOI: 10.1302/2046-3758.122.bjr-2022-0206.r1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2023]
Abstract
Osteoporosis (OP) is a metabolic bone disease, characterized by a decrease in bone mineral density (BMD). However, research on regulatory variants for BMD has been limited. In this study, we aimed to explore novel regulatory genetic variants associated with BMD. We conducted an integrative analysis of BMD genome-wide association study (GWAS) and regulatory single nucleotide polymorphism (rSNP) annotation information. Firstly, the discovery GWAS dataset and replication GWAS dataset were integrated with an rSNP annotation database to obtain BMD-associated SNP regulatory elements and SNP regulatory element-target gene (E-G) pairs, respectively. Then, the common genes were further subjected to HumanNet v2 to explore the biological effects. Through discovery and replication integrative analysis for BMD GWAS and the rSNP annotation database, we identified 36 common BMD-associated genes irrespective of regulatory elements, such as FAM3C ($p_{\text{discovery GWAS}} = 1.21 \times 10^{-25}$, $p_{\text{replication GWAS}} = 1.80 \times 10^{-12}$), CCDC170 ($p_{\text{discovery GWAS}} = 1.23 \times 10^{-11}$, $p_{\text{replication GWAS}} = 3.22 \times 10^{-9}$), and SOX6 ($p_{\text{discovery GWAS}} = 4.41 \times 10^{-15}$, $p_{\text{replication GWAS}} = 6.57 \times 10^{-14}$). Then, for the 36 common target genes, multiple gene ontology (GO) terms were detected for BMD such as positive regulation of cartilage development ($p = 9.27 \times 10^{-3}$) and positive regulation of chondrocyte differentiation ($p = 9.27 \times 10^{-3}$). We explored the potential roles of rSNP in the genetic mechanisms of BMD and identified multiple candidate genes. Our study results support the implication of regulatory genetic variants in the development of OP.
Affiliation(s)
- Yumeng Jia
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Xin Qi
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Mei Ma
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Shiqiang Cheng
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Bolun Cheng
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Chujun Liang
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Xiong Guo
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Feng Zhang
- School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| |
|
7
|
Feronato SG, Silva MLM, Izbicki R, Farias TDJ, Shigunov P, Dallagiovanna B, Passetti F, dos Santos HG. Selecting Genetic Variants and Interactions Associated with Amyotrophic Lateral Sclerosis: A Group LASSO Approach. J Pers Med 2022; 12:jpm12081330. [PMID: 36013279 PMCID: PMC9410070 DOI: 10.3390/jpm12081330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 08/10/2022] [Accepted: 08/12/2022] [Indexed: 11/16/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a multi-system neurodegenerative disease that affects both upper and lower motor neurons, resulting from a combination of genetic, environmental, and lifestyle factors. Usually, the association between single-nucleotide polymorphisms (SNPs) and this disease is tested individually, which leads to the testing of multiple hypotheses. In addition, this classical approach does not support the detection of interaction-dependent SNPs. We applied a two-step procedure to select SNPs and pairwise interactions associated with ALS. SNP data from 276 ALS patients and 268 controls were analyzed by a two-step group LASSO in 2000 iterations. In the first step, we fitted a group LASSO model to a bootstrap sample and a random subset of predictors (25%) from the original data set aiming to screen for important SNPs and, in the second step, we fitted a hierarchical group LASSO model to evaluate pairwise interactions. An in silico analysis was performed on a set of variables, which were prioritized according to their bootstrap selection frequency. We identified seven SNPs (rs16984239, rs10459680, rs1436918, rs1037666, rs4552942, rs10773543, and rs2241493) and two pairwise interactions (rs16984239:rs2118657 and rs16984239:rs3172469) potentially involved in nervous system conservation and function. These results may contribute to the understanding of ALS pathogenesis, its diagnosis, and therapeutic strategy improvement.
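The resampling loop behind a bootstrap selection frequency can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `toy_select` is a hypothetical stand-in for the group LASSO fit (it simply thresholds the case/control difference in mean genotype), and dividing by the number of times a SNP was offered is one plausible definition of the frequency, assumed here.

```python
import random
from collections import Counter


def toy_select(sample, subset, threshold=0.5):
    """Hypothetical selector standing in for a group LASSO fit.

    Keeps predictors whose mean genotype differs between cases (y == 1)
    and controls (y == 0) by more than `threshold`.
    """
    cases = [r for r in sample if r["y"] == 1]
    controls = [r for r in sample if r["y"] == 0]
    if not cases or not controls:
        return []
    picked = []
    for p in subset:
        case_mean = sum(r[p] for r in cases) / len(cases)
        control_mean = sum(r[p] for r in controls) / len(controls)
        if abs(case_mean - control_mean) > threshold:
            picked.append(p)
    return picked


def selection_frequencies(rows, predictors, select,
                          n_iter=2000, subset_frac=0.25, seed=0):
    """Fraction of iterations in which each offered predictor was selected."""
    rng = random.Random(seed)
    k = max(1, round(subset_frac * len(predictors)))
    chosen, offered = Counter(), Counter()
    for _ in range(n_iter):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample of subjects
        subset = rng.sample(predictors, k)         # random subset of predictors
        offered.update(subset)
        chosen.update(select(sample, subset))
    return {p: chosen[p] / offered[p] for p in predictors if offered[p]}
```

Predictors can then be ranked by their frequency, with the highest-ranked ones carried forward to the interaction step.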
Affiliation(s)
| | | | - Rafael Izbicki
- Department of Statistics, Universidade Federal de São Carlos, São Carlos 13565-905, Brazil
| | - Ticiana D. J. Farias
- Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
- Division of Biomedical Informatics, Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Patrícia Shigunov
- Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
| | | | - Fabio Passetti
- Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
| | | |
|
8
|
Boytsov A, Abramov S, Makeev VJ, Kulakovskiy IV. Positional weight matrices have sufficient prediction power for analysis of noncoding variants. F1000Res 2022; 11:33. [PMID: 35811788 PMCID: PMC9237556 DOI: 10.12688/f1000research.75471.3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 06/30/2022] [Indexed: 11/23/2022]
Abstract
The position weight matrix, also called the position-specific scoring matrix, is the commonly accepted model to quantify the specificity of transcription factor binding to DNA. Position weight matrices are used in thousands of projects and software tools in regulatory genomics, including computational prediction of the regulatory impact of single-nucleotide variants. Yet, recently Yan et al. reported that "the position weight matrices of most transcription factors lack sufficient predictive power" if applied to the analysis of regulatory variants studied with a newly developed experimental method, SNP-SELEX. Here, we re-analyze the rich experimental dataset obtained by Yan et al. and show that appropriately selected position weight matrices in fact can adequately quantify transcription factor binding to alternative alleles.
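To make concrete what scoring alternative alleles with a position weight matrix involves, here is a minimal sketch. The count matrix, pseudocount, uniform background, and sequences are all hypothetical, only the forward strand is scanned, and this is not the authors' evaluation procedure.

```python
import math

# Hypothetical count matrix for a 4-bp motif with consensus ACGT.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 8, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 8, "T": 1},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts to log2-odds weights."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_score(pwm, seq):
    """Highest PWM score over all forward-strand windows of the sequence."""
    width = len(pwm)
    return max(
        sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )


def allele_delta(pwm, ref_seq, alt_seq):
    """Change in best binding score when the alt allele replaces the ref."""
    return best_score(pwm, alt_seq) - best_score(pwm, ref_seq)


pwm = build_pwm(COUNTS)
# The G>C substitution disrupts the ACGT site, so the delta is negative.
delta = allele_delta(pwm, "GGACGTGG", "GGACCTGG")
```

A negative delta suggests the alternative allele weakens predicted binding, while a positive delta suggests it strengthens it.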
Affiliation(s)
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
| | - Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
| | - Vsevolod J. Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
| | - Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russian Federation
| |
|
9
|
Farooq A, Trøen G, Delabie J, Wang J. Integrating whole genome sequencing, methylation, gene expression, topological associated domain information in regulatory mutation prediction: a study of follicular lymphoma. Comput Struct Biotechnol J 2022; 20:1726-1742. [PMID: 35495111 PMCID: PMC9024376 DOI: 10.1016/j.csbj.2022.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 03/22/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
A major challenge in human genetics is the analysis of the interplay between genetic and epigenetic factors in a multifactorial disease like cancer. Here, a novel methodology is proposed to investigate genome-wide regulatory mechanisms in cancer, using follicular lymphoma (FL) as an example. In a first phase, a new machine-learning method is designed to identify Differentially Methylated Regions (DMRs) by computing six attributes. In a second phase, an integrative data analysis method is developed to study regulatory mutations in FL, by considering differential methylation information together with DNA sequence variation, differential gene expression, 3D organization of the genome (e.g., topologically associated domains), and enriched biological pathways. The resulting mutation block-gene pairs are further ranked to identify the significant ones. By this approach, BCL2 and BCL6 were identified as top-ranking FL-related genes with several mutation blocks and DMRs acting on their regulatory regions. Two additional genes, CDCA4 and CTSO, also ranked highly, with significant DNA sequence variation and differential methylation in neighboring areas, pointing towards their potential use as biomarkers for FL. This work combines genomic and epigenomic information to investigate genome-wide gene regulatory mechanisms in cancer and may contribute to devising novel treatment strategies.
|
10
|
Tognon M, Bonnici V, Garrison E, Giugno R, Pinello L. GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs. PLoS Comput Biol 2021; 17:e1009444. [PMID: 34570769 PMCID: PMC8519448 DOI: 10.1371/journal.pcbi.1009444] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/11/2021] [Revised: 10/15/2021] [Accepted: 09/10/2021] [Indexed: 11/18/2022]
Abstract
Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO. Transcription factors (TFs) are key regulatory proteins and mutations occurring in their binding sites can alter the normal transcriptional landscape of a cell and lead to disease states. Pangenome variation graphs (VGs) efficiently encode genomes from a population of individuals and their genetic variations. GRAFIMO is an open-source tool that extends the traditional PWM scanning procedure to VGs. By scanning for potential TFBS in VGs, GRAFIMO can simultaneously search thousands of genomes while accounting for SNPs, indels, and structural variants.
GRAFIMO reports motif occurrences, their statistical significance, frequency, and location within the reference or alternative haplotypes in a given VG. GRAFIMO makes it possible to study how genetic variation affects the binding landscape of known TFs within a population of individuals.
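The idea of finding binding sites that appear or disappear on alternative haplotypes can be sketched without a variation graph by scanning explicit haplotype sequences. This is a deliberately simplified illustration, not GRAFIMO's algorithm or interface: the weight matrix, threshold, and haplotype encoding (lists of 0-based position and alternative base) are all hypothetical.

```python
# Hypothetical log-odds matrix for a 3-bp motif with consensus TGA.
PWM = [
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 1.5},
    {"A": -2.0, "C": -2.0, "G": 1.5, "T": -2.0},
    {"A": 1.5, "C": -2.0, "G": -2.0, "T": -2.0},
]


def apply_snvs(reference, snvs):
    """Return the haplotype sequence obtained by applying (position, alt) SNVs."""
    seq = list(reference)
    for pos, alt in snvs:
        seq[pos] = alt
    return "".join(seq)


def hits(seq, pwm, threshold):
    """Start positions of forward-strand windows scoring at least `threshold`."""
    width = len(pwm)
    found = set()
    for i in range(len(seq) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        if score >= threshold:
            found.add(i)
    return found


def haplotype_specific_hits(reference, haplotypes, pwm, threshold):
    """Motif hits gained or lost on each haplotype relative to the reference."""
    ref_hits = hits(reference, pwm, threshold)
    report = {}
    for name, snvs in haplotypes.items():
        hap_hits = hits(apply_snvs(reference, snvs), pwm, threshold)
        report[name] = {
            "gained": sorted(hap_hits - ref_hits),
            "lost": sorted(ref_hits - hap_hits),
        }
    return report


report = haplotype_specific_hits(
    "CCTGACCTCA",
    {"hap1": [(8, "G")], "hap2": [(3, "C")]},
    PWM,
    threshold=4.0,
)
```

Enumerating haplotypes this way scales poorly with the number of individuals, which is the motivation the abstract gives for scanning a compact graph instead.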
Affiliation(s)
- Manuel Tognon
- Computer Science Department, University of Verona, Verona, Italy
| | - Vincenzo Bonnici
- Computer Science Department, University of Verona, Verona, Italy
| | - Erik Garrison
- University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Rosalba Giugno
- Computer Science Department, University of Verona, Verona, Italy
| | - Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital Charlestown, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| |
|
11
|
Klees S, Heinrich F, Schmitt AO, Gültas M. agReg-SNPdb: A Database of Regulatory SNPs for Agricultural Animal Species. Biology 2021; 10:790. [PMID: 34440019 PMCID: PMC8389679 DOI: 10.3390/biology10080790] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/12/2021] [Revised: 08/12/2021] [Accepted: 08/12/2021] [Indexed: 12/13/2022]
Abstract
Transcription factors (TFs) govern transcriptional gene regulation by specifically binding to short DNA motifs, known as transcription factor binding sites (TFBSs), in regulatory regions, such as promoters. Today, it is well known that single nucleotide polymorphisms (SNPs) in TFBSs can dramatically affect the level of gene expression, since they can cause a change in the binding affinity of TFs. Such SNPs, referred to as regulatory SNPs (rSNPs), have gained attention in the life sciences due to their causality for specific traits or diseases. In this study, we present agReg-SNPdb, a database comprising rSNP data of seven agricultural and domestic animal species: cattle, pig, chicken, sheep, horse, goat, and dog. To identify the rSNPs, we constructed a bioinformatics pipeline and identified a total of 10,623,512 rSNPs, which are located within TFBSs and affect the binding affinity of putative TFs. Altogether, we implemented the first systematic analysis of SNPs in promoter regions and their impact on the binding affinity of TFs for livestock and made it usable via a web interface.
Affiliation(s)
- Selina Klees
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075 Göttingen, Germany
- Felix Heinrich
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Armin Otto Schmitt
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075 Göttingen, Germany
- Mehmet Gültas
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075 Göttingen, Germany
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
12
Klees S, Lange TM, Bertram H, Rajavel A, Schlüter JS, Lu K, Schmitt AO, Gültas M. In Silico Identification of the Complex Interplay between Regulatory SNPs, Transcription Factors, and Their Related Genes in Brassica napus L. Using Multi-Omics Data. Int J Mol Sci 2021; 22:E789. [PMID: 33466789 PMCID: PMC7830561 DOI: 10.3390/ijms22020789] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/09/2020] [Revised: 01/08/2021] [Accepted: 01/11/2021] [Indexed: 01/07/2023] Open
Abstract
Regulatory SNPs (rSNPs) are a special class of SNPs which have a high potential to affect the phenotype due to their impact on DNA-binding of transcription factors (TFs). Thus, the knowledge about such rSNPs and TFs could provide essential information regarding different genetic programs, such as tissue development or environmental stress responses. In this study, we use a multi-omics approach by combining genomics, transcriptomics, and proteomics data of two different Brassica napus L. cultivars, namely Zhongshuang11 (ZS11) and Zhongyou821 (ZY821), with high and low oil content, respectively, to monitor the regulatory interplay between rSNPs, TFs and their corresponding genes in the tissues flower, leaf, stem, and root. By predicting the effect of rSNPs on TF-binding and by measuring their association with the cultivars, we identified a total of 41,117 rSNPs, of which 1141 are significantly associated with oil content. We revealed several enriched members of the TF families DOF, MYB, NAC, or TCP, which are important for directing transcriptional programs regulating differential expression of genes within the tissues. In this work, we provide the first genome-wide collection of rSNPs for B. napus and their impact on the regulation of gene expression in vegetative and floral tissues, which will be highly valuable for future studies on rSNPs and gene regulation.
Affiliation(s)
- Selina Klees
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Thomas Martin Lange
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Hendrik Bertram
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Abirami Rajavel
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Johanna-Sophie Schlüter
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Kun Lu
  - College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
  - Academy of Agricultural Sciences, Southwest University, Chongqing 400715, China
  - State Cultivation Base of Crop Stress Biology for Southern Mountainous Land of Southwest University, Chongqing 400715, China
- Armin Otto Schmitt
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
- Mehmet Gültas
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
13
O'Connor T, Grant CE, Bodén M, Bailey TL. T-Gene: improved target gene prediction. Bioinformatics 2020; 36:3902-3904. [PMID: 32246829 DOI: 10.1093/bioinformatics/btaa227] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/14/2020] [Revised: 03/04/2020] [Accepted: 03/30/2020] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Identifying the genes regulated by a given transcription factor (TF) (its 'target genes') is a key step in developing a comprehensive understanding of gene regulation. Previously, we developed a method (CisMapper) for predicting the target genes of a TF based solely on the correlation between a histone modification at the TF's binding site and the expression of the gene across a set of tissues or cell lines. That approach is limited to organisms for which extensive histone and expression data are available, and does not explicitly incorporate the genomic distance between the TF and the gene. RESULTS We present the T-Gene algorithm, which overcomes these limitations. It can be used to predict which genes are most likely to be regulated by a TF, and which of the TF's binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene's promoter, achieving median precision above 60%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median precision above 40%) based on distance alone when extensive histone/expression data is not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions. AVAILABILITY AND IMPLEMENTATION The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://meme-suite.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles E Grant
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
- Timothy L Bailey
  - Department of Pharmacology, University of Nevada, Reno, NV 89557, USA
14
Moradifard S, Saghiri R, Ehsani P, Mirkhani F, Ebrahimi‐Rad M. A preliminary computational outputs versus experimental results: Application of sTRAP, a biophysical tool for the analysis of SNPs of transcription factor-binding sites. Mol Genet Genomic Med 2020; 8:e1219. [PMID: 32155318 PMCID: PMC7216802 DOI: 10.1002/mgg3.1219] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/10/2019] [Accepted: 02/25/2020] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND In the human genome, the network of transcription factors (TFs) and transcription factor-binding sites (TFBSs) has an important regulatory function in biological pathways. Such crosstalk might be affected by single-nucleotide polymorphisms (SNPs), which could create or disrupt a TFBS, leading to either a disease or a phenotypic defect. Many computational resources have been introduced to predict variation in TF binding due to SNPs inside TFBSs, sTRAP being one of them. METHODS A literature review was performed and the experimental data for 18 TFBSs located in 12 genes were provided. The sequences of TFBS motifs were extracted using two different strategies: at a size similar to the synthetic target sites used in the experimental techniques, and with 60 bp upstream and downstream of the SNPs. sTRAP (http://trap.molgen.mpg.de/cgi-bin/trap_two_seq_form.cgi) was applied to compute the binding affinity scores of their cognate TFs in the context of reference and mutant sequences of TFBSs. The alternative bioinformatics model used in this study was regulatory analysis of variation in enhancers (RAVEN; http://www.cisreg.ca/cgi-bin/RAVEN/a). The bioinformatics outputs of our study were compared with experimental data from electrophoretic mobility shift assays (EMSA). RESULTS In 6 out of 18 TFBSs, in the genes COL1A1, Hb ḉᴪ, TF, FIX, MBL2, and NOS2A, the outputs of sTRAP were inconsistent with the results of EMSA. Furthermore, no p value was presented for the difference between the two binding affinity scores under the wild-type and mutant conditions of the TFBSs, nor were any criteria given for preferring or selecting among the measurements from the different matrices used for the same analysis. CONCLUSION Our preliminary study indicated some paradoxical results between sTRAP and experimental data. However, to link the data of sTRAP to biological functions, its optimization via experimental procedures, with the integration of expanded data and the application of several other bioinformatics tools, might be required.
Affiliation(s)
- Reza Saghiri
  - Biochemistry Department, Pasteur Institute of Iran, Tehran, Iran
- Parastoo Ehsani
  - Molecular Biology Department, Pasteur Institute of Iran, Tehran, Iran
15
Ye J, Liu L, Xu X, Wen Y, Li P, Cheng B, Cheng S, Zhang L, Ma M, Qi X, Liang C, Kafle OP, Wu C, Wang S, Wang X, Ning Y, Chu X, Niu L, Zhang F. A genome-wide multiphenotypic association analysis identified candidate genes and gene ontology shared by four common risky behaviors. Aging (Albany NY) 2020; 12:3287-3297. [PMID: 32090979 PMCID: PMC7066886 DOI: 10.18632/aging.102812] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/23/2019] [Accepted: 01/25/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Risky behaviors can lead to huge economic and health losses. However, limited effort has been devoted to exploring the genetic mechanisms of risky behaviors. RESULT MASH analysis identified a group of target genes for risky behaviors, such as APBB2, MAPT and DCC. For GO enrichment analysis, FUMA detected multiple risky behavior-related GO terms and brain-related diseases, such as regulation of neuron differentiation (adjusted P value = 2.84 × 10⁻⁵), autism spectrum disorder (adjusted P value = 1.81 × 10⁻²⁷) and intelligence (adjusted P value = 5.89 × 10⁻¹⁵). CONCLUSION We reported multiple candidate genes and GO terms shared by the four risky behaviors, providing novel clues for understanding the genetic mechanism of risky behaviors. METHODS Multivariate Adaptive Shrinkage (MASH) analysis was first applied to the GWAS data of four specific risky behaviors (automobile speeding, drinks per week, ever-smoker, number of sexual partners) to detect the common genetic variants shared by the four risky behaviors. Utilizing genomic functional annotation data of SNPs, the SNPs detected by MASH were then mapped to target genes. Finally, gene set enrichment analysis of the identified candidate genes was conducted on the FUMA platform to obtain risky behavior-related gene ontology (GO) terms as well as diseases and traits.
Affiliation(s)
- Jing Ye
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Li Liu
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Xiaoqiao Xu
  - Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an, China
- Yan Wen
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Ping Li
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Bolun Cheng
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Shiqiang Cheng
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Lu Zhang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Mei Ma
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Xin Qi
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Chujun Liang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Om Prakash Kafle
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Cuiyan Wu
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Sen Wang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Xi Wang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Yujie Ning
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Xiaomeng Chu
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Lin Niu
  - Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an, China
- Feng Zhang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
16
Qi X, Wen Y, Li P, Liang C, Cheng B, Ma M, Cheng S, Zhang L, Liu L, Kafle OP, Zhang F. An integrative analysis of genome-wide association study and regulatory SNP annotation datasets identified candidate genes for bipolar disorder. Int J Bipolar Disord 2020; 8:6. [PMID: 32009227 PMCID: PMC6995798 DOI: 10.1186/s40345-019-0170-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/11/2019] [Accepted: 11/06/2019] [Indexed: 12/25/2022] Open
Abstract
Background Bipolar disorder (BD) is a complex mood disorder. The genetic mechanism of BD remains largely unknown. Methods We conducted an integrative analysis of genome-wide association study (GWAS) and regulatory SNP (rSNP) annotation datasets, including transcription factor binding regions (TFBRs), chromatin interactive regions (CIRs), mature microRNA regions (miRNAs), long non-coding RNA regions (lncRNAs), topologically associated domains (TADs) and circular RNAs (circRNAs). Firstly, GWAS dataset 1 of BD (including 20,352 cases and 31,358 controls) and GWAS dataset 2 of BD (including 7481 BD patients and 9250 controls) were integrated with the rSNP annotation database to obtain BD-associated SNP regulatory elements and SNP regulatory element-target gene (E–G) pairs, respectively. Secondly, a comparative analysis of the results from the two datasets was conducted to identify the common rSNPs and their target genes. Then, gene set enrichment analysis (FUMA GWAS) and HumanNet-XC analysis were conducted to explore the functional relevance of the identified target genes to BD. Results After the integrative analysis, we identified 52 TFBR target genes, 44 TAD target genes, 55 CIR target genes and 21 lncRNA target genes for BD, such as ITIH4 (P_dataset1 = 6.68 × 10⁻⁸, P_dataset2 = 6.64 × 10⁻⁷), ITIH3 (P_dataset1 = 1.09 × 10⁻⁸, P_dataset2 = 2.00 × 10⁻⁷), SYNE1 (P_dataset1 = 1.80 × 10⁻⁶, P_dataset2 = 4.33 × 10⁻⁹) and OPRM1 (P_dataset1 = 1.80 × 10⁻⁶, P_dataset2 = 4.33 × 10⁻⁹). Conclusion We conducted a large-scale integrative analysis of GWAS and 6 common rSNP information datasets to explore the potential roles of rSNPs in the genetic mechanism of BD. We identified multiple candidate genes for BD, supporting the importance of rSNPs in the development of BD.
Affiliation(s)
- Xin Qi
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Yan Wen
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Ping Li
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Chujun Liang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Bolun Cheng
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Mei Ma
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Shiqiang Cheng
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Lu Zhang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Li Liu
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Om Prakash Kafle
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
- Feng Zhang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, No. 76 Yan Ta West Road, Xi'an, 710061, People's Republic of China
17
Yao Y, Ramsey SA. CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs. Pacific Symposium on Biocomputing 2020; 25:535-546. [PMID: 31797625 PMCID: PMC6897322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
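To make the locus-size feature in this abstract concrete, the Python sketch below groups SNPs on one chromosome into loci by genomic position and assigns each SNP the size of its locus. The abstract does not specify the partitioning procedure, so the gap-based rule, the `max_gap` parameter, and the positions are hypothetical stand-ins rather than the CERENKOV3 method.

```python
def locus_sizes(positions, max_gap):
    """Assign each SNP the number of SNPs in its locus.

    A new locus starts whenever the distance to the previous SNP (in sorted
    order) exceeds max_gap. This gap rule is an illustrative assumption.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    sizes = [0] * len(positions)
    cluster = []
    for idx in order:
        if cluster and positions[idx] - positions[cluster[-1]] > max_gap:
            # Close the current locus and record its size for every member.
            for member in cluster:
                sizes[member] = len(cluster)
            cluster = []
        cluster.append(idx)
    for member in cluster:
        sizes[member] = len(cluster)
    return sizes

# Hypothetical SNP coordinates on a single chromosome.
snp_positions = [100, 150, 5000, 5100, 5200, 90000]
sizes = locus_sizes(snp_positions, max_gap=1000)
print(sizes)
```

The resulting list can be appended as one extra column to a SNP feature table, which is how a clustering-derived feature of this kind would typically reach a classifier.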
Affiliation(s)
- Yao Yao
  - School of Electrical Engineering and Computer Science, Oregon State University
- Stephen A. Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University; Department of Biomedical Sciences, Oregon State University, Corvallis, OR 97330, USA
18
Dapas M, Sisk R, Legro RS, Urbanek M, Dunaif A, Hayes MG. Family-Based Quantitative Trait Meta-Analysis Implicates Rare Noncoding Variants in DENND1A in Polycystic Ovary Syndrome. J Clin Endocrinol Metab 2019; 104:3835-3850. [PMID: 31038695 PMCID: PMC6660913 DOI: 10.1210/jc.2018-02496] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 11/19/2018] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
CONTEXT Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5% to 15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date. OBJECTIVE The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis. DESIGN, PATIENTS, AND METHODS We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis. RESULTS We found rare variants in DENND1A (P = 5.31 × 10⁻⁵, adjusted P = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families. CONCLUSIONS Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.
Affiliation(s)
- Matthew Dapas
  - Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Ryan Sisk
  - Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Richard S Legro
  - Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, Pennsylvania
- Margrit Urbanek
  - Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
  - Center for Reproductive Science, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Andrea Dunaif
  - Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
- M Geoffrey Hayes
  - Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
  - Department of Anthropology, Northwestern University, Evanston, Illinois
19
Hu Z, Yu C, Furutsuki M, Andreoletti G, Ly M, Hoskins R, Adhikari AN, Brenner SE. VIPdb, a genetic Variant Impact Predictor Database. Hum Mutat 2019; 40:1202-1214. [PMID: 31283070 PMCID: PMC7288905 DOI: 10.1002/humu.23858] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 06/17/2019] [Accepted: 06/27/2019] [Indexed: 12/30/2022]
Abstract
Genome sequencing identifies a vast number of genetic variants. Predicting these variants' molecular and clinical effects is one of the preeminent challenges in human genetics. Accurate prediction of the impact of genetic variants improves our understanding of how genetic information is conveyed to molecular and cellular functions, and is an essential step towards precision medicine. Over one hundred tools/resources have been developed specifically for this purpose. We summarize these tools, as well as their characteristics, in the genetic Variant Impact Predictor Database (VIPdb). This database will help researchers and clinicians explore appropriate tools, and inform the development of improved methods. VIPdb can be browsed and downloaded at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Changhua Yu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Department of Bioengineering, University of California, Berkeley, California 94720, USA
- Mabel Furutsuki
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Gaia Andreoletti
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Melissa Ly
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Division of Data Sciences, University of California, Berkeley, California 94720, USA
- Roger Hoskins
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Aashish N. Adhikari
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
20
Nishizaki SS, Ng N, Dong S, Porter RS, Morterud C, Williams C, Asman C, Switzenberg JA, Boyle AP. Predicting the effects of SNPs on transcription factor binding affinity. Bioinformatics 2019; 36:364-372. [PMID: 31373606 PMCID: PMC7999143 DOI: 10.1093/bioinformatics/btz612] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 03/27/2019] [Revised: 07/15/2019] [Accepted: 08/01/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Genome-wide association studies have revealed that 88% of disease-associated single-nucleotide polymorphisms (SNPs) reside in noncoding regions. However, noncoding SNPs remain understudied, partly because they are challenging to prioritize for experimental validation. To address this deficiency, we developed the SNP effect matrix pipeline (SEMpl). RESULTS SEMpl estimates transcription factor-binding affinity by observing differences in chromatin immunoprecipitation followed by deep sequencing signal intensity for SNPs within functional transcription factor-binding sites (TFBSs) genome-wide. By cataloging the effects of every possible mutation within the TFBS motif, SEMpl can predict the consequences of SNPs to transcription factor binding. This knowledge can be used to identify potential disease-causing regulatory loci. AVAILABILITY AND IMPLEMENTATION SEMpl is available from https://github.com/Boyle-Lab/SEM_CPP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
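The idea of cataloging the effect of every possible base at every motif position can be sketched in a few lines of Python. This is a simplified illustration and not the SEMpl pipeline: it assumes a ready-made list of (site sequence, ChIP-seq signal) pairs, uses only sites that differ from a consensus at exactly one position, and reports the log2 ratio of their mean signal to the mean signal of consensus sites. The function name, the toy sites, and the signal values are all hypothetical.

```python
import math
from collections import defaultdict

def snp_effect_matrix(sites, consensus):
    """Build matrix[pos][base] = log2(mean signal with that base / mean consensus signal).

    Only single-mismatch sites contribute to non-consensus entries; the
    consensus base at each position is fixed at 0.0 by construction.
    """
    baseline_signals = [signal for seq, signal in sites if seq == consensus]
    baseline = sum(baseline_signals) / len(baseline_signals)

    grouped = defaultdict(list)
    for seq, signal in sites:
        if len(seq) != len(consensus):
            continue
        mismatches = [i for i, (a, b) in enumerate(zip(seq, consensus)) if a != b]
        if len(mismatches) == 1:
            pos = mismatches[0]
            grouped[(pos, seq[pos])].append(signal)

    matrix = [{base: 0.0} for base in consensus]
    for (pos, base), signals in grouped.items():
        matrix[pos][base] = math.log2((sum(signals) / len(signals)) / baseline)
    return matrix

# Hypothetical binding sites with made-up ChIP-seq signal intensities.
toy_sites = [
    ("TGAC", 8.0),
    ("TGAC", 8.0),
    ("TGGC", 2.0),
    ("TGTC", 4.0),
    ("AGAC", 8.0),
]
sem = snp_effect_matrix(toy_sites, consensus="TGAC")
for position, row in enumerate(sem):
    print(position, row)
```

Negative entries mark substitutions associated with weaker signal than the consensus, while entries near zero suggest the position tolerates that base.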
Affiliation(s)
- Sierra S Nishizaki
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Natalie Ng
- Department of Human Genetics, Stanford University, Stanford, CA 94305, USA
- Shengcheng Dong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Robert S Porter
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Cody Morterud
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Colten Williams
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Courtney Asman
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jessica A Switzenberg
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
21
Gorsic LK, Dapas M, Legro RS, Hayes MG, Urbanek M. Functional Genetic Variation in the Anti-Müllerian Hormone Pathway in Women With Polycystic Ovary Syndrome. J Clin Endocrinol Metab 2019; 104:2855-2874. [PMID: 30786001 PMCID: PMC6543512 DOI: 10.1210/jc.2018-02178] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 10/08/2018] [Accepted: 02/15/2019] [Indexed: 01/08/2023]
Abstract
CONTEXT Polycystic ovary syndrome (PCOS) is a highly heritable, common endocrine disorder characterized by hyperandrogenism, irregular menses, and polycystic ovaries. PCOS is often accompanied by elevated levels of anti-Müllerian hormone (AMH). AMH inhibits follicle maturation. AMH also inhibits steroidogenesis through transcriptional repression of CYP17A1. We recently identified 16 rare PCOS-specific pathogenic variants in AMH. OBJECTIVE To test whether additional members of the AMH signaling pathway also contribute to the etiology of PCOS. PARTICIPANTS/DESIGN Targeted resequencing of coding and regulatory regions of AMH and its specific type 2 receptor, AMHR2, was performed on 608 women affected with PCOS and 142 reproductively normal control women. Prediction tools of deleteriousness and in silico evidence of epigenetic modification were used to prioritize variants for functional evaluation. Dual-luciferase reporter assays and splicing assays were used to measure the impact of genetic variants on function. RESULTS We identified 20 additional variants in/near AMH and AMHR2 with significantly reduced signaling activity in in vitro assays. Collectively, from our previous study and as reported herein, we have identified a total of 37 variants with impaired activity in/near AMH and AMHR2 in 41 women affected with PCOS, or 6.7% of our PCOS cohort. Furthermore, no functional variants were observed in the 142 phenotyped controls. The functional variants were significantly associated with PCOS in our cohort of 608 women with PCOS and 142 controls (P = 2.3 × 10⁻⁵) and very strongly associated with PCOS relative to a larger non-Finnish European (gnomAD) population-based control cohort (P < 1 × 10⁻⁹). CONCLUSION The AMH signaling cascade plays an important role in PCOS etiology.
Affiliation(s)
- Lidija K Gorsic
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Matthew Dapas
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Richard S Legro
- Department of Obstetrics and Gynecology, Pennsylvania State University College of Medicine, Hershey, Pennsylvania
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Department of Anthropology, Northwestern University, Evanston, Illinois
- Margrit Urbanek
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Department of Obstetrics and Gynecology, Pennsylvania State University College of Medicine, Hershey, Pennsylvania
- Correspondence and Reprint Requests: Margrit Urbanek, PhD, Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, 303 East Chicago Avenue, Tarry 15-717, Chicago, Illinois 60611. E-mail:
22
Yao Y, Liu Z, Wei Q, Ramsey SA. CERENKOV2: improved detection of functional noncoding SNPs using data-space geometric features. BMC Bioinformatics 2019; 20:63. [PMID: 30727967 PMCID: PMC6364436 DOI: 10.1186/s12859-019-2637-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/05/2018] [Accepted: 01/18/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND We previously reported on CERENKOV, an approach for identifying regulatory single nucleotide polymorphisms (rSNPs) that is based on 246 annotation features. CERENKOV uses the xgboost classifier and is designed to be used to find causal noncoding SNPs in loci identified by genome-wide association studies (GWAS). We reported that CERENKOV has state-of-the-art performance (by two traditional measures and a novel GWAS-oriented measure, AVGRANK) in a comparison to nine other tools for identifying functional noncoding SNPs, using a comprehensive reference SNP set (OSU17, 15,331 SNPs). Given that SNPs are grouped within loci in the reference SNP set and given the importance of the data-space manifold geometry for machine-learning model selection, we hypothesized that within-locus inter-SNP distances would have class-based distributional biases that could be exploited to improve rSNP recognition accuracy. We thus defined an intralocus SNP "radius" as the average data-space distance from a SNP to the other intralocus neighbors, and explored radius likelihoods for five distance measures. RESULTS We expanded the set of reference SNPs to 39,083 (the OSU18 set) and extracted CERENKOV SNP feature data. We computed radius empirical likelihoods and likelihood densities for rSNPs and control SNPs, and found significant likelihood differences between rSNPs and control SNPs. We fit parametric models of likelihood distributions for five different distance measures to obtain ten log-likelihood features that we combined with the 248-dimensional CERENKOV feature matrix. On the OSU18 SNP set, we measured the classification accuracy of CERENKOV with and without the new distance-based features, and found that the addition of distance-based features significantly improves rSNP recognition performance as measured by AUPVR, AUROC, and AVGRANK. 
Along with feature data for the OSU18 set, the software code for extracting the base feature matrix, estimating ten distance-based likelihood ratio features, and scoring candidate causal SNPs, are released as open-source software CERENKOV2. CONCLUSIONS Accounting for the locus-specific geometry of SNPs in data-space significantly improved the accuracy with which noncoding rSNPs can be computationally identified.
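The intralocus "radius" defined in this abstract (the average data-space distance from a SNP to the other SNPs in the same locus) can be illustrated with a minimal sketch. This is not the CERENKOV2 code: the function name `intralocus_radii`, the toy feature vectors, and the locus labels are hypothetical, and Euclidean distance is assumed here as just one possible choice among the five distance measures the abstract mentions.

```python
import math
from collections import defaultdict


def intralocus_radii(features, locus_ids):
    """Mean Euclidean distance from each SNP to the other SNPs in its locus.

    features  : list of equal-length numeric tuples, one per SNP (toy data here)
    locus_ids : list of locus labels, parallel to `features`
    Returns a list of radii; NaN where a locus contains only one SNP.
    """
    members_by_locus = defaultdict(list)
    for idx, locus in enumerate(locus_ids):
        members_by_locus[locus].append(idx)

    radii = [float("nan")] * len(features)
    for members in members_by_locus.values():
        if len(members) < 2:
            continue  # no neighbors, so the radius is undefined
        for i in members:
            dists = [math.dist(features[i], features[j]) for j in members if j != i]
            radii[i] = sum(dists) / len(dists)
    return radii


# Hypothetical two-dimensional feature vectors for four SNPs in two loci.
features = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
locus_ids = ["locusA", "locusA", "locusA", "locusB"]
radii = intralocus_radii(features, locus_ids)
```

In the approach described above, radius values like these are turned into likelihood features for rSNPs versus control SNPs; that modeling step is not shown here.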
Affiliation(s)
- Yao Yao
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, 97330 OR USA
- Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, 97330 OR USA
- Zheng Liu
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, 97330 OR USA
- Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, 97330 OR USA
- Qi Wei
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, 97330 OR USA
- Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, 97330 OR USA
- Stephen A. Ramsey
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, 97330 OR USA
- Department of Biomedical Sciences, Oregon State University, 106 Dryden Hall, Corvallis, 97330 OR USA
23
Wong KC. DNA Motif Recognition Modeling from Protein Sequences. iScience 2018; 7:198-211. [PMID: 30267681 PMCID: PMC6153143 DOI: 10.1016/j.isci.2018.09.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/05/2018] [Revised: 08/08/2018] [Accepted: 09/04/2018] [Indexed: 12/31/2022] Open
Abstract
Although the existing works on DNA motif discovery on DNA sequences are plethoric, mechanistic knowledge to infer DNA motifs from protein sequences across multiple DNA-binding domain families without conducting any wet-lab experiments is still lacking. Therefore, the k-spectrum recognition modeling is proposed to address the issues at the highest possible resolutions. The k-spectrum model can capture DNA motif patterns from protein sequences at the resolution in which local sequence context and nucleotide dependency can be taken into account completely. Multiple evaluation metrics are adopted and measured on millions of k-mer binding intensities from 92 proteins across 5 DNA-binding families (i.e., bHLH, bZIP, ETS, Forkhead, and Homeodomain), demonstrating its competitive edges. In addition, it not only can contribute to DNA motif recognition modeling but also can help prioritize the observed or even unobserved binding of single nucleotide variants on transcription factor binding sites in a genome-wide manner. DNA motif modeling from protein is fundamental for understanding gene regulation. A framework is proposed at the highest possible sequence resolution for the first time. It is validated on millions of k-mer intensities from 92 proteins across 5 families. It can prioritize the unobserved regulatory single nucleotide variants on DNA motifs.
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.
24
O'Connor T, Bodén M, Bailey TL. CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data. Nucleic Acids Res 2018; 45:e19. [PMID: 28204599 PMCID: PMC5389714 DOI: 10.1093/nar/gkw956] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/08/2016] [Revised: 09/30/2016] [Accepted: 10/10/2016] [Indexed: 12/18/2022] Open
Abstract
Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF.
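The core idea in this abstract, scoring a candidate target gene by how well a histone mark at a TF-bound site tracks the gene's expression across tissues, can be sketched as follows. This is an illustration under stated assumptions, not CisMapper's actual scoring: Pearson correlation is assumed as the correlation measure, and the function names, gene labels, and numbers are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_targets(site_signal, expression_by_gene):
    """Rank genes by correlation between site signal and expression.

    site_signal        : histone-mark signal at one bound site, one value per tissue
    expression_by_gene : dict of gene name -> expression values over the same tissues
    Returns (gene, correlation) pairs, highest correlation first.
    """
    scores = {gene: pearson(site_signal, expr) for gene, expr in expression_by_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical signal across four tissues and two candidate genes.
site_signal = [1.0, 2.0, 3.0, 4.0]
expression_by_gene = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # rises with the mark
    "GENE_B": [5.0, 5.0, 4.0, 1.0],  # falls as the mark rises
}
ranked = rank_targets(site_signal, expression_by_gene)
```

Unlike a distance-based assignment, this ranking does not use genomic position at all, which is why a correlation-based approach can in principle pick up long-range interactions.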
Affiliation(s)
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
- Timothy L Bailey
- Department of Pharmacology, University of Nevada School of Medicine, Reno, NV 89557-0357, USA
25
Abstract
Transcription is regulated by transcription factor (TF) binding at promoters and distal regulatory elements and histone modifications that control the accessibility of these elements. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) has become the standard assay for identifying genome-wide protein-DNA interactions in vitro and in vivo. As large-scale ChIP-seq data sets have been collected for different TFs and histone modifications, their potential to predict gene expression can be used to test hypotheses about the mechanisms of gene regulation. In addition, complementary functional genomics assays provide a global view of chromatin accessibility and long-range cis-regulatory interactions that are being combined with TF binding and histone remodeling to study the regulation of gene expression. Thus, ChIP-seq analysis is now widely integrated with other functional genomics assays to better understand gene regulatory mechanisms. In this review, we discuss advances and challenges in integrating ChIP-seq data to identify context-specific chromatin states associated with gene activity. We describe the overall computational design of integrating ChIP-seq data with other functional genomics assays. We also discuss the challenges of extending these methods to low-input ChIP-seq assays and related single-cell assays.
Affiliation(s)
- Ali Mortazavi
- Corresponding author: Ali Mortazavi, Department of Developmental and Cell Biology, 2300 Biological Sciences 3, University of California, Irvine, CA 92697, USA. Tel: (949)824-6762; E-mail:
26
Chadaeva IV, Ponomarenko PM, Rasskazov DA, Sharypova EB, Kashina EV, Zhechev DA, Drachkova IA, Arkova OV, Savinkova LK, Ponomarenko MP, Kolchanov NA, Osadchuk LV, Osadchuk AV. Candidate SNP markers of reproductive potential are predicted by a significant change in the affinity of TATA-binding protein for human gene promoters. BMC Genomics 2018; 19:0. [PMID: 29504899 PMCID: PMC5836831 DOI: 10.1186/s12864-018-4478-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The progress of medicine, science, technology, education, and culture improves, year by year, quality of life and life expectancy of the populace. The modern human has a chance to further improve the quality and duration of his/her life and the lives of his/her loved ones by bringing their lifestyle in line with their sequenced individual genomes. With this in mind, one of genome-based developments at the junction of personalized medicine and bioinformatics will be considered in this work, where we used two Web services: (i) SNP_TATA_Comparator to search for alleles with a single nucleotide polymorphism (SNP) that alters the affinity of TATA-binding protein (TBP) for the TATA boxes of human gene promoters and (ii) PubMed to look for retrospective clinical reviews on changes in physiological indicators of reproductive potential in carriers of these alleles. RESULTS A total of 126 SNP markers of female reproductive potential, capable of altering the affinity of TBP for gene promoters, were found using the two above-mentioned Web services. For example, 10 candidate SNP markers of thrombosis (e.g., rs563763767) can cause overproduction of coagulation inducers. In pregnant women, Hughes syndrome provokes thrombosis with a fatal outcome although this syndrome can be diagnosed and eliminated even at the earliest stages of its development. Thus, in women carrying any of the above SNPs, preventive treatment of this syndrome before a planned pregnancy can reduce the risk of death. Similarly, seven SNP markers predicted here (e.g., rs774688955) can elevate the risk of myocardial infarction. 
In line with Bowles' lifespan theory, women carrying any of these SNPs may modify their lifestyle to improve their longevity if they can take under advisement that risks of myocardial infarction increase with age of the mother, total number of pregnancies, in multiple pregnancies, pregnancies under the age of 20, hypertension, preeclampsia, menstrual cycle irregularity, and in women smokers. CONCLUSIONS According to Bowles' lifespan theory, which links reproductive potential, quality of life, and life expectancy, the above information was compiled for those who would like to reduce risks of diseases corresponding to alleles in their own sequenced genomes. Candidate SNP markers can focus the clinical analysis of unannotated SNPs, after which they may become useful for people who would like to bring their lifestyle in line with their sequenced individual genomes.
Affiliation(s)
- Irina V Chadaeva
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Novosibirsk State University, Novosibirsk, 630090, Russia
- Dmitry A Rasskazov
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Ekaterina B Sharypova
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Elena V Kashina
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Dmitry A Zhechev
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Irina A Drachkova
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Olga V Arkova
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Vector-Best Inc., Koltsovo, Novosibirsk Region, 630559, Russia
- Ludmila K Savinkova
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Mikhail P Ponomarenko
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Novosibirsk State University, Novosibirsk, 630090, Russia
- Nikolay A Kolchanov
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Novosibirsk State University, Novosibirsk, 630090, Russia
- Ludmila V Osadchuk
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Novosibirsk State Agricultural University, Novosibirsk, 630039, Russia
- Alexandr V Osadchuk
- Brain Neurobiology and Neurogenetics Center, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
27
Korhonen JH, Palin K, Taipale J, Ukkonen E. Fast motif matching revisited: high-order PWMs, SNPs and indels. Bioinformatics 2017; 33:514-521. [PMID: 28011774 DOI: 10.1093/bioinformatics/btw683] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/22/2016] [Accepted: 10/27/2016] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION While the position weight matrix (PWM) is the most popular model for sequence motifs, there is growing evidence of the usefulness of more advanced models such as first-order Markov representations, and such models are also becoming available in well-known motif databases. There has been lots of research of how to learn these models from training data but the problem of predicting putative sites of the learned motifs by matching the model against new sequences has been given less attention. Moreover, motif site analysis is often concerned about how different variants in the sequence affect the sites. So far, though, the corresponding efficient software tools for motif matching have been lacking. RESULTS We develop fast motif matching algorithms for the aforementioned tasks. First, we formalize a framework based on high-order position weight matrices for generic representation of motif models with dinucleotide or general q-mer dependencies, and adapt fast PWM matching algorithms to the high-order PWM framework. Second, we show how to incorporate different types of sequence variants, such as SNPs and indels, and their combined effects into efficient PWM matching workflows. Benchmark results show that our algorithms perform well in practice on genome-sized sequence sets and are for multiple motif search much faster than the basic sliding window algorithm. AVAILABILITY AND IMPLEMENTATION Implementations are available as a part of the MOODS software package under the GNU General Public License v3.0 and the Biopython license (http://www.cs.helsinki.fi/group/pssmfind). CONTACT janne.h.korhonen@gmail.com.
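For orientation, the basic sliding window algorithm that this abstract uses as its baseline, together with a simple way to compare reference and alternate alleles of a SNP, might look like the sketch below. It is not the MOODS implementation or its API: it uses an ordinary (zero-order) PWM rather than the high-order model of the paper, and the function names, the toy matrix, and the sequence are hypothetical.

```python
def best_pwm_score(pwm, seq):
    """Best score of a zero-order PWM over all windows of `seq` (basic sliding window).

    pwm : list of dicts, one per motif position, mapping nucleotide -> score
    seq : string at least as long as the motif
    """
    width = len(pwm)
    return max(
        sum(pwm[k][seq[start + k]] for k in range(width))
        for start in range(len(seq) - width + 1)
    )


def snp_effect(pwm, seq, pos, alt):
    """Change in best PWM score when seq[pos] is replaced by `alt`.

    Only windows that overlap `pos` can change, so scanning is restricted to them.
    A negative value means the alternate allele weakens the best match.
    """
    width = len(pwm)
    lo = max(0, pos - width + 1)
    hi = min(len(seq), pos + width)
    ref_window = seq[lo:hi]
    alt_window = seq[lo:pos] + alt + seq[pos + 1:hi]
    return best_pwm_score(pwm, alt_window) - best_pwm_score(pwm, ref_window)


# Toy 3-position matrix that favors the motif "ACG" (scores are made up).
pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
effect = snp_effect(pwm, "TTACGTT", pos=3, alt="T")  # C>T inside the "ACG" site
```

The cost of this baseline grows with sequence length times motif width for every motif, which is the overhead the faster matching algorithms described above are designed to avoid.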
Affiliation(s)
- Janne H Korhonen
- School of Computer Science, Reykjavík University, Reykjavík, Iceland; Helsinki Institute for Information Technology HIIT, Helsinki, Finland; Department of Computer Science
- Kimmo Palin
- Genome-Scale Biology Research Program, Research Programs Unit
- Jussi Taipale
- Department of Biosciences and Nutrition, Karolinska Institutet, Genome Scale Biology Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Esko Ukkonen
- Helsinki Institute for Information Technology HIIT, Helsinki, Finland; Department of Computer Science
28
Forno E, Wang T, Yan Q, Brehm J, Acosta-Perez E, Colon-Semidey A, Alvarez M, Boutaoui N, Cloutier MM, Alcorn JF, Canino G, Chen W, Celedón JC. A Multiomics Approach to Identify Genes Associated with Childhood Asthma Risk and Morbidity. Am J Respir Cell Mol Biol 2017; 57:439-447. [PMID: 28574721 DOI: 10.1165/rcmb.2017-0002oc] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/29/2022] Open
Abstract
Childhood asthma is a complex disease. In this study, we aim to identify genes associated with childhood asthma through a multiomics "vertical" approach that integrates multiple analytical steps using linear and logistic regression models. In a case-control study of childhood asthma in Puerto Ricans (n = 1,127), we used adjusted linear or logistic regression models to evaluate associations between several analytical steps of omics data, including genome-wide (GW) genotype data, GW methylation, GW expression profiling, cytokine levels, asthma-intermediate phenotypes, and asthma status. At each point, only the top genes/single-nucleotide polymorphisms/probes/cytokines were carried forward for subsequent analysis. In step 1, asthma modified the gene expression-protein level association for 1,645 genes; pathway analysis showed an enrichment of these genes in the cytokine signaling system (n = 269 genes). In steps 2-3, expression levels of 40 genes were associated with intermediate phenotypes (asthma onset age, forced expiratory volume in 1 second, exacerbations, eosinophil counts, and skin test reactivity); of those, methylation of seven genes was also associated with asthma. Of these seven candidate genes, IL5RA was also significant in analytical steps 4-8. We then measured plasma IL-5 receptor α levels, which were associated with asthma age of onset and moderate-severe exacerbations. In addition, in silico database analysis showed that several of our identified IL5RA single-nucleotide polymorphisms are associated with transcription factors related to asthma and atopy. This approach integrates several analytical steps and is able to identify biologically relevant asthma-related genes, such as IL5RA. It differs from other methods that rely on complex statistical models with various assumptions.
Affiliation(s)
- Erick Forno
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- Ting Wang
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- Qi Yan
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- John Brehm
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- Angel Colon-Semidey
- Department of Pediatrics, University of Puerto Rico, San Juan, Puerto Rico
- Maria Alvarez
- Department of Pediatrics, University of Puerto Rico, San Juan, Puerto Rico
- Nadia Boutaoui
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- Michelle M Cloutier
- Department of Pediatrics, University of Connecticut Health Center, Connecticut Children's Medical Center, Farmington, Connecticut
- John F Alcorn
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- Wei Chen
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
- Juan C Celedón
- Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania
29
Integrative whole-genome sequence analysis reveals roles of regulatory mutations in BCL6 and BCL2 in follicular lymphoma. Sci Rep 2017; 7:7040. [PMID: 28765546 PMCID: PMC5539289 DOI: 10.1038/s41598-017-07226-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 12/06/2016] [Accepted: 06/27/2017] [Indexed: 02/02/2023] Open
Abstract
The contribution of mutations in regulatory regions to tumorigenesis has been the subject of many recent studies. We propose a new framework for integrative analysis of genome-wide sequencing data by considering diverse genetic information. This approach is applied to study follicular lymphoma (FL), a disease for which little is known about the contribution of regulatory gene mutations. Results from a test FL cohort revealed three novel highly recurrent regulatory mutation blocks near important genes implicated in FL, BCL6 and BCL2. Similar findings were detected in a validation FL cohort. We also found transcription factors (TF) whose binding may be disturbed by these mutations in FL: disruption of FOX TF family near the BCL6 promoter may result in reduced BCL6 expression, which then increases BCL2 expression over that caused by BCL2 gene translocation. Knockdown experiments of two TF hits (FOXD2 or FOXD3) were performed in human B lymphocytes verifying that they modulate BCL6/BCL2 according to the computationally predicted effects of the SNVs on TF binding. Overall, our proposed integrative analysis facilitates non-coding driver identification and the new findings may enhance the understanding of FL.
30
A Trans-acting Factor May Modify Age at Onset in Familial Amyloid Polyneuropathy ATTRV30M in Portugal. Mol Neurobiol 2017; 55:3676-3683. [PMID: 28527106 DOI: 10.1007/s12035-017-0593-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/31/2016] [Accepted: 05/02/2017] [Indexed: 02/06/2023]
Abstract
Although all familial amyloid polyneuropathy (FAP) ATTRV30M patients carry the same causative mutation, early (<40) and late-onset forms (≥50 years) of FAP may coexist in the same family. However, this variability in age at onset is still unexplained. The objective was to identify modifiers closely linked to the TTR locus that may in part be associated with age at onset of FAP ATTRV30M, in particular in a group of very early-onset patients (≤30 years) when compared with late-onset individuals. A clinical genetic study at a referral center comprising a sample of 910 Portuguese individuals includes 589 Val30Met carriers, 102 spouses, and 189 controls from the general population. Haplotype analysis was performed, using eight intragenic single nucleotide polymorphisms (SNPs) at the TTR locus. We compared haplotype frequencies in FAP samples and controls and in parent-offspring pairs using appropriate statistical analysis. Haplotype A was the most common in the general population. Notably, haplotype C was more frequent in early-onset (<40) than in late-onset patients (≥50 years) (p = 0.012). When comparing allelic frequencies of each SNP within haplotype C between "very early" (≤30 years) and late-onset (≥50 years) cases, the A allele of rs72922947 was associated with an earlier onset (p = 0.009); this remained significant after a permutation-based correction. Also, the heterozygous genotype (GA) for this SNP was associated with a decrease in mean age at onset of 8.6 years (p = 0.014). We found a more common haplotype (A) linked to the Val30Met variant and a possible modulatory trans effect on age at onset. These findings may lead to potential therapeutic targets.
31
Yi S, Lin S, Li Y, Zhao W, Mills GB, Sahni N. Functional variomics and network perturbation: connecting genotype to phenotype in cancer. Nat Rev Genet 2017; 18:395-410. [PMID: 28344341 DOI: 10.1038/nrg.2017.8] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Indexed: 02/08/2023]
Abstract
Proteins interact with other macromolecules in complex cellular networks for signal transduction and biological function. In cancer, genetic aberrations have been traditionally thought to disrupt the entire gene function. It has been increasingly appreciated that each mutation of a gene could have a subtle but unique effect on protein function or network rewiring, contributing to diverse phenotypic consequences across cancer patient populations. In this Review, we discuss the current understanding of cancer genetic variants, including the broad spectrum of mutation classes and the wide range of mechanistic effects on gene function in the context of signalling networks. We highlight recent advances in computational and experimental strategies to study the diverse functional and phenotypic consequences of mutations at the base-pair resolution. Such information is crucial to understanding the complex pleiotropic effect of cancer genes and provides a possible link between genotype and phenotype in cancer.
Affiliation(s)
- Song Yi
  - Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Shengda Lin
  - Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Yongsheng Li
  - Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Wei Zhao
  - Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Gordon B Mills
  - Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Nidhi Sahni
  - Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
  - Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
|
32
|
Afanasyeva MA, Putlyaeva LV, Demin DE, Kulakovskiy IV, Vorontsov IE, Fridman MV, Makeev VJ, Kuprash DV, Schwartz AM. The single nucleotide variant rs12722489 determines differential estrogen receptor binding and enhancer properties of an IL2RA intronic region. PLoS One 2017; 12:e0172681. [PMID: 28234966 PMCID: PMC5325477 DOI: 10.1371/journal.pone.0172681] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/06/2016] [Accepted: 02/08/2017] [Indexed: 12/11/2022] Open
Abstract
We studied the functional effect on transcriptional regulation of the rs12722489 single nucleotide polymorphism, located in the first intron of the human IL2RA gene. This polymorphism is associated with multiple autoimmune conditions (rheumatoid arthritis, multiple sclerosis, Crohn's disease, and ulcerative colitis). In silico analysis suggested a significant difference in the affinity of the estrogen receptor (ER) binding site between the alternative allelic variants, with stronger predicted affinity for the risk (G) allele. An electrophoretic mobility shift assay showed that purified human ERα bound only the G variant of a 32-bp genomic sequence containing rs12722489. Chromatin immunoprecipitation demonstrated that endogenous human ERα interacted with the rs12722489 genomic region in vivo, and a DNA pull-down assay confirmed differential allelic binding of amplified 189-bp genomic fragments containing rs12722489 with endogenous human ERα. In a luciferase reporter assay, a kilobase-long genomic segment containing the G but not the A allele of rs12722489 demonstrated enhancer properties in the MT-2 cell line, an HTLV-1 transformed human cell line with a regulatory T cell phenotype.
Affiliation(s)
- Marina A. Afanasyeva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Lidia V. Putlyaeva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Denis E. Demin
  - Moscow Institute of Physics and Technology, Moscow, Russia
- Ivan V. Kulakovskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia
- Ilya E. Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Marina V. Fridman
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Vsevolod J. Makeev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Dmitry V. Kuprash
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Anton M. Schwartz
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
|
33
|
Forno E, Sordillo J, Brehm J, Chen W, Benos T, Yan Q, Avila L, Soto-Quirós M, Cloutier MM, Colón-Semidey A, Alvarez M, Acosta-Pérez E, Weiss ST, Litonjua AA, Canino G, Celedón JC. Genome-wide interaction study of dust mite allergen on lung function in children with asthma. J Allergy Clin Immunol 2017; 140:996-1003.e7. [PMID: 28167095 DOI: 10.1016/j.jaci.2016.12.967] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 01/08/2016] [Revised: 11/25/2016] [Accepted: 12/12/2016] [Indexed: 12/28/2022]
Abstract
BACKGROUND Childhood asthma is likely the result of gene-by-environment (G × E) interactions. Dust mite is a known risk factor for asthma morbidity. Yet, there have been no genome-wide G × E studies of dust mite allergen on asthma-related phenotypes. OBJECTIVE We sought to identify genetic variants whose effects on lung function in children with asthma are modified by the level of dust mite allergen exposure. METHODS A genome-wide interaction analysis of dust mite allergen level and lung function was performed in a cohort of Puerto Rican children with asthma (Puerto Rico Genetics of Asthma and Lifestyle [PRGOAL]). Replication was attempted in 2 independent cohorts, the Childhood Asthma Management Program (CAMP) and the Genetics of Asthma in Costa Rica Study. RESULTS Single nucleotide polymorphism (SNP) rs117902240 showed a significant interaction effect on FEV1 with dust mite allergen level in PRGOAL (interaction P = 3.1 × 10⁻⁸), and replicated in the same direction in CAMP white children and CAMP Hispanic children (combined interaction P = .0065 for replication cohorts and 7.4 × 10⁻⁹ for all cohorts). Rs117902240 was positively associated with FEV1 in children exposed to low dust mite allergen levels, but negatively associated with FEV1 in children exposed to high levels. This SNP is on chromosome 8q24, adjacent to a binding site for CCAAT/enhancer-binding protein beta, a transcription factor that forms part of the IL-17 signaling pathway. None of the SNPs identified for FEV1/forced vital capacity replicated in the independent cohorts. CONCLUSIONS Dust mite allergen exposure modifies the estimated effect of rs117902240 on FEV1 in children with asthma. Analysis of existing data suggests that this SNP may have transcription factor regulatory functions.
Affiliation(s)
- Erick Forno
  - Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa
  - University of Pittsburgh School of Medicine, Pittsburgh, Pa
- Joanne Sordillo
  - Channing Division of Network Medicine, Department of Medicine, Harvard Medical School and Brigham and Women's Hospital, Boston, Mass
- John Brehm
  - Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa
  - University of Pittsburgh School of Medicine, Pittsburgh, Pa
- Wei Chen
  - Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa
  - University of Pittsburgh School of Medicine, Pittsburgh, Pa
- Takis Benos
  - University of Pittsburgh School of Medicine, Pittsburgh, Pa
- Qi Yan
  - Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa
  - University of Pittsburgh School of Medicine, Pittsburgh, Pa
- Lydiana Avila
  - Department of Pediatric Pulmonology, Hospital Nacional de Niños, San José, Costa Rica
- Manuel Soto-Quirós
  - Department of Pediatric Pulmonology, Hospital Nacional de Niños, San José, Costa Rica
- Michelle M Cloutier
  - Department of Pediatrics, University of Connecticut Health Center, Farmington, Conn
- Maria Alvarez
  - Department of Pediatrics, University of Puerto Rico, San Juan, Puerto Rico
- Edna Acosta-Pérez
  - Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, Puerto Rico
- Scott T Weiss
  - Channing Division of Network Medicine, Department of Medicine, Harvard Medical School and Brigham and Women's Hospital, Boston, Mass
- Augusto A Litonjua
  - Channing Division of Network Medicine, Department of Medicine, Harvard Medical School and Brigham and Women's Hospital, Boston, Mass
- Glorisa Canino
  - Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, Puerto Rico
- Juan C Celedón
  - Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa
  - University of Pittsburgh School of Medicine, Pittsburgh, Pa
|
34
|
Li R, Zhong D, Liu R, Lv H, Zhang X, Liu J, Han J. A novel method for in silico identification of regulatory SNPs in human genome. J Theor Biol 2017; 415:84-89. [DOI: 10.1016/j.jtbi.2016.11.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/25/2016] [Revised: 11/17/2016] [Accepted: 11/25/2016] [Indexed: 11/29/2022]
|
35
|
Chadaeva IV, Ponomarenko MP, Rasskazov DA, Sharypova EB, Kashina EV, Matveeva MY, Arshinova TV, Ponomarenko PM, Arkova OV, Bondar NP, Savinkova LK, Kolchanov NA. Candidate SNP markers of aggressiveness-related complications and comorbidities of genetic diseases are predicted by a significant change in the affinity of TATA-binding protein for human gene promoters. BMC Genomics 2016; 17:995. [PMID: 28105927 PMCID: PMC5249025 DOI: 10.1186/s12864-016-3353-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Aggressiveness in humans is a hereditary behavioral trait that mobilizes all systems of the body (first of all the nervous and endocrine systems, and then the respiratory, vascular, muscular, and others), e.g., for the defense of oneself, children, family, shelter, territory, and other possessions as well as personal interests. The level of aggressiveness of a person determines many other characteristics of quality of life and lifespan, acting as a stress factor. Aggressive behavior depends on many parameters such as age, gender, diseases and treatment, diet, and environmental conditions. Among them, genetic factors are believed to be the main parameters that are well-studied at the factual level, but in actuality, genome-wide studies of aggressive behavior appeared relatively recently. One of the biggest projects of modern science, 1000 Genomes, involves identification of single nucleotide polymorphisms (SNPs), i.e., differences of individual genomes from the reference genome. SNPs can be associated with hereditary diseases, their complications, comorbidities, and responses to stress or a drug. Clinical comparisons between cohorts of patients and healthy volunteers (as a control) allow for identifying SNPs whose allele frequencies significantly separate them from one another as markers of the above conditions. Computer-based preliminary analysis of millions of SNPs detected by the 1000 Genomes project can accelerate clinical search for SNP markers due to preliminary whole-genome search for the most meaningful candidate SNP markers and discarding of neutral and poorly substantiated SNPs. RESULTS Here, we combine two computer-based search methods for SNPs that alter gene expression: (i) the Web service SNP_TATA_Comparator (DNA sequence analysis) and (ii) a PubMed-based manual search for articles on aggressiveness using heuristic keywords. Near the known binding sites for TATA-binding protein (TBP) in human gene promoters, we found aggressiveness-related candidate SNP markers, including rs1143627 (associated with higher aggressiveness in patients undergoing cytokine immunotherapy), rs544850971 (higher aggressiveness in old women taking lipid-lowering medication), and rs10895068 (childhood aggressiveness-related obesity in adolescence with cardiovascular complications in adulthood). CONCLUSIONS After validation of these candidate markers by clinical protocols, these SNPs may become useful for physicians (may help to improve treatment of patients) and for the general population (a lifestyle choice preventing aggressiveness-related complications).
Affiliation(s)
- Irina V. Chadaeva
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
  - Novosibirsk State University, 2 Pirogova Street, Novosibirsk, 630090 Russia
- Mikhail P. Ponomarenko
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
  - Novosibirsk State University, 2 Pirogova Street, Novosibirsk, 630090 Russia
- Dmitry A. Rasskazov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Ekaterina B. Sharypova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Elena V. Kashina
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Marina Yu Matveeva
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Tatjana V. Arshinova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Petr M. Ponomarenko
  - Children’s Hospital Los Angeles, 4640 Hollywood Boulevard, University of Southern California, Los Angeles, CA 90027 USA
- Olga V. Arkova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
  - Vector-Best Inc, Koltsovo, Novosibirsk Region 630559 Russia
- Natalia P. Bondar
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Ludmila K. Savinkova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Nikolay A. Kolchanov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
  - Novosibirsk State University, 2 Pirogova Street, Novosibirsk, 630090 Russia
|
36
|
Bahreini A, Levine K, Santana-Santos L, Benos PV, Wang P, Andersen C, Oesterreich S, Lee AV. Non-coding single nucleotide variants affecting estrogen receptor binding and activity. Genome Med 2016; 8:128. [PMID: 27964748 PMCID: PMC5154163 DOI: 10.1186/s13073-016-0382-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/13/2016] [Accepted: 11/23/2016] [Indexed: 11/26/2022] Open
Abstract
Background Estrogen receptor (ER) activity is critical for the development and progression of the majority of breast cancers. It is known that ER is differentially bound to DNA leading to transcriptomic and phenotypic changes in different breast cancer models. We investigated whether single nucleotide variants (SNVs) in ER binding sites (regSNVs) contribute to ER action through changes in the ER cistrome, thereby affecting disease progression. Here we developed a computational pipeline to identify SNVs in ER binding sites using chromatin immunoprecipitation sequencing (ChIP-seq) data from ER+ breast cancer models. Methods ER ChIP-seq data were downloaded from the Gene Expression Omnibus (GEO). GATK pipeline was used to identify SNVs and the MACS algorithm was employed to call DNA-binding sites. Determination of the potential effect of a given SNV in a binding site was inferred using reimplementation of the is-rSNP algorithm. The Cancer Genome Atlas (TCGA) data were integrated to correlate the regSNVs and gene expression in breast tumors. ChIP and luciferase assays were used to assess the allele-specific binding. Results Analysis of ER ChIP-seq data from MCF7 cells identified an intronic SNV in the IGF1R gene, rs62022087, predicted to increase ER binding. Functional studies confirmed that ER binds preferentially to rs62022087 versus the wild-type allele. By integrating 43 ER ChIP-seq datasets, multi-omics, and clinical data, we identified 17 regSNVs associated with altered expression of adjacent genes in ER+ disease. Of these, the top candidate was in the promoter of the GSTM1 gene and was associated with higher expression of GSTM1 in breast tumors. Survival analysis of patients with ER+ tumors revealed that higher expression of GSTM1, responsible for detoxifying carcinogens, was correlated with better outcome. Conclusions In conclusion, we have developed a computational approach that is capable of identifying putative regSNVs in ER ChIP-binding sites. 
These non-coding variants could potentially regulate target genes and may contribute to clinical prognosis in breast cancer.
Affiliation(s)
- Amir Bahreini
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
  - Womens Cancer Research Center, Magee-Women Research Institute, Pittsburgh, PA, USA
- Kevin Levine
  - Womens Cancer Research Center, Magee-Women Research Institute, Pittsburgh, PA, USA
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA
- Lucas Santana-Santos
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Panayiotis V Benos
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Peilu Wang
  - Womens Cancer Research Center, Magee-Women Research Institute, Pittsburgh, PA, USA
  - School of Medicine, Tsinghua University, Beijing, 100084, People's Republic of China
- Courtney Andersen
  - Womens Cancer Research Center, Magee-Women Research Institute, Pittsburgh, PA, USA
  - AstraZeneca, Oncology iMED, 35 Gatehouse Drive, Waltham, MA, USA
- Steffi Oesterreich
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
  - Womens Cancer Research Center, Magee-Women Research Institute, Pittsburgh, PA, USA
- Adrian V Lee
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
  - Womens Cancer Research Center, Magee-Women Research Institute, Pittsburgh, PA, USA
|
37
|
Kumar S, Ambrosini G, Bucher P. SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity. Nucleic Acids Res 2016; 45:D139-D144. [PMID: 27899579 PMCID: PMC5210548 DOI: 10.1093/nar/gkw1064] [Citation(s) in RCA: 133] [Impact Index Per Article: 14.8] [Received: 08/15/2016] [Revised: 10/05/2016] [Accepted: 10/24/2016] [Indexed: 01/21/2023] Open
Abstract
SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.
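The PWM-based estimate described in this abstract can be illustrated with a minimal sketch. It is not the SNP2TFBS implementation: the toy matrix, the uniform background, the forward-strand-only scan, and all function names are assumptions made for illustration. The sketch scores every motif-width window that covers the SNP, once with the reference allele and once with the alternate allele, and reports the change in the best log-odds score.

```python
import math

# Toy position frequency matrix (hypothetical, not a real TF motif):
# one dict per motif position giving base probabilities.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def to_pwm(pfm, background=BACKGROUND):
    """Convert base probabilities to log2-odds weights."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in pfm]


def window_score(pwm, window):
    """Sum the per-position weights for one motif-width window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_score(pwm, seq, snp_index):
    """Best score over all forward-strand windows that cover snp_index.

    Assumes seq is at least as long as the motif.
    """
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return max(window_score(pwm, seq[s:s + width]) for s in range(first, last + 1))


def snp_effect(pwm, ref_seq, snp_index, alt_base):
    """Return (ref_score, alt_score, alt_score - ref_score)."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    ref = best_score(pwm, ref_seq, snp_index)
    alt = best_score(pwm, alt_seq, snp_index)
    return ref, alt, alt - ref


if __name__ == "__main__":
    pwm = to_pwm(TOY_PFM)
    # Hypothetical sequence: the motif AGCT sits at positions 2-5; the SNP is the G.
    ref, alt, delta = snp_effect(pwm, "TTAGCTTT", 3, "T")
    print(f"ref={ref:.2f} alt={alt:.2f} delta={delta:.2f}")
```

With this toy matrix, changing G to T at index 3 of `TTAGCTTT` lowers the best score, which is the kind of change a PWM-based annotation would flag as weakening a predicted site. A real analysis would also scan the reverse strand and apply a score threshold before calling a site created or abolished.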
Affiliation(s)
- Sunil Kumar
  - Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Giovanna Ambrosini
  - Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Philipp Bucher
  - Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
|
38
|
Candidate SNP Markers of Chronopathologies Are Predicted by a Significant Change in the Affinity of TATA-Binding Protein for Human Gene Promoters. BioMed Research International 2016; 2016:8642703. [PMID: 27635400 PMCID: PMC5011241 DOI: 10.1155/2016/8642703] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/04/2016] [Revised: 06/25/2016] [Accepted: 06/28/2016] [Indexed: 01/14/2023]
Abstract
Variations in human genome (e.g., single nucleotide polymorphisms, SNPs) may be associated with hereditary diseases, their complications, comorbidities, and drug responses. Using Web service SNP_TATA_Comparator presented in our previous paper, here we analyzed immediate surroundings of known SNP markers of diseases and identified several candidate SNP markers that can significantly change the affinity of TATA-binding protein for human gene promoters, with circadian consequences. For example, rs572527200 may be related to asthma, where symptoms are circadian (worse at night), and rs367732974 may be associated with heart attacks that are characterized by a circadian preference (early morning). By the same method, we analyzed the 90 bp proximal promoter region of each protein-coding transcript of each human gene of the circadian clock core. This analysis yielded 53 candidate SNP markers, such as rs181985043 (susceptibility to acute Q fever in male patients), rs192518038 (higher risk of a heart attack in patients with diabetes), and rs374778785 (emphysema and lung cancer in smokers). If they are properly validated according to clinical standards, these candidate SNP markers may turn out to be useful for physicians (to select optimal treatment for each patient) and for the general population (to choose a lifestyle preventing possible circadian complications of diseases).
|
39
|
Shi W, Fornes O, Mathelier A, Wasserman WW. Evaluating the impact of single nucleotide variants on transcription factor binding. Nucleic Acids Res 2016; 44:10106-10116. [PMID: 27492288 PMCID: PMC5137422 DOI: 10.1093/nar/gkw691] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 01/25/2016] [Revised: 07/25/2016] [Accepted: 07/26/2016] [Indexed: 12/21/2022] Open
Abstract
Diseases and phenotypes caused by disrupted transcription factor (TF) binding are being identified, but progress is hampered by our limited capacity to predict such functional alterations. Improving predictions may be dependent on expanding the set of bona fide TF binding alterations. Allele-specific binding (ASB) events, where TFs preferentially bind to one of the two alleles at heterozygous sites, reveal the impact of sequence variations in altered TF binding. Here, we present the largest ASB compilation to our knowledge, 10 765 ASB events retrieved from 45 ENCODE ChIP-Seq data sets. Our analysis showed that ASB events were frequently associated with motif alterations of the ChIP'ed TF and potential partner TFs, allelic difference of DNase I hypersensitivity and allelic difference of histone modifications. For TF dimers bound symmetrically to DNA, ASB data revealed that central positions of the TF binding motifs were disproportionately important for binding. Lastly, the impact of variation on TF binding was predicted by a classification model incorporating all the investigated features of ASB events. Classification models using only DNase I hypersensitivity and sequence data exhibited predictive accuracy approaching the models with substantially more features. Taken together, the combination of ASB data and the classification model represents an important step toward elucidating regulatory variants across the human genome.
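Allele-specific binding is often screened by asking whether ChIP-seq reads at a heterozygous site depart from the 1:1 allelic ratio expected when there is no allelic preference. The abstract does not state how its ASB events were called, so the exact two-sided binomial test below is only an assumed, simplified stand-in; the function name and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for an allelic read imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one, under the null that each read carries the
    reference allele with probability p.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties with the observed outcome are included.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical read counts at two heterozygous sites.
    print(binomial_two_sided_p(10, 10))  # balanced counts
    print(binomial_two_sided_p(18, 2))   # strongly skewed counts
```

In practice such a test would be followed by multiple-testing correction across sites, and a null ratio other than 0.5 may be needed where reference-mapping bias is present.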
Affiliation(s)
- Wenqiang Shi
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada
  - Bioinformatics Graduate Program, University of British Columbia, 2329 W Mall, Vancouver, BC V6T 1Z4, Canada
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada
- Anthony Mathelier
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL partnership, University of Oslo and Oslo University Hospital, Norway
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada
|
40
|
Karmakar A, Maitra S, Chakraborti B, Verma D, Sinha S, Mohanakumar KP, Rajamma U, Mukhopadhyay K. Monoamine oxidase B gene variants associated with attention deficit hyperactivity disorder in the Indo-Caucasoid population from West Bengal. BMC Genet 2016; 17:92. [PMID: 27341797 PMCID: PMC4921030 DOI: 10.1186/s12863-016-0401-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/11/2015] [Accepted: 06/17/2016] [Indexed: 11/17/2022] Open
Abstract
Background Attention deficit hyperactivity disorder (ADHD) is characterized by symptoms of inattention, excessive motor activity and impulsivity detected mostly during childhood. These traits are known to be controlled by monoamine neurotransmitters, chiefly dopamine, serotonin and norepinephrine. Monoamine oxidase A (MAOA) and B (MAOB), two isoenzymes bound to the outer membrane of mitochondria, are involved in the degradation of monoamines and were explored for association with ADHD in different ethnic groups. In the present study, a few exonic as well as intronic MAOB variants were analyzed in ADHD probands (N = 150) and ethnically matched controls (N = 150) recruited following the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV). Appropriate scales were used for measuring the behavioural attributes. Gene variants were analyzed by amplification of target sites followed by DNA sequencing, and the data obtained were analyzed by population-based statistical methods. Results Out of 34 variants present in the analyzed sites, only seven functional variants, rs4824562, rs56220155, rs2283728, rs2283727, rs3027441, rs6324 and rs3027440, were found to be polymorphic. rs2283728 ‘C’ (P = 3.45 × 10⁻⁶) and rs3027440 ‘T’ (P = 0.02) alleles showed higher frequencies in ADHD probands as compared to controls. rs56220155 ‘A’ (P = 0.04) allele and ‘GA’ (P = 0.04) genotype showed higher frequencies in the male and female ADHD probands respectively as compared to sex-matched controls. Analysis of pairwise linkage disequilibrium revealed striking differences between probands and controls. Haplotype analysis revealed significantly higher occurrence of different haplotypes in the ADHD probands while some haplotypes were detected in the controls only. Higher scores for conduct problems were found to be associated with rs56220155 ‘A’ (P = 0.05) allele in the male ADHD probands. Multifactor dimensionality reduction analysis showed independent as well as interactive effects of polymorphic variants which were more robust in the male probands. Conclusions Since all the polymorphic variants analyzed were functional, it may be inferred that MAOB gene variants are contributing to the etiology of ADHD in the Indo-Caucasoid population from eastern India, which merits further in-depth analysis.
Affiliation(s)
- Arijit Karmakar
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
- Subhamita Maitra
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
- Barnali Chakraborti
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
- Deepak Verma
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
- Swagata Sinha
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
- Kochupurackal P Mohanakumar
  - Indian Institute of Chemical Biology-Council of Scientific & Industrial Research, Jadavpur, Kolkata, 700 032, India
- Usha Rajamma
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
- Kanchan Mukhopadhyay
  - Manovikas Biomedical Research and Diagnostic Centre, 482, Madudah, Plot I-24, Sec.-J, E.M. Bypass, Kolkata, 700107, India
|
41
|
Abstract
Background Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and, consequently, expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer. Cancer somatic mutations in binding sites of selected transcription factors have been found under positive selection. However, the action and significance of negative selection in non-coding regions remain controversial. Results Here we present an analysis of transcription factor binding motifs co-localized with non-coding variants. To avoid statistical bias we account for mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations than expected by chance that increase or decrease affinity of predicted binding sites. Such stability of binding motifs is even more pronounced in DNase accessible regions. Conclusions Our data demonstrate negative selection against binding site alterations and suggest that such selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors with conserved binding motifs can reveal cell regulatory pathways crucial for the survivability of various human cancers.
|
42
|
Tang H, Thomas PD. Tools for Predicting the Functional Impact of Nonsynonymous Genetic Variation. Genetics 2016; 203:635-47. [PMID: 27270698 PMCID: PMC4896183 DOI: 10.1534/genetics.116.190033] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2015] [Accepted: 04/01/2016] [Indexed: 01/09/2023] Open
Abstract
As personal genome sequencing becomes a reality, understanding the effects of genetic variants on phenotype (particularly the impact of germline variants on disease risk and the impact of somatic variants on cancer development and treatment) continues to increase in importance. Because of their clear potential for affecting phenotype, nonsynonymous genetic variants (variants that cause a change in the amino acid sequence of a protein encoded by a gene) have long been the target of efforts to predict the effects of genetic variation. Whole-genome sequencing is identifying large numbers of nonsynonymous variants in each genome, intensifying the need for computational methods that accurately predict which of these are likely to impact disease phenotypes. This review focuses on nonsynonymous variant prediction with two aims in mind: (1) to review the prioritization methods that have been developed to date and the principles on which they are based and (2) to discuss the challenges to further improving these methods.
Affiliation(s)
- Haiming Tang
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033
| |
|
43
|
Levitsky VG, Oshchepkov DY, Klimova NV, Ignatieva EV, Vasiliev GV, Merkulov VM, Merkulova TI. Hidden heterogeneity of transcription factor binding sites: A case study of SF-1. Comput Biol Chem 2016; 64:19-32. [PMID: 27235721 DOI: 10.1016/j.compbiolchem.2016.04.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2015] [Revised: 04/19/2016] [Accepted: 04/19/2016] [Indexed: 01/15/2023]
Abstract
Steroidogenic factor 1 (SF-1) belongs to a small group of transcription factors that bind DNA only as a monomer. Three different approaches (Sitecon, SiteGA, and oPWM), constructed using the same training sample of experimentally confirmed SF-1 binding sites, have been used to recognize these sites. The appropriate prediction thresholds for recognition models have been selected. Namely, the thresholds concordant by false positive or negative rates for various methods were used to optimize the discrimination of steroidogenic gene promoters from the datasets of non-specific promoters. After experimental verification, the models were used to analyze the ChIP-seq data for SF-1. It has been shown that the sets of sites recognized by different models overlap only partially and that an integration of these models allows for identification of SF-1 sites in up to 80% of the ChIP-seq loci. The structures of the sites detected using the three recognition models in the ChIP-seq peaks falling within the [-5000, +5000] region relative to the transcription start sites (TSS) extracted from the FANTOM5 project have been analyzed. MATLIGN classified the frequency matrices for the sites predicted by oPWM, Sitecon, and SiteGA into two groups. The first group is described by oPWM/Sitecon and the second by SiteGA. Gene ontology (GO) analysis has been used to clarify the differences between the sets of genes carrying different variants of SF-1 binding sites. Although this analysis in general revealed a considerable overlap in GO terms for the genes carrying the binding sites predicted by oPWM, Sitecon, or SiteGA, only the last method showed a notable trend toward terms related to negative regulation and apoptosis. The results suggest that the SF-1 binding sites predicted by oPWM+Sitecon and by SiteGA differ in both their structure and the functional annotation of their target genes. Further application of Homer software for de novo identification of enriched motifs in the SF-1 ChIP-seq dataset gave results similar to oPWM+Sitecon.
Affiliation(s)
- V G Levitsky
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia.
| | - D Yu Oshchepkov
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - N V Klimova
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - E V Ignatieva
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
| | - G V Vasiliev
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - V M Merkulov
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - T I Merkulova
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
| |
|
44
|
A computational method for prediction of rSNPs in human genome. Comput Biol Chem 2016; 62:96-103. [PMID: 27107687 DOI: 10.1016/j.compbiolchem.2016.04.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2015] [Revised: 02/27/2016] [Accepted: 04/01/2016] [Indexed: 11/22/2022]
Abstract
Regulatory single nucleotide polymorphisms (rSNPs) in human genomes are thought to be responsible for phenotypic differences, including susceptibility to diseases and treatment outcomes, even though they do not change any gene product. However, a genome-wide search for rSNPs has not been properly addressed so far. In this work, a computational method for rSNP identification is proposed. As background SNPs far outnumber rSNPs, an ensemble method is applied to handle imbalanced data, which first converts an unbalanced dataset into several balanced ones and then builds a model for each balanced dataset. Two major types of features are extracted: sequence-based features and allele-specific features. A random forest is then applied to build the recognition model for each balanced dataset. Finally, ensemble strategies are adopted to combine the results of the models. We have tested our method on a set of experimentally verified rSNPs, and leave-one-out cross-validation results showed that our method achieves a sensitivity of 73.8%, a specificity of 71.8%, and an area under the ROC curve (AUC) of 0.756. In addition, our method is threshold free and does not rely on data on regulatory elements, so it should adapt better to different data scenarios. The original data and the source MATLAB code are available at https://sourceforge.net/projects/rsnpdect/.
|
45
|
Litovchenko M, Laurent S. TEMPLE: analysing population genetic variation at transcription factor binding sites. Mol Ecol Resour 2016; 16:1428-1434. [PMID: 27106869 DOI: 10.1111/1755-0998.12535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2016] [Revised: 04/01/2016] [Accepted: 04/07/2016] [Indexed: 11/30/2022]
Abstract
Genetic variation occurring at the level of regulatory sequences can affect phenotypes and fitness in natural populations. This variation can be analysed in a population genetic framework to study how genetic drift and selection affect the evolution of these functional elements. However, doing this requires a good understanding of the location and nature of regulatory regions, which has long been a major hurdle. The current proliferation of genomewide profiling experiments of transcription factor occupancies greatly improves our ability to identify genomic regions involved in specific DNA-protein interactions. Although software exists for predicting transcription factor binding sites (TFBS), and the effects of genetic variants on TFBS specificity, there are no tools currently available for inferring this information jointly with the genetic variation at TFBS in natural populations. We developed the software Transcription Elements Mapping at the Population LEvel (TEMPLE), which predicts TFBS, evaluates the effects of genetic variants on TFBS specificity and summarizes the genetic variation occurring at TFBS in intraspecific sequence alignments. We demonstrate that TEMPLE's TFBS prediction algorithms give identical results to PATSER, a software distribution commonly used in the field. We also illustrate the unique features of TEMPLE by analysing TFBS diversity for the TF Senseless (SENS) in one ancestral and one cosmopolitan population of the fruit fly Drosophila melanogaster. TEMPLE can be used to localize TFBS that are characterized by strong genetic differentiation across natural populations. This will be particularly useful for studies aiming to identify adaptive mutations. TEMPLE is a Java-based cross-platform software that easily maps the genetic diversity at predicted TFBSs using a graphical interface, or from the Unix command line.
Affiliation(s)
- Maria Litovchenko
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| | - Stefan Laurent
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| |
|
46
|
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Abhishek Niroula
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
| | - Mauno Vihinen
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
| |
|
47
|
Ponomarenko MP, Arkova O, Rasskazov D, Ponomarenko P, Savinkova L, Kolchanov N. Candidate SNP Markers of Gender-Biased Autoimmune Complications of Monogenic Diseases Are Predicted by a Significant Change in the Affinity of TATA-Binding Protein for Human Gene Promoters. Front Immunol 2016; 7:130. [PMID: 27092142 PMCID: PMC4819121 DOI: 10.3389/fimmu.2016.00130] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2015] [Accepted: 03/21/2016] [Indexed: 12/17/2022] Open
Abstract
Some variations of human genome [for example, single nucleotide polymorphisms (SNPs)] are markers of hereditary diseases and drug responses. Analysis of them can help to improve treatment. Computer-based analysis of millions of SNPs in the 1000 Genomes project makes a search for SNP markers more targeted. Here, we combined two computer-based approaches: DNA sequence analysis and keyword search in databases. In the binding sites for TATA-binding protein (TBP) in human gene promoters, we found candidate SNP markers of gender-biased autoimmune diseases, including rs1143627 [cachexia in rheumatoid arthritis (double prevalence among women)]; rs11557611 [demyelinating diseases (thrice more prevalent among young white women than among non-white individuals)]; rs17231520 and rs569033466 [both: atherosclerosis comorbid with related diseases (double prevalence among women)]; rs563763767 [Hughes syndrome-related thrombosis (lethal during pregnancy)]; rs2814778 [autoimmune diseases (excluding multiple sclerosis and rheumatoid arthritis) underlying hypergammaglobulinemia in women]; rs72661131 and rs562962093 (both: preterm delivery in pregnant diabetic women); and rs35518301, rs34166473, rs34500389, rs33981098, rs33980857, rs397509430, rs34598529, rs33931746, rs281864525, and rs63750953 (all: autoimmune diseases underlying hypergammaglobulinemia in women). Validation of these predicted candidate SNP markers using the clinical standards may advance personalized medicine.
Affiliation(s)
- Mikhail P. Ponomarenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| | - Olga Arkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - Dmitry Rasskazov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | | | - Ludmila Savinkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - Nikolay Kolchanov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| |
|
48
|
Turnaev II, Rasskazov DA, Arkova OV, Ponomarenko MP, Ponomarenko PM, Savinkova LK, Kolchanov NA. Hypothetical SNP markers that significantly affect the affinity of the TATA-binding protein to VEGFA, ERBB2, IGF1R, FLT1, KDR, and MET oncogene promoters as chemotherapy targets. Mol Biol 2016. [DOI: 10.1134/s0026893316010209] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
49
|
Khurana E, Fu Y, Chakravarty D, Demichelis F, Rubin MA, Gerstein M. Role of non-coding sequence variants in cancer. Nat Rev Genet 2016; 17:93-108. [PMID: 26781813 DOI: 10.1038/nrg.2015.17] [Citation(s) in RCA: 319] [Impact Index Per Article: 35.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Patients with cancer carry somatic sequence variants in their tumour in addition to the germline variants in their inherited genome. Although variants in protein-coding regions have received the most attention, numerous studies have noted the importance of non-coding variants in cancer. Moreover, the overwhelming majority of variants, both somatic and germline, occur in non-coding portions of the genome. We review the current understanding of non-coding variants in cancer, including the great diversity of the mutation types--from single nucleotide variants to large genomic rearrangements--and the wide range of mechanisms by which they affect gene expression to promote tumorigenesis, such as disrupting transcription factor-binding sites or functions of non-coding RNAs. We highlight specific case studies of somatic and germline variants, and discuss how non-coding variants can be interpreted on a large-scale through computational and experimental methods.
Affiliation(s)
- Ekta Khurana
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York 10021, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA
| | - Yao Fu
- Bina Technologies, Roche Sequencing, Redwood City, California 94065, USA
| | - Dimple Chakravarty
- Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York 10065, USA
| | - Francesca Demichelis
- Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York 10021, USA; Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
| | - Mark A Rubin
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York 10065, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA; Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
| |
|
50
|
Arkova OV, Ponomarenko MP, Rasskazov DA, Drachkova IA, Arshinova TV, Ponomarenko PM, Savinkova LK, Kolchanov NA. Obesity-related known and candidate SNP markers can significantly change affinity of TATA-binding protein for human gene promoters. BMC Genomics 2015; 16 Suppl 13:S5. [PMID: 26694100 PMCID: PMC4686794 DOI: 10.1186/1471-2164-16-s13-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Obesity affects quality of life and life expectancy and is associated with cardiovascular disorders, cancer, diabetes, reproductive disorders in women, prostate diseases in men, and congenital anomalies in children. The use of single nucleotide polymorphism (SNP) markers of diseases and drug responses (i.e., significant differences of personal genomes of patients from the reference human genome) can help physicians to improve treatment. Clinical research can validate SNP markers via genotyping of patients and demonstration that SNP alleles are significantly more frequent in patients than in healthy people. The search for biomedical SNP markers of interest can be accelerated by computer-based analysis of hundreds of millions of SNPs in the 1000 Genomes project because of selection of the most meaningful candidate SNP markers and elimination of neutral SNPs. RESULTS We cross-validated the output of two computer-based methods: DNA sequence analysis using Web service SNP_TATA_Comparator and keyword search for articles on comorbidities of obesity. Near the sites binding to TATA-binding protein (TBP) in human gene promoters, we found 22 obesity-related candidate SNP markers, including rs10895068 (male breast cancer in obesity); rs35036378 (reduced risk of obesity after ovariectomy); rs201739205 (reduced risk of obesity-related cancers due to weight loss by diet/exercise in obese postmenopausal women); rs183433761 (obesity resistance during a high-fat diet); rs367732974 and rs549591993 (both: cardiovascular complications in obese patients with type 2 diabetes mellitus); rs200487063 and rs34104384 (both: obesity-caused hypertension); rs35518301, rs72661131, and rs562962093 (all: obesity); and rs397509430, rs33980857, rs34598529, rs33931746, rs33981098, rs34500389, rs63750953, rs281864525, rs35518301, and rs34166473 (all: chronic inflammation in comorbidities of obesity). 
Using an electrophoretic mobility shift assay under nonequilibrium conditions, we empirically validated the statistical significance (α < 0.00025) of the differences in TBP affinity values between the minor and ancestral alleles of 4 out of the 22 SNPs: rs200487063, rs201381696, rs34104384, and rs183433761. We also measured half-life (t1/2), Gibbs free energy change (ΔG), and the association and dissociation rate constants, ka and kd, of the TBP-DNA complex for these SNPs. CONCLUSIONS Validation of the 22 candidate SNP markers by proper clinical protocols appears to have a strong rationale and may advance postgenomic predictive preventive personalized medicine.
Affiliation(s)
- Olga V Arkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
| | - Mikhail P Ponomarenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Novosibirsk State University, 2 Pirogova Street, Novosibirsk 630090, Russia
- Laboratory of Evolutionary Bioinformatics and Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk 630090, Russia
| | - Dmitry A Rasskazov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
| | - Irina A Drachkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
| | - Tatjana V Arshinova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
| | - Petr M Ponomarenko
- Children's Hospital Los Angeles, 4640 Hollywood Boulevard, University of Southern California, Los Angeles, CA 90027, USA
| | - Ludmila K Savinkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
| | - Nikolay A Kolchanov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Novosibirsk State University, 2 Pirogova Street, Novosibirsk 630090, Russia
| |
|