1
Samy A, Suzek BE, Ozdemir MK, Sensoy O. In Silico Analysis of a Highly Mutated Gene in Cancer Provides Insight into Abnormal mRNA Splicing: Splicing Factor 3B Subunit 1 K700E Mutant. Biomolecules 2020; 10:E680. [PMID: 32354150 PMCID: PMC7277358 DOI: 10.3390/biom10050680] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/16/2020] [Revised: 04/17/2020] [Accepted: 04/20/2020] [Indexed: 12/25/2022]
Abstract
Cancer is the second leading cause of death worldwide. The etiology of the disease has remained elusive, but mutations causing aberrant RNA splicing have been considered one of the significant factors in various cancer types. The association of aberrant RNA splicing with drug/therapy resistance further increases the importance of these mutations. In this work, the impact of the splicing factor 3B subunit 1 (SF3B1) K700E mutation, a highly prevalent mutation in various cancer types, is investigated through molecular dynamics simulations. Based on our results, the K700E mutation increases the flexibility of mutant SF3B1. Consequently, this mutation leads to i) disruption of the interaction of pre-mRNA with SF3B1 and p14, which prevents proper alignment of the mRNA and causes use of an abnormal 3' splice site, and ii) disruption of communication in critical regions that participate in interactions with other proteins of the pre-mRNA splicing machinery. We anticipate that this study enhances our understanding of the mechanism of functional abnormalities associated with the splicing machinery, thereby increasing the possibility of designing effective therapies to combat cancer at an earlier stage.
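The abstract reports increased flexibility of the mutant but does not say how it was measured. In molecular dynamics studies, flexibility is commonly quantified as the per-residue root-mean-square fluctuation (RMSF). The sketch below is a generic illustration of that metric, not the authors' analysis; it assumes the trajectory has already been superposed onto a reference structure and is supplied as plain coordinate lists (one (x, y, z) tuple per residue per frame, e.g. C-alpha atoms). The function name `rmsf` and the toy trajectory are hypothetical.

```python
import math

def rmsf(trajectory):
    """Per-residue root-mean-square fluctuation from a pre-aligned trajectory.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue. Frames are assumed to be superposed already, so only
    internal motion remains.
    """
    n_frames = len(trajectory)
    n_res = len(trajectory[0])
    result = []
    for i in range(n_res):
        # Time-averaged position of residue i
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
        # Mean squared deviation of residue i from its average position
        msd = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical two-frame, two-residue trajectory: residue 0 is static,
# residue 1 moves 2 units along x between frames.
toy = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (2, 0, 0)]]
print(rmsf(toy))  # [0.0, 1.0]
```

Comparing wild-type and K700E RMSF profiles residue by residue would indicate where flexibility changes; a real analysis would normally use an MD toolkit and many more frames.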
Affiliation(s)
- Asmaa Samy
- The Graduate School of Engineering and Natural Science, Istanbul Medipol University, 34810 Istanbul, Turkey
- Baris Ethem Suzek
- Department of Computer Engineering, Muğla Sıtkı Koçman University, 48000 Muğla, Turkey
- Mehmet Kemal Ozdemir
- The School of Engineering and Natural Science, Istanbul Medipol University, 34810 Istanbul, Turkey
- Ozge Sensoy
- The School of Engineering and Natural Science, Istanbul Medipol University, 34810 Istanbul, Turkey
- Regenerative and Restorative Medicine Research Center (REMER), Istanbul Medipol University, 34810 Istanbul, Turkey
2
Jiang X, Neapolitan RE. Evaluation of a two-stage framework for prediction using big genomic data. Brief Bioinform 2015; 16:912-21. [PMID: 25788325 PMCID: PMC4652616 DOI: 10.1093/bib/bbv010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/05/2014] [Revised: 02/05/2015] [Indexed: 01/13/2023]
Abstract
We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate the occurrence of the event based on the values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. From these data sets we can learn which SNPs affect disease status, and use that knowledge to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features by how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables, which may explain why the BN-based methods perform better in this domain. Many prediction methods are available, and researchers have had little basis for choosing among them in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
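The first stage of the framework ranks features with ReliefF. The following is a minimal sketch of the ReliefF weight update for discrete features (such as SNP genotypes) and a binary class, written for illustration only: it uses every instance once as the reference instead of random sampling, measures distance by Hamming distance, and ignores missing values. It is not the implementation evaluated in the paper, and the function name `relieff` is hypothetical.

```python
def relieff(X, y, k=1):
    """ReliefF feature weights for discrete features and a binary class.

    X: list of instances, each a list of discrete feature values.
    y: list of binary class labels, one per instance.
    k: number of nearest hits and nearest misses used per reference instance.
    """
    n, n_feat = len(X), len(X[0])
    weights = [0.0] * n_feat

    def dist(a, b):
        # Hamming distance: number of features whose values differ
        return sum(1 for u, v in zip(a, b) if u != v)

    for r in range(n):
        others = [j for j in range(n) if j != r]
        by_dist = lambda j: dist(X[r], X[j])
        hits = sorted((j for j in others if y[j] == y[r]), key=by_dist)[:k]
        misses = sorted((j for j in others if y[j] != y[r]), key=by_dist)[:k]
        for f in range(n_feat):
            # diff() for a discrete feature is 1 if the values differ, else 0.
            # Differences from same-class neighbours lower the weight ...
            for h in hits:
                weights[f] -= (X[r][f] != X[h][f]) / (n * len(hits))
            # ... and differences from other-class neighbours raise it.
            for m in misses:
                weights[f] += (X[r][f] != X[m][f]) / (n * len(misses))
    return weights

# Hypothetical data: feature 0 equals the class label, feature 1 is noise.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
print(relieff(X, y))  # [1.0, -1.0]
```

In a two-stage setup, the top-weighted features would then be passed to the second-stage prediction method.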
3
Survey of network-based approaches to research of cardiovascular diseases. Biomed Res Int 2014; 2014:527029. [PMID: 24772427 PMCID: PMC3977459 DOI: 10.1155/2014/527029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/26/2013] [Accepted: 02/07/2014] [Indexed: 01/08/2023]
Abstract
Cardiovascular diseases (CVDs) are the leading health problem worldwide. Investigating the causes and mechanisms of CVDs calls for an integrative approach that takes into account their complex etiology. Biological networks generated from available data on biomolecular interactions are an excellent platform for understanding the interconnectedness of all processes within a living cell, including processes that underlie diseases. Consequently, the topology of biological networks has successfully been used to identify genes, pathways, and modules that govern the molecular actions underlying various complex diseases. Here, we review approaches that explore and use relationships between the topological properties of biological networks and the mechanisms underlying CVDs.
4
Jiang X, Barmada MM, Cooper GF, Becich MJ. A bayesian method for evaluating and discovering disease loci associations. PLoS One 2011; 6:e22075. [PMID: 21853025 PMCID: PMC3154195 DOI: 10.1371/journal.pone.0022075] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 04/17/2011] [Accepted: 06/14/2011] [Indexed: 11/22/2022]
Abstract
BACKGROUND A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis that has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is therefore a pressing need. METHODOLOGY/FINDINGS We introduce the Bayesian network posterior probability (BNPP) method, which addresses these difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can be used not only to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits better evaluation and discovery performance than a p-value-based method. For the real data sets, previous findings in the literature are confirmed and additional findings are made. CONCLUSIONS/SIGNIFICANCE We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations.
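The abstract describes scoring competing DAG hypotheses and normalizing their likelihoods into posterior probabilities. Below is a minimal sketch of that idea under stated assumptions, not the published BNPP implementation: each hypothesis is taken to be a set of SNP parents of the disease node, models are scored with a BDeu-style marginal likelihood (the score the entry above mentions for BN-based methods), and the prior over hypotheses is uniform. Because every SNP is a root node in all competing models, the SNP terms of the score are identical and cancel, so only the disease node's family score is computed. All function names and the toy data are hypothetical.

```python
import itertools
import math
from collections import Counter

def bdeu_family_score(data, child, parents, arity, ess=1.0):
    """Log BDeu marginal likelihood of `child` given `parents`.

    data: list of dicts mapping variable name -> state (0..arity-1).
    arity: dict mapping variable name -> number of states.
    ess: equivalent sample size of the BDeu prior.
    """
    r = arity[child]
    q = math.prod(arity[p] for p in parents)  # parent configurations (1 if none)
    a_j, a_jk = ess / q, ess / (q * r)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for config in itertools.product(*(range(arity[p]) for p in parents)):
        n_jk = [counts[(config, k)] for k in range(r)]
        score += math.lgamma(a_j) - math.lgamma(a_j + sum(n_jk))
        score += sum(math.lgamma(a_jk + n) - math.lgamma(a_jk) for n in n_jk)
    return score

def posterior_over_hypotheses(data, disease, hypotheses, arity, ess=1.0):
    """Posterior probability of each competing parent set, uniform prior."""
    logs = [bdeu_family_score(data, disease, list(h), arity, ess) for h in hypotheses]
    top = max(logs)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(s - top) for s in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data: disease status D copies SNP S1; S2 is unrelated.
data = [{"S1": s1, "S2": s2, "D": s1}
        for s1 in (0, 1) for s2 in (0, 1) for _ in range(10)]
arity = {"S1": 2, "S2": 2, "D": 2}
post = posterior_over_hypotheses(data, "D", [(), ("S1",), ("S2",)], arity)
# The hypothesis ("S1",) should receive nearly all of the posterior mass.
```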
Affiliation(s)
- Xia Jiang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.
5
Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011; 12:56-68. [PMID: 21164525 DOI: 10.1038/nrg2918] [Citation(s) in RCA: 2795] [Impact Index Per Article: 215.0] [Indexed: 12/13/2022]
Abstract
Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular and intercellular network that links tissue and organ systems. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships among apparently distinct (patho)phenotypes. Advances in this direction are essential for identifying new disease genes, for uncovering the biological significance of disease-associated mutations identified by genome-wide association studies and full-genome sequencing, and for identifying drug targets and biomarkers for complex diseases.
Affiliation(s)
- Albert-László Barabási
- Center for Complex Networks Research and Department of Physics, Northeastern University, 110 Forsyth Street, 111 Dana Research Center, Boston, Massachusetts 02115, USA.
6
Bonifaci N, Górski B, Masojć B, Wokołorczyk D, Jakubowska A, Dębniak T, Berenguer A, Serra Musach J, Brunet J, Dopazo J, Narod SA, Lubiński J, Lázaro C, Cybulski C, Pujana MA. Exploring the link between germline and somatic genetic alterations in breast carcinogenesis. PLoS One 2010; 5:e14078. [PMID: 21124932 PMCID: PMC2989917 DOI: 10.1371/journal.pone.0014078] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 07/02/2010] [Accepted: 11/02/2010] [Indexed: 12/19/2022]
Abstract
Recent genome-wide association studies (GWASs) have identified candidate genes contributing to cancer risk through low-penetrance mutations. Many of these genes were unexpected and, intriguingly, included well-known players in carcinogenesis at the somatic level. To assess the hypothesis of a germline-somatic link in carcinogenesis, we evaluated the distribution of somatic gene labels within the ordered results of a breast cancer risk GWAS. This analysis suggested that genetic variation in loci encoding "driver kinases" (i.e., kinases encoded by genes that showed higher somatic mutation rates than expected by chance and whose deregulation may therefore contribute to cancer development and/or progression) frequently influences risk. Assessment of these predictions using a population-based case-control study in Poland replicated the association for rs3732568 in EPHB1 (odds ratio (OR) = 0.79; 95% confidence interval (CI): 0.63-0.98; P(trend) = 0.031). Analyses by early age at diagnosis and by estrogen receptor α (ERα) tumor status indicated potential associations for rs6852678 in CDKL2 (OR = 0.32, 95% CI: 0.10-1.00; P(recessive) = 0.044) and rs10878640 in DYRK2 (OR = 2.39, 95% CI: 1.32-4.30; P(dominant) = 0.003), and for rs12765929, rs9836340 and rs4707795 in BMPR1A, EPHA3 and EPHA7, respectively (ERα tumor status P(interaction) < 0.05). The identification of three novel candidates as EPH receptor genes might indicate a link between perturbed compartmentalization of early neoplastic lesions and breast cancer risk and progression. Together, these data may lay the foundations for replication in additional populations and could potentially increase our knowledge of the underlying molecular mechanisms of breast carcinogenesis.
Affiliation(s)
- Núria Bonifaci
- Biomarkers and Susceptibility Unit, Spanish Biomedical Research Centre Network for Epidemiology and Public Health, Catalan Institute of Oncology, L'Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet, Barcelona, Spain
- Bohdan Górski
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Bartlomiej Masojć
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Dominika Wokołorczyk
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Anna Jakubowska
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Tadeusz Dębniak
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Antoni Berenguer
- Biomarkers and Susceptibility Unit, Spanish Biomedical Research Centre Network for Epidemiology and Public Health, Catalan Institute of Oncology, L'Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet, Barcelona, Spain
- Jordi Serra Musach
- Biomarkers and Susceptibility Unit, Spanish Biomedical Research Centre Network for Epidemiology and Public Health, Catalan Institute of Oncology, L'Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet, Barcelona, Spain
- Joan Brunet
- Hereditary Cancer Programme, Catalan Institute of Oncology, IdIBGi, Girona, Spain
- Joaquín Dopazo
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe, Functional Genomics Node and Spanish Biomedical Research Centre Network for Rare Diseases, Valencia, Spain
- Steven A. Narod
- Women's College Research Institute, University of Toronto and Women's College Hospital, Toronto, Ontario, Canada
- Jan Lubiński
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Conxi Lázaro
- Hereditary Cancer Programme, Catalan Institute of Oncology, IDIBELL, L'Hospitalet, Barcelona, Spain
- Cezary Cybulski
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Miguel Angel Pujana
- Biomarkers and Susceptibility Unit, Spanish Biomedical Research Centre Network for Epidemiology and Public Health, Catalan Institute of Oncology, L'Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet, Barcelona, Spain
- Translational Research Laboratory, Catalan Institute of Oncology, IDIBELL, L'Hospitalet, Barcelona, Spain
7
Montaner D, Dopazo J. Multidimensional gene set analysis of genomic data. PLoS One 2010; 5:e10348. [PMID: 20436964 PMCID: PMC2860497 DOI: 10.1371/journal.pone.0010348] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 12/03/2009] [Accepted: 03/30/2010] [Indexed: 11/27/2022]
Abstract
Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). Despite the wealth of information provided by functional profiling methods, a limitation common to all of them is their inherently unidimensional nature. To overcome this restriction, we present a multidimensional logistic model that allows the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) to be studied simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from conventional unidimensional functional profiling methods. In several examples, we report gene set associations that remained undetected by conventional one-dimensional gene set analysis. Our findings demonstrate the potential of the proposed approach for discovering new cell functionalities with complex dependencies on more than one variable.
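One way to read the multidimensional logistic model is as a regression in which membership of a gene in a module is the binary response and several gene-level measurements, plus their interaction, are the predictors: logit P(gene in module) = b0 + b1*x1 + b2*x2 + b3*x1*x2. The sketch below fits such a model by batch gradient ascent in plain Python. It is an assumption-laden illustration, not the authors' implementation: the function name `fit_logistic`, the learning rate, the iteration count, and the toy gene scores are all hypothetical, and a real analysis would use a statistics package that also reports standard errors and p-values.

```python
import math

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of predictor rows (an intercept column is added internally).
    y: list of 0/1 responses (here: 1 if the gene belongs to the module).
    Returns coefficients [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for r, t in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for j, v in enumerate(r):
                grad[j] += (t - p) * v  # gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical genes with two scores, x1 and x2 (e.g. an expression statistic
# and an association statistic). Module membership rises with x1 only:
# 1, 2 or 3 of every 4 genes are members for x1 = -1, 0, 1.
genes, member = [], []
in_module = {-1: 1, 0: 2, 1: 3}
for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        for i in range(4):
            genes.append((x1, x2, x1 * x2))  # third column: interaction term
            member.append(1 if i < in_module[x1] else 0)

beta = fit_logistic(genes, member)
# beta[1] should approach log(3) (about 1.099); beta[2] and beta[3] stay near 0.
```

A module whose membership depended on the interaction rather than on either score alone would show up as a non-zero b3, which is the kind of association a one-dimensional test cannot see.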
Affiliation(s)
- David Montaner
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Functional Genomics Node (INB), Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Joaquín Dopazo
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Functional Genomics Node (INB), Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
9
Medina I, Montaner D, Bonifaci N, Pujana MA, Carbonell J, Tarraga J, Al-Shahrour F, Dopazo J. Gene set-based analysis of polymorphisms: finding pathways or biological processes associated to traits in genome-wide association studies. Nucleic Acids Res 2009; 37:W340-4. [PMID: 19502494 PMCID: PMC2703970 DOI: 10.1093/nar/gkp481] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 01/29/2023]
Abstract
Genome-wide association studies have become a popular strategy for finding associations between genes and traits of interest. Despite the high resolution of current genotyping technology, the success of its application in real studies has been limited by the testing strategy used. As an alternative to brute-force solutions involving very large cohorts, we propose Gene Set Analysis (GSA), a different analysis strategy based on testing the association of modules of functionally related genes. We show here how the Gene Set-based Analysis of Polymorphisms (GeSBAP), a simple implementation of the GSA strategy for the analysis of genome-wide association studies, provides a significant increase in statistical power for this type of study. GeSBAP is freely available at http://bioinfo.cipf.es/gesbap/
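To make "testing the association of modules of functionally related genes" concrete, one simple test assumes genes have already been ranked by their SNP association evidence and asks whether a module is over-represented among the top-ranked genes, using a one-sided hypergeometric (Fisher's exact) test. This is a generic sketch; GeSBAP's own procedure may differ, and the function name `enrichment_p` and the gene identifiers are hypothetical.

```python
from math import comb

def enrichment_p(ranked_genes, gene_set, n_top):
    """One-sided hypergeometric p-value that `gene_set` is over-represented
    among the first `n_top` genes of `ranked_genes` (most associated first)."""
    N = len(ranked_genes)                               # all genes tested
    K = sum(1 for g in ranked_genes if g in gene_set)   # module members in the list
    k = sum(1 for g in ranked_genes[:n_top] if g in gene_set)  # members in the top segment
    # P(X >= k) for X ~ Hypergeometric(N, K, n_top)
    upper = min(K, n_top)
    tail = sum(comb(K, i) * comb(N - K, n_top - i) for i in range(k, upper + 1))
    return tail / comb(N, n_top)

# Hypothetical ranking of 10 genes; a 2-gene module occupies the top 2 ranks.
ranked = [f"g{i}" for i in range(1, 11)]
print(enrichment_p(ranked, {"g1", "g2"}, 2))  # 1/45, about 0.0222
```

Testing whole modules rather than single SNPs reduces the number of hypotheses and pools weak signals across related genes, which is the intuition behind the power gain claimed above.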
Affiliation(s)
- Ignacio Medina
- Department of Bioinformatics and Genomics, CIPF, Valencia, Spain