1
Romdhane L, Bouhamed H, Ghedira K, Ben Hamda C, Louhichi A, Jmel H, Romdhane S, Charfeddine C, Mokni M, Abdelhak S, Rebai A. The morbid cutaneous anatomy of the human genome revealed by a bioinformatic approach. Genomics 2020; 112:4232-4241. [PMID: 32650097 DOI: 10.1016/j.ygeno.2020.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Revised: 03/28/2020] [Accepted: 07/02/2020] [Indexed: 01/05/2023]
Abstract
Computational approaches have been developed to prioritize candidate genes in disease gene identification. They are based on different pieces of evidence associating each gene with the given disease. In this study, 648 genes underlying genodermatoses have been compared to 1808 genes involved in other genetic diseases using a bioinformatic approach. These genes were studied at the structural, evolutionary and functional levels. Results show that genes underlying genodermatoses present longer CDS and have more exons. Significant differences were observed in nucleotide motif and amino-acid compositions. Evolutionary conservation analysis revealed that genodermatosis genes have fewer paralogs, more orthologs in mouse and dog, and are less conserved. Functional analysis revealed that genodermatosis genes seem to be involved in the immune system and skin layers. The Bayesian network model returned a rate of good classification of around 80%. This computational approach could help investigators working in the field of dermatology by prioritizing positional candidate genes for mutation screening.
Affiliation(s)
- Lilia Romdhane
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia; Department of Biology, Faculty of Sciences of Bizerte, Jarzouna, Université Tunis Carthage, Tunis, Tunisia.
- Heni Bouhamed
  - Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
- Kais Ghedira
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Cherif Ben Hamda
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Amel Louhichi
  - Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
- Haifa Jmel
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Safa Romdhane
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Chérine Charfeddine
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia; High Institut of Biotechnology of Sidi Thabet, University of Manouba, BiotechPole of Sidi Thabet, Ariana, Tunisia
- Mourad Mokni
  - Department of Dermatology, CHU La Rabta Tunis, Tunis, Tunisia; Public health and infection Research Laboratory, La Rabta Hospital, Tunis, Tunisia
- Sonia Abdelhak
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Ahmed Rebai
  - Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
2
Nikdelfaz O, Jalili S. Disease genes prediction by HMM based PU-learning using gene expression profiles. J Biomed Inform 2018; 81:102-111. [PMID: 29571901 DOI: 10.1016/j.jbi.2018.03.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/15/2017] [Revised: 11/22/2017] [Accepted: 03/12/2018] [Indexed: 12/24/2022]
Abstract
Predicting disease candidate genes from the human genome is a crucial part of current biomedical research. According to observations, diseases with the same phenotype have similar biological characteristics, and genes associated with these diseases tend to share common functional properties. Therefore, by applying machine learning methods, new disease genes are predicted based on previously known ones. In recent studies, semi-supervised learning methods called Positive-Unlabeled Learning (PU-Learning) have been used for predicting disease candidate genes. In this study, a novel method is introduced to predict disease candidate genes through gene expression profiles by learning hidden Markov models. To evaluate the proposed method, it is applied to a mixed set of 398 disease genes from three disease types and 12001 unlabeled genes. Compared to the other methods in the literature, the experimental results indicate a significant improvement in favor of the proposed method.
Affiliation(s)
- Ozra Nikdelfaz
  - Tarbiat Modares University, Computer Engineering Department, Islamic Republic of Iran.
- Saeed Jalili
  - Tarbiat Modares University, Computer Engineering Department, Islamic Republic of Iran.
3
Mao X, Phanavanh B, Hamdan H, Moerman-Herzog A, Barger SW. NFκB-inducing kinase inhibits NFκB activity specifically in neurons of the CNS. J Neurochem 2016; 137:154-63. [PMID: 26778773 PMCID: PMC5115916 DOI: 10.1111/jnc.13526] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/23/2015] [Revised: 12/15/2015] [Accepted: 01/04/2016] [Indexed: 12/30/2022]
Abstract
The control of NFκB in CNS neurons appears to differ from that in other cell types. Studies have reported induction of NFκB in neuronal cultures and immunostaining in vivo, but others have consistently detected little or no transcriptional activation by NFκB in brain neurons. To test if neurons lack some component of the signal transduction system for NFκB activation, we transfected cortical neurons with several members of this signaling system along with a luciferase-based NFκB-reporter plasmid; RelA was cotransfected in some conditions. No component of the NFκB pathway was permissive for endogenous NFκB activity, and none stimulated the activity of exogenous RelA. Surprisingly, however, the latter was inhibited by cotransfection of NFκB-inducing kinase (NIK). Fluorescence imaging of RelA indicated that co-expression of NIK sequestered RelA in the cytoplasm, similar to the effect of IκBα. NIK-knockout mice showed elevated expression of an NFκB-reporter construct in neurons in vivo. Cortical neurons cultured from NIK-knockout mice showed elevated expression of an NFκB-reporter transgene. Consistent with data from other cell types, a C-terminal fragment of NIK suppressed RelA activity in astrocytes as well as neurons. Therefore, the inhibitory ability of the NIK C-terminus was unbiased with regard to cell type. However, inhibition of NFκB by full-length NIK is a novel outcome that appears to be specific to CNS neurons. This has implications for unique aspects of transcription in the CNS, perhaps relevant to aspects of development, neuroplasticity, and neuroinflammation. Full-length NIK was found to inhibit (down arrow) transcriptional activation of NFκB in neurons, while it elevated (up arrow) activity in astrocytes. Deletion constructs corresponding to the N-terminus or C-terminus also inhibited NFκB in neurons, while only the C-terminus did so in astrocytes. 
One possible explanation is that the inhibition in neurons occurs via two different mechanisms, including the potential for a neuron-specific protein (e.g., one of the 14-3-3 class) to create a novel complex in neurons, whereas the C-terminus may interact directly with NFκB. [Structure of NIK is based on Liu J., Sudom A., Min X., Cao Z., Gao X., Ayres M., Lee F., Cao P., Johnstone S., Plotnikova O., Walker N., Chen G., and Wang Z. (2012) Structure of the nuclear factor κB-inducing kinase (NIK) kinase domain reveals a constitutively active conformation. J Biol Chem. 287, 27326-27334; N-terminal lobe is oriented at top].
Affiliation(s)
- Xianrong Mao
  - Department of Genetics, Washington University, St. Louis MO 63110
- Bounleut Phanavanh
  - Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock AR 72205
- Hamdan Hamdan
  - Department of Neuroscience, Baylor College of Medicine, Houston TX 77030
- Andréa Moerman-Herzog
  - Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock AR 72205
- Steven W. Barger
  - Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock AR 72205
  - Department of Neurobiology and Developmental Sciences, University of Arkansas for Medical Sciences, Little Rock AR 72205
  - Geriatric Research Education and Clinical Center, Central Arkansas Veterans Healthcare System, Little Rock AR 72205
4
Piraino SW, Furney SJ. Beyond the exome: the role of non-coding somatic mutations in cancer. Ann Oncol 2015; 27:240-8. [PMID: 26598542 DOI: 10.1093/annonc/mdv561] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 07/27/2015] [Accepted: 11/04/2015] [Indexed: 02/06/2023] Open
Abstract
The comprehensive identification of mutations contributing to the development of cancer is a priority of large cancer sequencing projects. To date, most studies have scrutinized mutations in coding regions of the genome, but several recent discoveries, including the identification of recurrent somatic mutations in the TERT promoter in multiple cancer types, support the idea that mutations in non-coding regions are also important in tumour development. Furthermore, analysis of whole-genome sequencing data from tumours has elucidated novel mutational patterns and processes etched into cancer genomes. Here, we present an overview of insights gleaned from the analysis of mutations from sequenced cancer genomes. We then review the mechanisms by which non-coding mutations can play a role in cancer. Finally, we discuss recent efforts aimed at identifying non-coding driver mutations, as well as the unique challenges that the analysis of non-coding mutations presents in contrast to the identification of driver mutations in coding regions.
Affiliation(s)
- S W Piraino
  - School of Medicine, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- S J Furney
  - School of Medicine, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
5
Abstract
Background: In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident.
Methods: We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship.
Results: Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations.
Conclusions: We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions.
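Two of the topological features named in this abstract, degree centrality and disease neighbor ratio, can be illustrated with a minimal Python sketch. The function names, the toy interaction list, and the disease annotation set are hypothetical and chosen only for illustration; the abstract does not specify how the authors computed these metrics, so the definitions below (degree divided by n - 1, and the fraction of direct neighbors that are disease-annotated) are assumptions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def disease_neighbor_ratio(adjacency, disease_proteins):
    """Fraction of each node's direct neighbors that are disease-annotated (assumed definition)."""
    ratios = {}
    for node, nbrs in adjacency.items():
        if not nbrs:
            ratios[node] = 0.0
            continue
        hits = sum(1 for nb in nbrs if nb in disease_proteins)
        ratios[node] = hits / len(nbrs)
    return ratios


# Hypothetical toy network: five proteins, two of them annotated as disease-related.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
toy_disease = {"B", "D"}

toy_adjacency = build_adjacency(toy_edges)
print(degree_centrality(toy_adjacency))
print(disease_neighbor_ratio(toy_adjacency, toy_disease))
```

Feature values of this kind would then form the input vectors for a classifier; the ADTree training step itself is not sketched here.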
6
Upton A, Arvanitis TN. Using evolutional properties of gene networks in understanding survival prognosis of glioblastoma. IEEE J Biomed Health Inform 2013; 18:810-6. [PMID: 24058043 DOI: 10.1109/jbhi.2013.2282569] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Previously, we investigated survival prognosis of glioblastoma by applying a gene regulatory approach to a human glioblastoma dataset. Here, we further extend our understanding of survival prognosis of glioblastoma by refining the network inference technique we apply to the glioblastoma dataset with the intent of uncovering further topological properties of the networks. For this study, we modify the approach by specifically looking at both positive and negative correlations separately, as opposed to absolute correlations. There is great interest in applying mathematical modeling approaches to cancer cell line datasets to generate network models of gene regulatory interactions. Analysis of these networks using graph theory metrics can identify genes of interest. The principal approach for modeling microarray datasets has been to group all the cell lines together into one overall network, and then, analyze this network as a whole. As per the previous study, we categorize a human glioblastoma cell line dataset into five categories based on survival data, and analyze each category separately using both negative and positive correlation networks constructed using a modified version of the WGCNA algorithm. Using this approach, we identified a number of genes as being important across different survival stages of the glioblastoma cell lines.
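The idea of treating positive and negative correlations as separate networks, rather than taking absolute values, can be sketched as follows. This is a simplified illustration with a hard correlation threshold, not the modified WGCNA algorithm the abstract refers to; the function names, the threshold of 0.8, and the toy expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def signed_networks(expression, threshold=0.8):
    """Split gene pairs into a positive and a negative correlation edge set.

    Hard thresholding is an assumption made for brevity; it stands in for
    whatever weighting scheme a real analysis would use.
    """
    genes = sorted(expression)
    positive, negative = set(), set()
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            if r >= threshold:
                positive.add((g1, g2))
            elif r <= -threshold:
                negative.add((g1, g2))
    return positive, negative


# Hypothetical expression profiles for four genes across four samples.
toy_expression = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [4, 3, 2, 1],
    "G4": [2, 1, 4, 3],
}
pos_edges, neg_edges = signed_networks(toy_expression)
print("positive:", sorted(pos_edges))
print("negative:", sorted(neg_edges))
```

Keeping the two edge sets apart means that a gene pair with a strong inverse relationship is not merged with co-expressed pairs, which is the distinction an absolute-correlation network would lose.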
7
Li W, Chen L, He W, Li W, Qu X, Liang B, Gao Q, Feng C, Jia X, Lv Y, Zhang S, Li X. Prioritizing disease candidate proteins in cardiomyopathy-specific protein-protein interaction networks based on "guilt by association" analysis. PLoS One 2013; 8:e71191. [PMID: 23940716 PMCID: PMC3733802 DOI: 10.1371/journal.pone.0071191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/28/2013] [Accepted: 06/28/2013] [Indexed: 01/12/2023] Open
Abstract
The cardiomyopathies are a group of heart muscle diseases which can be inherited (familial). Identifying potential disease-related proteins is important to understand mechanisms of cardiomyopathies. Experimental identification of cardiomyopathies is costly and labour-intensive. In contrast, a bioinformatics approach has a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method for prioritizing disease candidate proteins to rank candidate proteins in the network based on “guilt by association” analysis. It was found that most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top-ranked candidate proteins were related to the corresponding disease subtypes and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used for the identification of potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.
Affiliation(s)
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
  - * E-mail: (LC); (XL)
- Weiming He
  - Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
- Weiguo Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoli Qu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Binhua Liang
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Qianping Gao
  - Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenchen Feng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Jia
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yana Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Siya Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
  - * E-mail: (LC); (XL)
8
Wagner AH, Taylor KR, DeLuca AP, Casavant TL, Mullins RF, Stone EM, Scheetz TE, Braun TA. Prioritization of retinal disease genes: an integrative approach. Hum Mutat 2013; 34:853-9. [PMID: 23508994 DOI: 10.1002/humu.22317] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/21/2012] [Accepted: 03/07/2013] [Indexed: 02/03/2023]
Abstract
The discovery of novel disease-associated variations in genes is often a daunting task in highly heterogeneous disease classes. We seek a generalizable algorithm that integrates multiple publicly available genomic data sources in a machine-learning model for the prioritization of candidates identified in patients with retinal disease. To approach this problem, we generate a set of feature vectors from publicly available microarray, RNA-seq, and ChIP-seq datasets of biological relevance to retinal disease, to observe patterns in gene expression specificity among tissues of the body and the eye, in addition to photoreceptor-specific signals by the CRX transcription factor. Using these features, we describe a novel algorithm, positive and unlabeled learning for prioritization (PULP). This article compares several popular supervised learning techniques as the regression function for PULP. The results demonstrate a highly significant enrichment for previously characterized disease genes using a logistic regression method. Finally, a comparison of PULP with the popular gene prioritization tool ENDEAVOUR shows superior prioritization of retinal disease genes from previous studies. The java source code, compiled binary, assembled feature vectors, and instructions are available online at https://github.com/ahwagner/PULP.
Affiliation(s)
- Alex H Wagner
  - Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA.
9
Zhang Y, Xia J, Zhang Y, Qin Y, Yang D, Qi L, Zhao W, Wang C, Guo Z. Pitfalls in experimental designs for characterizing the transcriptional, methylational and copy number changes of oncogenes and tumor suppressor genes. PLoS One 2013; 8:e58163. [PMID: 23472150 PMCID: PMC3589351 DOI: 10.1371/journal.pone.0058163] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2012] [Accepted: 02/03/2013] [Indexed: 01/06/2023] Open
Abstract
Background: It is a common practice that researchers collect a set of samples without discriminating the mutants and their wild-type counterparts to characterize the transcriptional, methylational and/or copy number changes of pre-defined candidate oncogenes or tumor suppressor genes (TSGs), although some examples are known that carcinogenic mutants may express and function completely differently from their wild-type counterparts.
Principal findings: Based on various high-throughput data without mutation information for typical cancer types, we surprisingly found that about half of known oncogenes (or TSGs) pre-defined by mutations were down-regulated (or up-regulated) and hypermethylated (or hypomethylated) in their corresponding cancer types. Therefore, the overall expression and/or methylation changes of genes detected in a set of samples without discriminating the mutants and their wild-type counterparts cannot indicate the carcinogenic roles of the mutants. We also found that about half of known oncogenes were located in deletion regions, whereas all known TSGs were located in deletion regions. Thus, both oncogenes and TSGs may be located in deletion regions and thus deletions can indicate TSGs only if the gene is found to be deleted as a whole. In contrast, amplifications are restricted to oncogenes and thus can be used to support either the dysregulated wild-type gene or its mutant as an oncogene.
Conclusions: We demonstrated that using the transcriptional, methylational and/or copy number changes without mutation information to characterize oncogenes and TSGs, which is a currently still widely adopted strategy, will most often produce misleading results. Our analysis highlights the importance of evaluating expression, methylation and copy number changes together with gene mutation data in the same set of samples in order to determine the distinct roles of the mutants and their wild-type counterparts.
Affiliation(s)
- Yuannv Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiguang Xia
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yujing Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yao Qin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Da Yang
  - Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Lishuang Qi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wenyuan Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chenguang Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zheng Guo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - School of Life Science and Bioinformatics Centre, University of Electronic Science and Technology of China, Chengdu, China
  - * E-mail:
10
Systems genetics in "-omics" era: current and future development. Theory Biosci 2012; 132:1-16. [PMID: 23138757 DOI: 10.1007/s12064-012-0168-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 03/26/2012] [Accepted: 10/25/2012] [Indexed: 02/06/2023]
Abstract
Systems genetics is an emerging discipline that integrates high-throughput expression profiling technology and systems biology approaches to reveal the molecular mechanisms of complex traits, and it will improve our understanding of gene functions in biochemical pathways and of genetic interactions between biological molecules. With the rapid advances in microarray analysis technologies, bioinformatics is extensively used in studies of gene functions, SNP-SNP genetic interactions, LD block-block interactions, miRNA-mRNA interactions, DNA-protein interactions, protein-protein interactions, and functional mapping for LD blocks. Built on a bioinformatics panel that integrates "-omics" datasets to extract systems knowledge and useful information for explaining the molecular mechanisms of complex traits, systems genetics aims to enhance our understanding of biological processes. Systems biology has provided systems-level recognition of various biological phenomena and laid the scientific groundwork for the development of systems genetics. In addition, next-generation sequencing technology and post-genome-wide association studies empower the discovery of new genes and rare variants. The integration of different strategies will help to propose novel hypotheses and refine the theoretical framework of systems genetics, which will contribute to its future development and open up a whole new area of genetics.
11
12
Xiao Y, Guan J, Ping Y, Xu C, Huang T, Zhao H, Fan H, Li Y, Lv Y, Zhao T, Dong Y, Ren H, Li X. Prioritizing cancer-related key miRNA-target interactions by integrative genomics. Nucleic Acids Res 2012; 40:7653-65. [PMID: 22705797 PMCID: PMC3439920 DOI: 10.1093/nar/gks538] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022] Open
Abstract
Accumulating evidence indicates that microRNAs (miRNAs) can function as oncogenes or tumor suppressor genes by controlling a few key targets, which in turn contribute to the pathogenesis of cancer. The identification of cancer-related key miRNA-target interactions remains a challenge. We performed a systematic analysis of known cancer-related key interactions manually curated from published papers based on different aspects including sequence, expression and function. Known cancer-related key interactions show more miRNA binding sites (especially for 8mer binding sites), more reliable binding of miRNA to the target region, higher expression associations and broader functional coverage when compared to non-disease-related interactions. By integrating these sequence, expression and function features, we proposed a bioinformatics approach termed PCmtI to prioritize cancer-related key interactions. Ten-fold cross-validation of our approach revealed that it can achieve an area under the receiver operating characteristic curve of 93.9%. Subsequent leave-one-miRNA-out cross-validation also demonstrated the performance of our approach. Using miR-155 as a case, we found that the top-ranked interactions can account for most functions of miR-155. In addition, we further demonstrated the power of our approach using 23 recently identified cancer-related key interactions. The approach described here offers a new way for the discovery of novel cancer-related key miRNA-target interactions.
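The performance figure quoted in this abstract is an area under the ROC curve. As a minimal sketch of what that number measures, the AUC can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name, labels, and scores below are hypothetical and are not taken from the PCmtI study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (positive) or 0 (negative)
    scores: iterable of classifier scores, higher meaning more likely positive
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six candidate interactions (1 = known key interaction).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

In a ten-fold cross-validation, a value like this would be computed on the held-out fold of each split and then summarized across folds; the pairwise loop here is quadratic and is meant for clarity rather than for large data sets.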
Affiliation(s)
- Yun Xiao
  - College of Bioinformatics Science and Technology, Department of Neurology, The Affiliated Hospital and Harbin Medical University, Harbin, Heilongjiang 150086, China
13
Furney SJ, Gundem G, Lopez-Bigas N. Oncogenomics methods and resources. Cold Spring Harb Protoc 2012; 2012:2012/5/pdb.top069229. [PMID: 22550293 DOI: 10.1101/pdb.top069229] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 05/31/2023]
Abstract
Today, cancer is viewed as a genetic disease and many genetic mechanisms of oncogenesis are known. The progression from normal tissue to invasive cancer is thought to occur over a timescale of 5-20 years. This transformation is driven by both inherited genetic factors and somatic genetic alterations and mutations, and it results in uncontrolled cell growth and, in many cases, death. In this article, we review the main types of genomic and genetic alterations involved in cancer, namely copy-number changes, genomic rearrangements, somatic mutations, polymorphisms, and epigenomic alterations in cancer. We then discuss the transcriptomic consequences of these alterations in tumor cells. The use of "next-generation" sequencing methods in cancer research is described in the relevant sections. Finally, we discuss different approaches for candidate prioritization and integration and analysis of these complex data.
14
Flachner B, Lörincz Z, Carotti A, Nicolotti O, Kuchipudi P, Remez N, Sanz F, Tóvári J, Szabó MJ, Bertók B, Cseh S, Mestres J, Dormán G. A chemocentric approach to the identification of cancer targets. PLoS One 2012; 7:e35582. [PMID: 22558171 PMCID: PMC3338416 DOI: 10.1371/journal.pone.0035582] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/15/2011] [Accepted: 03/19/2012] [Indexed: 01/01/2023] Open
Abstract
A novel chemocentric approach to identifying cancer-relevant targets is introduced. Starting with a large chemical collection, the strategy uses the list of small molecule hits arising from a differential cytotoxicity screening on tumor HCT116 and normal MRC-5 cell lines to identify proteins associated with cancer emerging from a differential virtual target profiling of the most selective compounds detected in both cell lines. It is shown that this smart combination of differential in vitro and in silico screenings (DIVISS) is capable of detecting a list of proteins that are already well accepted cancer drug targets, while complementing it with additional proteins that, targeted selectively or in combination with others, could lead to synergistic benefits for cancer therapeutics. The complete list of 115 proteins identified as being hit uniquely by compounds showing selective antiproliferative effects for tumor cell lines is provided.
|
15
|
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
|
16
|
|
17
|
Dickerson JE, Zhu A, Robertson DL, Hentges KE. Defining the role of essential genes in human disease. PLoS One 2011; 6:e27368. [PMID: 22096564 PMCID: PMC3214036 DOI: 10.1371/journal.pone.0027368] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 06/23/2011] [Accepted: 10/15/2011] [Indexed: 12/31/2022] Open
Abstract
A greater understanding of the causes of human disease can come from identifying characteristics that are specific to disease genes. However, a full understanding of the contribution of essential genes to human disease is lacking, due to the premise that these genes tend to cause developmental abnormalities rather than adult disease. We tested the hypothesis that human orthologs of mouse essential genes are associated with a variety of human diseases, rather than only those related to miscarriage and birth defects. We segregated human disease genes according to whether the knockout phenotype of their mouse ortholog was lethal or viable, defining those with orthologs producing lethal knockouts as essential disease genes. We show that the human orthologs of mouse essential genes are associated with a wide spectrum of diseases affecting diverse physiological systems. Notably, human disease genes with essential mouse orthologs are over-represented among disease genes associated with cancer, suggesting links between adult cellular abnormalities and developmental functions. The proteins encoded by essential genes are highly connected in protein-protein interaction networks, which we find correlates with an over-representation of nuclear proteins amongst essential disease genes. Disease genes associated with essential orthologs also are more likely than those with non-essential orthologs to contribute to disease through an autosomal dominant inheritance pattern, suggesting that these diseases may actually result from semi-dominant mutant alleles. Overall, we have described attributes found in disease genes according to the essentiality status of their mouse orthologs. These findings demonstrate that disease genes do occupy highly connected positions in protein-protein interaction networks, and that due to the complexity of disease-associated alleles, essential genes cannot be ignored as candidates for causing diverse human diseases.
Affiliation(s)
- Ana Zhu
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- David L. Robertson
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Kathryn E. Hentges
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
|
18
|
Differential expression pattern-based prioritization of candidate genes through integrating disease-specific expression data. Genomics 2011; 98:64-71. [DOI: 10.1016/j.ygeno.2011.04.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/17/2010] [Revised: 03/11/2011] [Accepted: 04/01/2011] [Indexed: 01/30/2023]
|
19
|
Yue P, Forrest WF, Kaminker JS, Lohr S, Zhang Z, Cavet G. Inferring the functional effects of mutation through clusters of mutations in homologous proteins. Hum Mutat 2010; 31:264-71. [PMID: 20052764 DOI: 10.1002/humu.21194] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 01/11/2023]
Abstract
Inferring functional consequences is a bottleneck in high-throughput cancer mutation discovery and genetic association studies. Most polymorphisms and germline mutations are unlikely to have functionally significant consequences. Most cancer somatic mutations do not contribute to tumorigenesis and are not under selective pressure. Identifying and understanding functionally important mutations can clarify disease biology and lead to new therapeutic and diagnostic opportunities. We investigated the extent to which protein mutations with functional consequences are enriched in clusters at conserved positions across related proteins. We found that disease-causing mutations form clusters more than random mutations or single nucleotide polymorphisms, confirming that mutation hotspots occur at the domain level. In addition to helping to identify functionally significant mutations, analysis of clustered mutations can indicate the mechanism and consequences for protein function. Our analysis focused on somatic cancer mutations suggests functional impact for many, including singleton mutations in FGFR1, FGFR3, GFI1B, PIK3CG, RALB, RAP2B, and STK11. This provides evidence and generates mechanistic hypotheses for the contribution of such mutations to cancer. The same approach can be applied to mutations suspected of involvement in other diseases. An interactive Web application for browsing mutation clusters is available at http://www.mcluster.org.
Affiliation(s)
- Peng Yue
- Department of Bioinformatics, Genentech Inc, South San Francisco, California 94080, USA.
|
20
|
[Identifying candidate cancer genes based on co-evolving gene modules]. YI CHUAN = HEREDITAS 2010; 32:694-700. [PMID: 20650850 DOI: 10.3724/sp.j.1005.2010.00694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Data from somatic mutation screening of cancer genomes have provided us with huge amounts of information for identifying new cancer genes. Current methods for identifying candidate cancer genes based on gene mutation frequencies tend to find cancer genes with high mutation frequencies. However, many genes with low mutation frequencies might also play important roles during tumorigenesis. Based on the assumption that genes with similar phylogenetic profiles and protein-protein interactions might have similar functions and that their disruption might lead to similar disease phenotypes, we proposed a new approach to find candidate cancer genes. First, we searched for protein-protein interaction subnetworks within which proteins have similar phylogenetic profiles, termed co-evolving gene modules. Then, we identified genes that have at least one non-synonymous mutation in cancer genomes and directly interact with known cancer genes in the same co-evolving gene modules, and predicted them as candidate cancer genes. In this way, we found 15 candidate cancer genes, among which only two had been identified previously as candidate cancer genes using the methods based on gene mutation frequencies. Thus, candidate cancer genes with low mutation frequencies can be found by our method.
|
21
|
Tranchevent LC, Capdevila FB, Nitsch D, De Moor B, De Causmaecker P, Moreau Y. A guide to web tools to prioritize candidate genes. Brief Bioinform 2010; 12:22-32. [PMID: 21278374 DOI: 10.1093/bib/bbq007] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.1] [Indexed: 01/13/2023] Open
|
22
|
Gong X, Wu R, Zhang Y, Zhao W, Cheng L, Gu Y, Zhang L, Wang J, Zhu J, Guo Z. Extracting consistent knowledge from highly inconsistent cancer gene data sources. BMC Bioinformatics 2010; 11:76. [PMID: 20137077 PMCID: PMC2832783 DOI: 10.1186/1471-2105-11-76] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 09/15/2009] [Accepted: 02/05/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency. RESULTS: First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census. CONCLUSIONS: Although they have very low gene overlap, most cancer gene data sources are highly consistent at the functional level, which indicates that they can separately capture subsets of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that functionally represent most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
Affiliation(s)
- Xue Gong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
|
23
|
Kann MG. Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief Bioinform 2010; 11:96-110. [PMID: 20007728 PMCID: PMC2810112 DOI: 10.1093/bib/bbp048] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 08/11/2009] [Revised: 09/15/2009] [Indexed: 12/29/2022] Open
Abstract
Over 100 years ago, William Bateson provided, through his observations of the transmission of alkaptonuria in first cousin offspring, evidence of the application of Mendelian genetics to certain human traits and diseases. His work was corroborated by Archibald Garrod (Garrod AE. The incidence of alkaptonuria: a study in chemical individuality. Lancet 1902;ii:1616-20) and William Farabee (Farabee WC. Inheritance of digital malformations in man. In: Papers of the Peabody Museum of American Archaeology and Ethnology. Cambridge, Mass: Harvard University, 1905; 65-78), who recorded the familial tendencies of inheritance of malformations of human hands and feet. These were the pioneers of the hunt for disease genes that would continue through the century and result in the discovery of hundreds of genes that can be associated with different diseases. Despite many ground-breaking discoveries during the last century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases, and we are still searching for the cures for most complex diseases. In the last few years, new genome sequencing and other high-throughput experimental techniques have generated vast amounts of molecular and clinical data that contain crucial information with the potential of leading to the next major biomedical discoveries. The need to mine, visualize and integrate these data has motivated the development of several informatics approaches that can broadly be grouped in the research area of 'translational bioinformatics'. This review highlights the latest advances in the field of translational bioinformatics, focusing on the advances of computational techniques to search for and classify disease genes.
Affiliation(s)
- Maricel G Kann
- University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA.
|
24
|
|
25
|
From cancer genomes to cancer models: bridging the gaps. EMBO Rep 2009; 10:359-66. [PMID: 19305388 DOI: 10.1038/embor.2009.46] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 11/13/2008] [Accepted: 02/23/2009] [Indexed: 11/08/2022] Open
Abstract
Cancer genome projects are now being expanded in an attempt to provide complete landscapes of the mutations that exist in tumours. Although the importance of cataloguing genome variations is well recognized, there are obvious difficulties in bridging the gaps between high-throughput resequencing information and the molecular mechanisms of cancer evolution. Here, we describe the current status of the high-throughput genomic technologies, and the current limitations of the associated computational analysis and experimental validation of cancer genetic variants. We emphasize how the current cancer-evolution models will be influenced by the high-throughput approaches, in particular through efforts devoted to monitoring tumour progression, and how, in turn, the integration of data and models will be translated into mechanistic knowledge and clinical applications.
|
26
|
Ali MA, Sjöblom T. Molecular pathways in tumor progression: from discovery to functional understanding. MOLECULAR BIOSYSTEMS 2009; 5:902-8. [DOI: 10.1039/b903502h] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]
|