51
Xu R, Li L, Wang Q. Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature. Bioinformatics 2013; 29:2186-94. [PMID: 23828786 DOI: 10.1093/bioinformatics/btt359] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 01/28/2023]
Abstract
MOTIVATION Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease-phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease-manifestation (D-M) pairs (one specific type of disease-phenotype relationship) from the wide body of published biomedical literature. DATA AND METHODS Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M-specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. RESULTS In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. CONCLUSIONS The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. AVAILABILITY http://nlp.case.edu/public/data/DMPatternUMLS/
Affiliation(s)
- Rong Xu
- Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, USA.
52
Hansen L, Tawamie H, Murakami Y, Mang Y, ur Rehman S, Buchert R, Schaffer S, Muhammad S, Bak M, Nöthen MM, Bennett EP, Maeda Y, Aigner M, Reis A, Kinoshita T, Tommerup N, Baig SM, Abou Jamra R. Hypomorphic mutations in PGAP2, encoding a GPI-anchor-remodeling protein, cause autosomal-recessive intellectual disability. Am J Hum Genet 2013; 92:575-83. [PMID: 23561846 DOI: 10.1016/j.ajhg.2013.03.008] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 12/31/2012] [Revised: 02/11/2013] [Accepted: 03/12/2013] [Indexed: 12/28/2022]
Abstract
PGAP2 encodes a protein involved in remodeling the glycosylphosphatidylinositol (GPI) anchor in the Golgi apparatus. After synthesis in the endoplasmic reticulum (ER), GPI anchors are transferred to the proteins and are remodeled while transported through the Golgi to the cell membrane. Germline mutations in six genes (PIGA, PIGL, PIGM, PIGV, PIGN, and PIGO) in the ER-located part of the GPI-anchor-biosynthesis pathway have been reported, and all are associated with phenotypes extending from malformation and lethality to severe intellectual disability, epilepsy, minor dysmorphisms, and elevated alkaline phosphatase (ALP). We performed autozygosity mapping and ultra-deep sequencing followed by stringent filtering and identified two homozygous PGAP2 alterations, p.Tyr99Cys and p.Arg177Pro, in seven offspring with nonspecific autosomal-recessive intellectual disability from two consanguineous families. Rescue experiments with the altered proteins in PGAP2-deficient Chinese hamster ovary cell lines showed less expression of cell-surface GPI-anchored proteins DAF and CD59 than of the wild-type protein, substantiating the pathogenicity of the identified alterations. Furthermore, we observed a full rescue when we used strong promoters before the mutant cDNAs, suggesting a hypomorphic effect of the mutations. We report on alterations in the Golgi-located part of the GPI-anchor-biosynthesis pathway and extend the phenotypic spectrum of the GPI-anchor deficiencies to isolated intellectual disability with elevated ALP. GPI-anchor deficiencies can be interpreted within the concept of a disease family, and we propose that the severity of the phenotype is dependent on the location of the altered protein in the biosynthesis chain.
Affiliation(s)
- Lars Hansen
- Wilhelm Johannsen Centre for Functional Genome Research, The Panum Institute, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen N, Denmark.
53
Yang P, Li XL, Mei JP, Kwoh CK, Ng SK. Positive-unlabeled learning for disease gene identification. Bioinformatics 2012; 28:2640-7. [PMID: 22923290 PMCID: PMC3467748 DOI: 10.1093/bioinformatics/bts504] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 06/13/2012] [Revised: 07/24/2012] [Accepted: 08/06/2012] [Indexed: 11/13/2022]
Abstract
BACKGROUND Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers that identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. RESULT Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. CONCLUSION The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. AVAILABILITY AND IMPLEMENTATION The executable program and data are available at http://www1.i2r.a-star.edu.sg/~xlli/PUDI/PUDI.html.
Affiliation(s)
- Peng Yang
- Bioinformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore.
54
55
Köhler S, Doelken SC, Rath A, Aymé S, Robinson PN. Ontological phenotype standards for neurogenetics. Hum Mutat 2012; 33:1333-9. [PMID: 22573485 DOI: 10.1002/humu.22112] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 01/16/2012] [Accepted: 04/13/2012] [Indexed: 12/22/2022]
Abstract
Neurological disorders comprise one of the largest groups of human diseases. Due to the myriad symptoms and the extreme degree of clinical variability characteristic of many neurological diseases, the differential diagnosis process is extremely challenging. Even though most neurogenetic diseases are individually rare, collectively, the subgroup of neurogenetic disorders is large, comprising more than 2,400 different disorders. Recently, increasing efforts have been undertaken to unravel the molecular basis of neurogenetic diseases and to correlate pathogenetic mechanisms with clinical signs and symptoms. In order to enable computer-based analyses, the systematic representation of the neurological phenotype is of major importance. We demonstrate how the Human Phenotype Ontology (HPO) can be incorporated into these efforts by providing a systematic semantic representation of phenotypic abnormalities encountered in human genetic diseases. The combination of the HPO together with the Orphanet disease classification represents a promising resource for automated disease classification, performing computational clustering and analysis of the neurogenetic phenome. Furthermore, standardized representations of neurologic phenotypic abnormalities employing the HPO link neurological phenotypic abnormalities to anatomical and functional entities represented in other biomedical ontologies through the semantic references provided by the HPO.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
56
Jaeger S, Aloy P. From protein interaction networks to novel therapeutic strategies. IUBMB Life 2012; 64:529-37. [DOI: 10.1002/iub.1040] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 02/13/2012] [Accepted: 03/14/2012] [Indexed: 01/18/2023]
57
Lai YH, Li ZC, Chen LL, Dai Z, Zou XY. Identification of potential host proteins for influenza A virus based on topological and biological characteristics by proteome-wide network approach. J Proteomics 2012; 75:2500-13. [DOI: 10.1016/j.jprot.2012.02.034] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 11/07/2011] [Revised: 02/21/2012] [Accepted: 02/26/2012] [Indexed: 12/31/2022]
58
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
59
Abstract
There now exist multiple lines of evidence pointing to a significant genetic component underlying the aetiology of autism spectrum disorders (ASDs). The advent of methodologies for scanning the human genome at high resolution, coupled with the recognition of copy number variation (CNV) as a prevalent source of genomic variation, has led to new strategies in the identification of clinically relevant loci. Balanced genomic changes, such as translocations and inversions, also contribute to ASD, but current studies have shown that screening with microarrays has up to fivefold increase in diagnostic yield. Recent work by our group and others has shown unbalanced genomic alterations that are likely pathogenic in upwards of 10% of cases, highlighting an important role for CNVs in the genetic aetiology of ASD. A trend in our empirical data has shifted focus for discovery of candidate loci towards individually rare but highly penetrant CNVs instead of looking for common variants of low penetrance. This strategy has proven largely successful in identifying ASD-susceptibility candidate loci, including gains and losses at 16p11.2, SHANK2, NRXN1, and PTCHD1. Another emerging and intriguing trend is the identification of the same genes implicated by rare CNVs across neurodevelopmental disorders, including schizophrenia, attention deficit hyperactivity disorder, and intellectual disability. These observations indicate that similar pathways may be involved in phenotypically distinct outcomes. Although interrogation of the genome at high resolution has led to these novel discoveries, it has also made cataloguing, characterization, and clinical interpretation of the increasing amount of CNV data difficult. Herein, we describe the history of genomic structural variation in ASD and how CNV discovery has been used to pinpoint novel ASD-susceptibility loci. 
We also discuss the overlap of CNVs across neurodevelopmental disorders and comment on the current challenges of understanding the relationship between CNVs and associated phenotypes in a clinical context.
60
Erten S, Bebek G, Koyutürk M. Vavien: an algorithm for prioritizing candidate disease genes based on topological similarity of proteins in interaction networks. J Comput Biol 2011; 18:1561-74. [PMID: 22035267 PMCID: PMC3216100 DOI: 10.1089/cmb.2011.0154] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 12/21/2022]
Abstract
Genome-wide linkage and association studies have demonstrated promise in identifying genetic factors that influence health and disease. An important challenge is to narrow down the set of candidate genes that are implicated by these analyses. Protein-protein interaction (PPI) networks are useful in extracting the functional relationships between known disease and candidate genes, based on the principle that products of genes implicated in similar diseases are likely to exhibit significant connectivity/proximity. Information flow-based methods are shown to be very effective in prioritizing candidate disease genes. In this article, we utilize the topology of PPI networks to infer functional information in the context of disease association. Our approach is based on the assumption that PPI networks are organized into recurrent schemes that underlie the mechanisms of cooperation among different proteins. We hypothesize that proteins associated with similar diseases would exhibit similar topological characteristics in PPI networks. Utilizing the location of a protein in the network with respect to other proteins (i.e., the "topological profile" of the proteins), we develop a novel measure to assess the topological similarity of proteins in a PPI network. We then use this measure to prioritize candidate disease genes based on the topological similarity of their products and the products of known disease genes. We test the resulting algorithm, Vavien, via systematic experimental studies using an integrated human PPI network and the Online Mendelian Inheritance in Man (OMIM) database. Vavien outperforms other network-based prioritization algorithms as shown in the results and is available at www.diseasegenes.org.
Affiliation(s)
- Sinan Erten
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio 44106, USA.
61
Inferring gene-phenotype associations via global protein complex network propagation. PLoS One 2011; 6:e21502. [PMID: 21799737 PMCID: PMC3143124 DOI: 10.1371/journal.pone.0021502] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 04/20/2011] [Accepted: 05/30/2011] [Indexed: 12/05/2022]
Abstract
Background Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases. Methodology/Principal Findings We proposed a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. The basis of RWPCN is a protein complex network constructed using existing human protein complexes and protein interaction network. To prioritize candidate disease genes for the query disease phenotypes, we compute the associations between the protein complexes and the query phenotypes in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation; our method was observed to outperform existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, namely, Breast Cancer and Diabetes. Conclusions/Significance Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organizations of both the disease phenome and the protein interactome. Our RWPCN uses a novel protein complex network as a basis for interrogating the human phenome-interactome network. 
As the protein complex network can capture the underlying modularity in the biological interaction networks better than simple protein interaction networks, RWPCN was found to be able to detect and prioritize disease genes better than traditional approaches that used only protein-phenotype associations.
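The abstract above names a random walker but does not spell out the walk itself. As a rough illustration of the general idea only, the following is a minimal sketch of a generic random walk with restart on a small undirected network. It is not the RWPCN implementation: it runs on a plain node network rather than a protein complex network, and the function name, the restart probability of 0.5, and the toy network are all assumptions made for this example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=10000):
    """Return the stationary visiting probability of every node.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    seeds: non-empty set of nodes in `adjacency` that the walker restarts from,
           e.g. nodes already associated with the query phenotype.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(adjacency)
    start = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds.
        nxt = {u: restart * start[u] for u in nodes}
        # The remaining mass is spread evenly over each node's neighbours.
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: keep its walking mass in place.
                nxt[u] += (1.0 - restart) * prob[u]
                continue
            share = (1.0 - restart) * prob[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[u] - prob[u]) for u in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy network: a simple path a - b - c - d, with "a" as the known seed.
toy_network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = random_walk_with_restart(toy_network, seeds={"a"})

# Candidates are ranked by how often the walker visits them.
ranking = sorted((n for n in toy_network if n != "a"), key=scores.get, reverse=True)
print(ranking)
```

In this toy case, candidates closer to the seed receive higher scores, which is the guilt-by-association intuition the abstract refers to.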
62
Wang X, Gulbahce N, Yu H. Network-based methods for human disease gene prediction. Brief Funct Genomics 2011; 10:280-93. [PMID: 21764832 DOI: 10.1093/bfgp/elr024] [Citation(s) in RCA: 144] [Impact Index Per Article: 11.1] [Indexed: 01/08/2023]
Abstract
Despite the considerable progress in disease gene discovery, we are far from uncovering the underlying cellular mechanisms of diseases since complex traits, even many Mendelian diseases, cannot be explained by simple genotype-phenotype relationships. More recently, an increasingly accepted view is that human diseases result from perturbations of cellular systems, especially molecular networks. Genes associated with the same or similar diseases commonly reside in the same neighborhood of molecular networks. Such observations have built the basis for a large collection of computational approaches to find previously unknown genes associated with certain diseases. The majority of the methods are based on protein interactome networks, with integration of other large-scale genomic data or disease phenotype information, to infer how likely it is that a gene is associated with a disease. Here, we review recent, state of the art, network-based methods used for prioritizing disease genes as well as unraveling the molecular basis of human diseases.
Affiliation(s)
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology and Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14850, USA
63
Erten S, Bebek G, Ewing RM, Koyutürk M. DADA: Degree-Aware Algorithms for Network-Based Disease Gene Prioritization. BioData Min 2011; 4:19. [PMID: 21699738 PMCID: PMC3143097 DOI: 10.1186/1756-0381-4-19] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.9] [Received: 09/02/2010] [Accepted: 06/24/2011] [Indexed: 11/25/2022]
Abstract
Background High-throughput molecular interaction data have been used effectively to prioritize candidate genes that are linked to a disease, based on the observation that the products of genes associated with similar diseases are likely to interact with each other heavily in a network of protein-protein interactions (PPIs). An important challenge for these applications, however, is the incomplete and noisy nature of PPI data. Information flow-based methods alleviate these problems to a certain extent, by considering indirect interactions and multiplicity of paths. Results We demonstrate that existing methods are likely to favor highly connected genes, making prioritization sensitive to the skewed degree distribution of PPI networks, as well as ascertainment bias in available interaction and disease association data. Motivated by this observation, we propose several statistical adjustment methods to account for the degree distribution of known disease and candidate genes, using a PPI network with associated confidence scores for interactions. We show that the proposed methods can detect loosely connected disease genes that are missed by existing approaches; however, this improvement might come at the price of more false negatives for highly connected genes. Consequently, we develop a suite called DADA, which includes different uniform prioritization methods that effectively integrate existing approaches with the proposed statistical adjustment strategies. Comprehensive experimental results on the Online Mendelian Inheritance in Man (OMIM) database show that DADA outperforms existing methods in prioritizing candidate disease genes. Conclusions These results demonstrate the importance of employing accurate statistical models and associated adjustment methods in network-based disease gene prioritization, as well as other network-based functional inference applications. DADA is implemented in Matlab and is freely available at http://compbio.case.edu/dada/.
Affiliation(s)
- Sinan Erten
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA.
64
Gupta M, Cheung CL, Hsu YH, Demissie S, Cupples LA, Kiel DP, Karasik D. Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations. J Bone Miner Res 2011; 26:1261-71. [PMID: 21611967 PMCID: PMC3312758 DOI: 10.1002/jbmr.333] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 01/11/2023]
Abstract
Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of resulting associations. We performed analysis of GWAS data from 23 phenotypes by a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes--SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data shows several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated the augmentation of genes that cluster with the 10 osteoporosis-related traits in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed the similarity of association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. 
This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated phenotypes, as well as in deciphering genetic and phenotypic modularity underlying osteoporosis risk.
Affiliation(s)
- Mayetri Gupta
- Department of Biostatistics, Boston University, Boston, MA, USA
65
Yao X, Hao H, Li Y, Li S. Modularity-based credible prediction of disease genes and detection of disease subtypes on the phenotype-gene heterogeneous network. BMC Syst Biol 2011; 5:79. [PMID: 21599985 PMCID: PMC3130676 DOI: 10.1186/1752-0509-5-79] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 02/04/2011] [Accepted: 05/20/2011] [Indexed: 12/05/2022]
Abstract
Background Protein-protein interaction networks and phenotype similarity information have been synthesized together to discover novel disease-causing genes. Genetic or phenotypic similarities are manifested as certain modularity properties in a phenotype-gene heterogeneous network consisting of the phenotype-phenotype similarity network, protein-protein interaction network and gene-disease association network. However, the quantitative analysis of modularity in the heterogeneous network and its influence on disease-gene discovery are still unaddressed. Furthermore, the genetic correspondence of the disease subtypes can be identified by marking the genes and phenotypes in the phenotype-gene network. We present a novel network inference method to measure the network modularity, and in particular to suggest the subtypes of diseases based on the heterogeneous network. Results Based on a measure which is introduced to evaluate the closeness between two nodes in the phenotype-gene heterogeneous network, we developed a Hitting-Time-based method, CIPHER-HIT, for assessing the modularity of disease gene predictions and credibly prioritizing disease-causing genes, and then identifying the genetic modules corresponding to potential subtypes of the queried phenotype. CIPHER-HIT does not rely on any preset parameters. We found that when taking into account the modularity levels, the CIPHER-HIT method can significantly improve the performance of disease gene predictions, which demonstrates that modularity is one of the key features for credible inference of disease genes on the phenotype-gene heterogeneous network. By applying CIPHER-HIT to the subtype analysis of breast cancer, we found that the prioritized genes can be divided into two sub-modules, one containing the members of the Fanconi anemia gene family and the other containing a reported protein complex, MRE11/RAD50/NBN. 
Conclusions The phenotype-gene heterogeneous network contains abundant information for not only disease genes discovery but also disease subtypes detection. The CIPHER-HIT method presented here is effective for network inference, particularly on credible prediction of disease genes and the subtype analysis of diseases, for example Breast cancer. This method provides a promising way to analyze heterogeneous biological networks, both globally and locally.
Affiliation(s)
- Xin Yao
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
66
The pleiotropic structure of the genotype-phenotype map: the evolvability of complex organisms. Nat Rev Genet 2011; 12:204-13. [PMID: 21331091 DOI: 10.1038/nrg2949] [Citation(s) in RCA: 419] [Impact Index Per Article: 32.2] [Indexed: 12/13/2022]
Abstract
It was first noticed 100 years ago that mutations tend to affect more than one phenotypic characteristic, a phenomenon that was called 'pleiotropy'. Because pleiotropy was found so frequently, the notion arose that pleiotropy is 'universal'. However, quantitative estimates of pleiotropy have not been available until recently. These estimates show that pleiotropy is highly restricted and are more in line with the notion of variational modularity than with universal pleiotropy. This finding has major implications for the evolvability of complex organisms and the mapping of disease-causing mutations.
67
Zhang X, Zhang R, Jiang Y, Sun P, Tang G, Wang X, Lv H, Li X. The expanded human disease network combining protein-protein interaction information. Eur J Hum Genet 2011; 19:783-8. [PMID: 21386875 DOI: 10.1038/ejhg.2011.30] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 12/24/2022]
Abstract
The human disease network (HDN) has become a powerful tool for revealing disease-disease associations. Some studies have shown that genes that share similar or same disease phenotypes tend to encode proteins that interact with each other. Therefore, protein-protein interactions (PPIs) may help us to further understand the relationships between diseases with overlapping clinical phenotypes. In this study, we constructed the expanded HDN (eHDN) by combining disease gene information with PPI information, and analyzed its topological features and functional properties. We found that the network is hierarchical and, most diseases are connected to only a few diseases, whereas a small part of diseases are linked to many different diseases. Diseases in a specific disease class tend to cluster together, and genes associated with the same disease are functionally related. Comparing the eHDN with the original HDN (oHDN, constructed using disease gene information) revealed high consistency over all topological and functional properties. This, to some extent, indicates that our eHDN is reliable. In the eHDN, we found some new associations among diseases resulting from the shared genes interacting with disease genes. The new eHDN will provide a valuable reference for clinicians and medical researchers.
Affiliation(s)
- Xuehong Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
68
|
Zhang W, Sun F, Jiang R. Integrating multiple protein-protein interaction networks to prioritize disease genes: a Bayesian regression approach. BMC Bioinformatics 2011; 12 Suppl 1:S11. [PMID: 21342540 PMCID: PMC3044265 DOI: 10.1186/1471-2105-12-s1-s11] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 12/18/2022]
Abstract
Background The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, and thus greatly restrict the scope of application of such methods. Meanwhile, independently constructed and maintained PPI networks are usually quite diverse in coverage and quality, making the selection of a suitable PPI network inevitable but difficult. Methods We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach, and we derive an analytic form for Bayes factor that naturally measures the strength of association between a query disease and a candidate gene and thus can be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks. Results We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks. Conclusions The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.
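The gene proximities mentioned in the Methods can be illustrated with a diffusion kernel on a toy network. The sketch below assumes the standard definition K = exp(-beta * L), with L the graph Laplacian; the function name `diffusion_kernel`, the value of `beta`, the truncated Taylor series and the three-protein network are illustrative assumptions, not the authors' implementation.

```python
def diffusion_kernel(nodes, edges, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) for an undirected, unweighted graph.

    Assumption: a truncated Taylor series is accurate enough for the small
    toy graphs and moderate beta values used in this illustration.
    """
    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}
    # Build M = -beta * L, where L = D - A is the graph Laplacian.
    m = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        m[i][j] += beta
        m[j][i] += beta
        m[i][i] -= beta
        m[j][j] -= beta
    # exp(M) = I + M + M^2/2! + ... ; `term` holds M^k / k! at step k.
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms + 1):
        term = [[sum(term[i][x] * m[x][j] for x in range(n)) / k
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel


# Hypothetical three-protein path: A - B - C.
nodes = ["A", "B", "C"]
K = diffusion_kernel(nodes, [("A", "B"), ("B", "C")])
```

In this toy case the kernel value between directly linked proteins (A and B) is larger than between proteins two steps apart (A and C), which is the sense of proximity that such a model would feed into the regression.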
Affiliation(s)
- Wangshu Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 10084, China
|
69
|
Rossin EJ, Lage K, Raychaudhuri S, Xavier RJ, Tatar D, Benita Y, Cotsapas C, Daly MJ. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet 2011; 7:e1001273. [PMID: 21249183 PMCID: PMC3020935 DOI: 10.1371/journal.pgen.1001273] [Citation(s) in RCA: 407] [Impact Index Per Article: 31.3] [Received: 05/26/2010] [Accepted: 12/09/2010] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. 
Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
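The permutation idea can be sketched in a few lines. This is a simplified illustration that compares the number of interactions inside a gene set with that of randomly drawn gene sets of the same size; the abstract does not specify the permutation schemes used, so the sampling scheme, the function names and the toy network below are all assumptions.

```python
import random
from itertools import combinations


def count_internal_edges(edges, genes):
    """Number of interactions whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for u, v in edges if u in gene_set and v in gene_set)


def connectivity_p_value(edges, all_genes, query, n_perm=1000, seed=0):
    """Empirical p-value for 'the query set is at least this connected'.

    Assumption: the null model draws random gene sets of equal size, which is
    cruder than schemes that also preserve each gene's degree.
    """
    rng = random.Random(seed)
    observed = count_internal_edges(edges, query)
    pool = sorted(all_genes)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(query))
        if count_internal_edges(edges, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical network: five densely interacting genes plus a sparse chain.
genes = [f"g{i}" for i in range(30)]
edges = list(combinations(genes[:5], 2))
edges += [(genes[i], genes[i + 1]) for i in range(5, 29)]
observed, p_value = connectivity_p_value(edges, genes, genes[:5])
```

A small empirical p-value indicates that random gene sets rarely reach the observed number of internal interactions.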
Affiliation(s)
- Elizabeth J. Rossin
- Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Health Science and Technology MD Program, Harvard University and Massachusetts Institute of Technology, Boston, Massachusetts, United States of America
- Harvard Biological and Biomedical Sciences Program, Harvard University, Boston, Massachusetts, United States of America
- Kasper Lage
- Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Soumya Raychaudhuri
- Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts, United States of America
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Ramnik J. Xavier
- Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Diana Tatar
- Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Yair Benita
- Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Chris Cotsapas
- Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts, United States of America
- Mark J. Daly
- Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Health Science and Technology MD Program, Harvard University and Massachusetts Institute of Technology, Boston, Massachusetts, United States of America
- Harvard Biological and Biomedical Sciences Program, Harvard University, Boston, Massachusetts, United States of America
|
70
|
Li S, Zhang B, Jiang D, Wei Y, Zhang N. Herb network construction and co-module analysis for uncovering the combination rule of traditional Chinese herbal formulae. BMC Bioinformatics 2010; 11 Suppl 11:S6. [PMID: 21172056 PMCID: PMC3024874 DOI: 10.1186/1471-2105-11-s11-s6] [Citation(s) in RCA: 175] [Impact Index Per Article: 12.5] [Indexed: 12/02/2022]
Abstract
Background Traditional Chinese Medicine (TCM) is characterized by the wide use of herbal formulae, which are capable of systematically treating diseases determined by interactions among various herbs. However, the combination rule of TCM herbal formulae remains a mystery due to the lack of appropriate methods. Methods From a network perspective, we established a method called Distance-based Mutual Information Model (DMIM) to identify useful relationships among herbs in numerous herbal formulae. DMIM combines mutual information entropy and “between-herb-distance” to score herb interactions and construct herb network. To evaluate the efficacy of the DMIM-extracted herb network, we conducted in vitro assays to measure the activities of strongly connected herbs and herb pairs. Moreover, using the networked Liu-wei-di-huang (LWDH) formula as an example, we proposed a novel concept of “co-module” across herb-biomolecule-disease multilayer networks to explore the potential combination mechanism of herbal formulae. Results DMIM, when used for retrieving herb pairs, achieves a good balance among the herb’s frequency, independence, and distance in herbal formulae. A herb network constructed by DMIM from 3865 Collaterals-related herbal formulae can not only nicely recover traditionally-defined herb pairs and formulae, but also generate novel anti-angiogenic herb ingredients (e.g. Vitexicarpin with IC50=3.2 μM, and Timosaponin A-III with IC50=3.4 μM) as well as herb pairs with synergistic or antagonistic effects. Based on gene and phenotype information associated with both LWDH herbs and LWDH-treated diseases, we found that LWDH-treated diseases show high phenotype similarity and identified certain “co-modules” enriched in cancer pathways and neuro-endocrine-immune pathways, which may be responsible for the action of treating different diseases by the same LWDH formula. 
Conclusions DMIM is a powerful method to identify the combination rule of herbal formulae and lead to new discoveries. We also provide the first evidence that the co-module across multilayer networks may underlie the combination mechanism of herbal formulae and demonstrate the potential of network biology approaches in the studies of TCM.
Affiliation(s)
- Shao Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST / Department of Automation, Tsinghua University, Beijing, China.
|
71
|
Li YH, Dong MQ, Guo Z. Systematic analysis and prediction of longevity genes in Caenorhabditis elegans. Mech Ageing Dev 2010; 131:700-9. [DOI: 10.1016/j.mad.2010.10.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/18/2010] [Revised: 09/14/2010] [Accepted: 10/01/2010] [Indexed: 10/19/2022]
|
72
|
Abstract
Pleiotropy refers to the phenomenon of a single mutation or gene affecting multiple distinct phenotypic traits and has broad implications in many areas of biology. Due to its central importance, pleiotropy has also been extensively modeled, albeit with virtually no empirical basis. Analyzing phenotypes of large numbers of yeast, nematode, and mouse mutants, we here describe the genomic patterns of pleiotropy. We show that the fraction of traits altered appreciably by the deletion of a gene is minute for most genes and the gene-trait relationship is highly modular. The standardized size of the phenotypic effect of a gene on a trait is approximately normally distributed with variable SDs for different genes, which gives rise to the surprising observation of a larger per-trait effect for genes affecting more traits. This scaling property counteracts the pleiotropy-associated reduction in adaptation rate (i.e., the "cost of complexity") in a nonlinear fashion, resulting in the highest adaptation rate for organisms of intermediate complexity rather than low complexity. Intriguingly, the observed scaling exponent falls in a narrow range that maximizes the optimal complexity. Together, the genome-wide observations of overall low pleiotropy, high modularity, and larger per-trait effects from genes of higher pleiotropy necessitate major revisions of theoretical models of pleiotropy and suggest that pleiotropy has not only allowed but also promoted the evolution of complexity.
|
73
|
Stegmaier P, Krull M, Voss N, Kel AE, Wingender E. Molecular mechanistic associations of human diseases. BMC Syst Biol 2010; 4:124. [PMID: 20815942 PMCID: PMC2946303 DOI: 10.1186/1752-0509-4-124] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/18/2009] [Accepted: 09/06/2010] [Indexed: 01/05/2023]
Abstract
Background The study of relationships between human diseases provides new possibilities for biomedical research. Recent achievements on human genetic diseases have stimulated interest to derive methods to identify disease associations in order to gain further insight into the network of human diseases and to predict disease genes. Results Using about 10000 manually collected causal disease/gene associations, we developed a statistical approach to infer meaningful associations between human morbidities. The derived method clustered cardiometabolic and endocrine disorders, immune system-related diseases, solid tissue neoplasms and neurodegenerative pathologies into prominent disease groups. Analysis of biological functions confirmed characteristic features of corresponding disease clusters. Inference of disease associations was further employed as a starting point for prediction of disease genes. Efforts were made to underpin the validity of results by relevant literature evidence. Interestingly, many inferred disease relationships correspond to known clinical associations and comorbidities, and several predicted disease genes were subjects of therapeutic target research. Conclusions Causal molecular mechanisms present a unifying principle to derive methods for disease classification, analysis of clinical disorder associations, and prediction of disease genes. According to the definition of causal disease genes applied in this study, these results are not restricted to genetic disease/gene relationships. This may be particularly useful for the study of long-term or chronic illnesses, where pathological derangement due to environmental or as part of sequel conditions is of importance and may not be fully explained by genetic background.
Affiliation(s)
- Philip Stegmaier
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany.
|
74
|
Chen X, Yan GY, Liao XP. A Novel Candidate Disease Genes Prioritization Method Based on Module Partition and Rank Fusion. OMICS 2010; 14:337-56. [DOI: 10.1089/omi.2009.0143] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Affiliation(s)
- Xing Chen
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Graduate University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Gui-Ying Yan
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Xiao-Ping Liao
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Graduate University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
|
75
|
Wang J, Zhou X, Zhu J, Zhou C, Guo Z. Revealing and avoiding bias in semantic similarity scores for protein pairs. BMC Bioinformatics 2010; 11:290. [PMID: 20509916 PMCID: PMC2903568 DOI: 10.1186/1471-2105-11-290] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 02/09/2010] [Accepted: 05/28/2010] [Indexed: 01/16/2023]
Abstract
BACKGROUND Semantic similarity scores for protein pairs are widely applied in functional genomics research for finding functional clusters of proteins, predicting protein functions and protein-protein interactions, and for identifying putative disease genes. However, because some proteins, such as those related to diseases, tend to be studied more intensively, annotations are likely to be biased, which may affect applications based on semantic similarity measures. Thus, it is necessary to evaluate the effects of the bias on semantic similarity scores between proteins and then find a method to avoid them. RESULTS First, we evaluated 14 commonly used semantic similarity scores for protein pairs and demonstrated that they significantly correlated with the numbers of annotation terms for the proteins (also known as the protein annotation length). These results suggested that current applications of the semantic similarity scores between proteins might be unreliable. Then, to reduce this annotation bias effect, we proposed normalizing the semantic similarity scores between proteins using the power transformation of the scores. We provide evidence that this improves performance in some applications. CONCLUSIONS Current semantic similarity measures for protein pairs are highly dependent on protein annotation lengths, which are subject to biological research bias. This affects applications that are based on these semantic similarity scores, especially in clustering studies that rely on score magnitudes. The normalized scores proposed in this paper can reduce the effects of this bias to some extent.
Affiliation(s)
- Jing Wang
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Xianxiao Zhou
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Jing Zhu
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Chenggui Zhou
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Zheng Guo
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
|
76
|
Doelken S, Köhler S, Bauer S, Ott CE, Krawitz P, Horn D, Mundlos S, Robinson P. Neue Wege in der bioinformatischen Phänotypanalyse. Med Genet 2010. [DOI: 10.1007/s11825-010-0215-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
The precise description of phenotypic abnormalities is of fundamental importance for clinical diagnostics and for our scientific understanding of diseases. Several thousand hereditary human diseases are currently known, each characterized by a more or less specific combination of phenotypic features. Until now, a particular difficulty in the computer-assisted analysis of phenotypic data has been the absence of a standardized medical vocabulary and the lack of adequate data structures for recording phenotypic features.
The Human Phenotype Ontology (HPO) was developed by our group with the aim of describing all phenotypic abnormalities that can occur in monogenic human diseases (http://www.human-phenotype-ontology.org). The HPO provides a hierarchically structured, descriptive and standardized vocabulary for describing phenotypic features and is therefore suited to capturing significant phenotypic similarities and differences between hereditary diseases.
An ontology such as the HPO opens up many new possibilities, particularly in the field of clinical genetic diagnostics. One example is the Phenomizer (http://compbio.charite.de/phenomizer), which we developed: a novel, freely available ontological search program. The Phenomizer uses the semantic structure of the HPO to weight clinical symptoms according to their specificity, and can be used as a tool for computer-assisted clinical differential diagnosis.
Affiliation(s)
- S.C. Doelken
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- S. Köhler
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- S. Bauer
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- C.-E. Ott
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- P. Krawitz
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- D. Horn
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- S. Mundlos
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Max-Planck-Institut für Molekulare Genetik, Berlin, Germany
- P.N. Robinson
- Institut für Medizinische Genetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Max-Planck-Institut für Molekulare Genetik, Berlin, Germany
|
77
|
Minguez P, Dopazo J. Functional genomics and networks: new approaches in the extraction of complex gene modules. Expert Rev Proteomics 2010; 7:55-63. [PMID: 20121476 DOI: 10.1586/epr.09.103] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
The engine that makes the cell work is made of an intricate network of molecular interactions. Nowadays, the elements and relationships of this complex network can be studied with several types of high-throughput techniques. The dream of having a global picture of the cell from different perspectives that can jointly explain cell behavior is, at least technically, feasible. However, this task can only be accomplished by filling the gap between data and information. The availability of methods capable of accurately managing, integrating and analyzing the results from these experiments is crucial for this purpose. Here, we review the new challenges raised by the availability of different genomic data, as well as the new proposals presented to cope with the increasing data complexity. Special emphasis is given to approaches that explore the transcriptome trying to describe the modules of genes that account for the traits studied.
Affiliation(s)
- Pablo Minguez
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe, Valencia, Spain
|
78
|
Gholami AM, Fellenberg K. Cross-species common regulatory network inference without requirement for prior gene affiliation. Bioinformatics 2010; 26:1082-90. [PMID: 20200011 DOI: 10.1093/bioinformatics/btq096] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cross-species meta-analyses of microarray data usually require prior affiliation of genes based on orthology information that often relies on sequence similarity. RESULTS We present an algorithm merging microarray datasets on the basis of co-expression alone, without any requirement for orthology information to affiliate genes. Combining existing methods such as co-inertia analysis, back-transformation, Hungarian matching and majority voting in an iterative non-greedy hill-climbing approach, it affiliates arrays and genes at the same time, maximizing the co-structure between the datasets. To introduce the method, we demonstrate its performance on two closely and two distantly related datasets of different experimental context and produced on different platforms. Each pair stems from two different species. The resulting cross-species dynamic Bayesian gene networks improve on the networks inferred from each dataset alone by yielding more significant network motifs, as well as more of the interactions already recorded in KEGG and other databases. Also, it is shown that our algorithm converges on the optimal number of nodes for network inference. Being readily extendable to more than two datasets, it provides the opportunity to infer extensive gene regulatory networks. AVAILABILITY AND IMPLEMENTATION Source code (MATLAB and R) freely available for download at http://www.mchips.org/supplements/moghaddasi_source.tgz.
Affiliation(s)
- Amin Moghaddas Gholami
- Chair of Proteomics and Bioanalytics, Center for Integrated Protein Sciences Munich (CIPSM), Technische Universität München, Emil Erlenmeyer Forum 5, 85354 Freising, Germany
|
79
|
|
80
|
The common biological basis for common complex diseases: evidence from lipoprotein lipase gene. Eur J Hum Genet 2010; 18:3-7. [PMID: 19639021 DOI: 10.1038/ejhg.2009.134] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
The lipoprotein lipase (LPL) gene encodes a rate-limiting enzyme protein that has a key role in the hydrolysis of triglycerides. Hypertriglyceridemia, one widely prevalent syndrome of LPL deficiency and dysfunction, may be a risk factor in the development of dyslipidemia, type II diabetes (T2D), essential hypertension (EH), coronary heart disease (CHD) and Alzheimer's disease (AD). Findings from earlier studies indicate that LPL may have a role in the pathology of these diseases and therefore is a common or shared biological basis for these common complex diseases. To examine this hypothesis, we reviewed articles on the molecular structure, expression and function of the LPL gene, and its potential role in the etiology of diseases. Evidence from these studies indicates that LPL dysfunction is involved in dyslipidemia, T2D, EH, CHD and AD, and supports the hypothesis that there is a common or shared biological basis for these common complex diseases.
|
81
|
Abstract
A standardized, controlled vocabulary allows phenotypic information to be described in an unambiguous fashion in medical publications and databases. The Human Phenotype Ontology (HPO) is being developed in an effort to provide such a vocabulary. The use of an ontology to capture phenotypic information allows the use of computational algorithms that exploit semantic similarity between related phenotypic abnormalities to define phenotypic similarity metrics, which can be used to perform database searches for clinical diagnostics or as a basis for incorporating the human phenome into large-scale computational analysis of gene expression patterns and other cellular phenomena associated with human disease. The HPO is freely available at http://www.human-phenotype-ontology.org.
Affiliation(s)
- P N Robinson
- Institute for Medical Genetics, Augustenburger Platz 1, 13353 Berlin, Germany.
|
82
|
Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol 2010; 6:e1000641. [PMID: 20090828 PMCID: PMC2797085 DOI: 10.1371/journal.pcbi.1000641] [Citation(s) in RCA: 555] [Impact Index Per Article: 39.6] [Received: 08/06/2009] [Accepted: 12/14/2009] [Indexed: 11/18/2022]
Abstract
A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases highly match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
Understanding the genetic background of diseases is crucial to medical research, with implications in diagnosis, treatment and drug development. As molecular approaches to this challenge are time consuming and costly, computational approaches offer an efficient alternative. Such approaches aim at prioritizing genes in a genomic interval of interest according to their predicted strength-of-association with a given disease. State-of-the-art prioritization methods are based on the observation that genes causing similar diseases tend to lie close to one another in a network of protein-protein interactions. Here we develop a novel prioritization approach that uses the network data in a global manner and can tie not only single genes but also whole protein machineries with a given disease. Our method, PRINCE, is shown to outperform previous methods in both the gene prioritization task and the protein complex task. Applying PRINCE to prostate cancer, Alzheimer's disease and type 2 diabetes, we are able to infer new causal genes and related protein complexes with high confidence.
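The idea of a prioritization function that is smooth over the network while respecting prior information can be illustrated with a generic iterative propagation scheme. The sketch below is only an illustration under stated assumptions: the update rule F <- alpha * W' F + (1 - alpha) * Y with a symmetrically normalized weight matrix W', the parameter `alpha`, the function name `propagate`, and the toy network are all chosen here for demonstration and are not taken from the abstract.

```python
from math import sqrt

def propagate(adj, prior, alpha=0.5, iters=50):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y on a weighted graph.

    adj:   dict node -> dict neighbor -> weight (assumed symmetric).
    prior: dict node -> prior score Y (missing nodes default to 0).
    """
    # Weighted degree of each node, used for symmetric normalization.
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    scores = {u: prior.get(u, 0.0) for u in adj}
    for _ in range(iters):
        nxt = {}
        for u, nbrs in adj.items():
            # Smoothness term: normalized weighted sum of neighbor scores.
            smooth = sum(w / sqrt(deg[u] * deg[v]) * scores[v]
                         for v, w in nbrs.items())
            nxt[u] = alpha * smooth + (1 - alpha) * prior.get(u, 0.0)
        scores = nxt
    return scores

# Hypothetical toy path network g1 - g2 - g3 - g4; g1 is the only known disease gene.
toy = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
ranking = propagate(toy, {"g1": 1.0})
```

In this toy case the score decays with distance from the known gene, which is the qualitative behavior a global, network-wide method is meant to capture; a larger `alpha` would weight network smoothness more heavily relative to the prior.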
Affiliation(s)
- Oron Vanunu
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Oded Magger
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Eytan Ruppin
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Tomer Shlomi
- Department of Computer Science, Technion, Haifa, Israel
- Roded Sharan
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
|
83
|
Itin PH. Rationale and background as basis for a new classification of the ectodermal dysplasias. Am J Med Genet A 2010; 149A:1973-6. [PMID: 19353583 DOI: 10.1002/ajmg.a.32739] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Ectodermal dysplasias are heterogeneous heritable conditions characterized by congenital defects of one or more ectodermal structures and their appendages. Of approximately 200 different ectodermal dysplasias, about 30 have been identified at the molecular level with identification of the causative gene. Itin and Fistarol emphasized that incompletely expressed phenotypes are rather common, which makes clinical diagnosis more difficult. Freire-Maia and Pinheiro used the clinical aspects for their classification and Priolo integrated molecular genetic and clinical aspects for her scheme. These two older classification schemes share a difficulty: when applied strictly, several additional groups of disorders should be integrated under the term ectodermal dysplasias, for example, keratodermas with skin or hair alterations or the ichthyoses with associated abnormalities. Such a strictly consistent classification would lead to an endless list of conditions and would be useless for practical work. Recent evidence implicates a genetic defect in different pathways orchestrating ectodermal organogenesis. Modern molecular genetics will increasingly elucidate the basic defects of the different syndromes and yield more insight into the regulatory mechanisms of morphogenesis. In this way a reclassification of ectodermal dysplasias will be possible according to the function of their involved mutated genes. I will focus on the fact that with molecular methods it is possible to diagnose oligosymptomatic forms of ectodermal dysplasia. This is much more common than previously anticipated and with the classification of ectodermal dysplasia on the basis of molecular diagnosis a new avenue is opened for symptom complexes which were impossible to classify in former times.
Affiliation(s)
- Peter H Itin
- Department of Dermatology, University Hospital Basel, Petersgraben 4, Basel, Switzerland.
|
84
|
Role of Centrality in Network-Based Prioritization of Disease Genes. EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS 2010. [DOI: 10.1007/978-3-642-12211-8_2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
85
|
Oti M, Huynen MA, Brunner HG. The biological coherence of human phenome databases. Am J Hum Genet 2009; 85:801-8. [PMID: 20004759 DOI: 10.1016/j.ajhg.2009.10.026] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2009] [Revised: 10/15/2009] [Accepted: 10/20/2009] [Indexed: 11/28/2022] Open
Abstract
Disease networks are increasingly explored as a complement to networks centered around interactions between genes and proteins. The quality of disease networks is heavily dependent on the amount and quality of phenotype information in phenotype databases of human genetic diseases. We explored which aspects of phenotype database architecture and content best reflect the underlying biology of disease. We used the OMIM-based HPO, Orphanet, and POSSUM phenotype databases for this purpose and devised a biological coherence score based on the sharing of gene ontology annotation to investigate the degree to which phenotype similarity in these databases reflects related pathobiology. Our analyses support the notion that a fine-grained phenotype ontology enhances the accuracy of phenome representation. In addition, we find that the OMIM database that is most used by the human genetics community is heavily underannotated. We show that this problem can easily be overcome by simply adding data available in the POSSUM database to improve OMIM phenotype representations in the HPO. Also, we find that the use of feature frequency estimates--currently implemented only in the Orphanet database--significantly improves the quality of the phenome representation. Our data suggest that there is much to be gained by improving human phenome databases and that some of the measures needed to achieve this are relatively easy to implement. More generally, we propose that curation and more systematic annotation of human phenome databases can greatly improve the power of the phenotype for genetic disease analysis.
Affiliation(s)
- Martin Oti
- Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands
|
86
|
Krauß S, So J, Hambrock M, Köhler A, Kunath M, Scharff C, Wessling M, Grzeschik KH, Schneider R, Schweiger S. Point mutations in GLI3 lead to misregulation of its subcellular localization. PLoS One 2009; 4:e7471. [PMID: 19829694 PMCID: PMC2758996 DOI: 10.1371/journal.pone.0007471] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2009] [Accepted: 09/22/2009] [Indexed: 11/23/2022] Open
Abstract
Background: Mutations in the transcription factor GLI3, a downstream target of Sonic Hedgehog (SHH) signaling, are responsible for the development of malformation syndromes such as Greig-cephalopolysyndactyly-syndrome (GCPS), or Pallister-Hall-syndrome (PHS). Mutations that lead to loss of function of the protein and to haploinsufficiency cause GCPS, while truncating mutations that result in constitutive repressor function of GLI3 lead to PHS. As an exception, some point mutations in the C-terminal part of GLI3 observed in GCPS patients have so far not been linked to loss of function. We have shown recently that protein phosphatase 2A (PP2A) regulates the nuclear localization and transcriptional activity of GLI3. Principal Findings: We have shown recently that protein phosphatase 2A (PP2A) and the ubiquitin ligase MID1 regulate the nuclear localization and transcriptional activity of GLI3. Here we map the functional interaction between the MID1-α4-PP2A complex and GLI3 to a region between amino acids 568 and 1100 of GLI3. Furthermore, we demonstrate that GCPS-associated point mutations located in that region lead to misregulation of the nuclear GLI3 localization and transcriptional activity. GLI3 phosphorylation itself, however, appears independent of its localization and remains untouched by either of the point mutations and by PP2A activity, which suggests involvement of an as yet unknown GLI3 interaction partner, the phosphorylation status of which is regulated by PP2A activity, in the control of GLI3 subcellular localization and activity. Conclusions: The present findings provide an explanation for the pathogenesis of GCPS in patients carrying C-terminal point mutations, and close the gap in our understanding of how GLI3 genotypes give rise to particular phenotypes. Furthermore, they provide a molecular explanation for the phenotypic overlap between Opitz syndrome patients with dysregulated PP2A activity and syndromes caused by GLI3 mutations.
Affiliation(s)
- Sybille Krauß
- Charité University Hospital, Department of Dermatology, Berlin, Germany
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Joyce So
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Melanie Hambrock
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Andrea Köhler
- Institute of Biochemistry and Center for Molecular Biosciences Innsbruck (CMBI), Innsbruck, Austria
- Melanie Kunath
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Constance Scharff
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Martina Wessling
- Center for Human Genetics, Philipps University, Marburg, Germany
- Rainer Schneider
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Institute of Biochemistry and Center for Molecular Biosciences Innsbruck (CMBI), Innsbruck, Austria
- Susann Schweiger
- Max-Planck Institute for Molecular Genetics, Department of Human Molecular Genetics (Ropers), Berlin, Germany
- Ninewells Hospital, Department of Neuroscience and Pathology, Dundee, United Kingdom
|
87
|
Doherty D. Joubert syndrome: insights into brain development, cilium biology, and complex disease. Semin Pediatr Neurol 2009; 16:143-54. [PMID: 19778711 PMCID: PMC2804071 DOI: 10.1016/j.spen.2009.06.002] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Joubert syndrome (JS) is a primarily autosomal recessive condition characterized by hypotonia, ataxia, abnormal eye movements, and intellectual disability with a distinctive mid-hindbrain malformation (the "molar tooth sign"). Variable features include retinal dystrophy, cystic kidney disease, liver fibrosis and polydactyly. Recently, substantial progress has been made in our understanding of the genetic basis of JS, including identification of seven causal genes (NPHP1, AHI1, CEP290, RPGRIP1L, TMEM67/MKS3, ARL13B and CC2D2A). Despite this progress, the known genes account for <50% of cases and few strong genotype-phenotype correlations exist in JS; however, genetic testing can be prioritized based on clinical features. While all seven JS genes have been implicated in the function of the primary cilium/basal body organelle (PC/BB), little is known about how the PC/BB is required for brain, kidney, retina and liver development/function, nor how disruption of PC/BB function leads to diseases of these organs. Recent work on the function of the PC/BB indicates that the organelle is required for multiple signaling pathways including sonic hedgehog, WNT and platelet derived growth factor. Due to shared clinical features and underlying molecular pathophysiology, JS is included in the rapidly expanding group of disorders called ciliopathies. The ciliopathies are emerging as models for more complex diseases, where sequence variants in multiple genes contribute to the phenotype expressed in any given patient.
Affiliation(s)
- Dan Doherty
- University of Washington and Seattle Children's Hospital, Seattle, WA, USA.
|
88
|
Automated multidimensional phenotypic profiling using large public microarray repositories. Proc Natl Acad Sci U S A 2009; 106:12323-8. [PMID: 19590007 DOI: 10.1073/pnas.0900883106] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Phenotypes are complex, and difficult to quantify in a high-throughput fashion. The lack of comprehensive phenotype data can prevent or distort genotype-phenotype mapping. Here, we describe "PhenoProfiler," a computational method that enables in silico phenotype profiling. Drawing on the principle that similar gene expression patterns are likely to be associated with similar phenotype patterns, PhenoProfiler supplements the missing quantitative phenotype information for a given microarray dataset based on other well-characterized microarray datasets. We applied our method to 587 human microarray datasets covering >14,000 samples, and confirmed that the predicted phenotype profiles are highly consistent with true phenotype descriptions. PhenoProfiler offers several unique capabilities: (i) automated, multidimensional phenotype profiling, facilitating the analysis and treatment design of complex diseases; (ii) the extrapolation of phenotype profiles beyond provided classes; and (iii) the detection of confounding phenotype factors that could otherwise bias biological inferences. Finally, because no direct comparisons are made between gene expression values from different datasets, the method can use the entire body of cross-platform microarray data. This work has produced a compendium of phenotype profiles for the National Center for Biotechnology Information GEO datasets, which can facilitate an unbiased understanding of the transcriptome-phenome mapping. The continued accumulation of microarray data will further increase the power of PhenoProfiler, by increasing the variety and the quality of phenotypes to be profiled.
|
89
|
Schur EA, Noonan C, Buchwald D, Goldberg J, Afari N. A twin study of depression and migraine: evidence for a shared genetic vulnerability. Headache 2009; 49:1493-502. [PMID: 19438739 DOI: 10.1111/j.1526-4610.2009.01425.x] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
OBJECTIVE To determine if shared genetic or environmental vulnerabilities could underlie depression and migraine. BACKGROUND Depression and migraine headaches frequently coexist and their comorbidity may be due to shared etiologies. METHODS Female twins in the University of Washington Twin Registry responded to a mailed survey regarding their health history. Depression and migraine were determined by self-report of a physician's diagnosis. We used bivariate structural equation modeling to test for shared genetic, common environmental, and unique environmental components, and to estimate the magnitude of any shared component. RESULTS Among 758 monozygotic and 306 dizygotic female pairs, 23% reported depression and 20% reported migraine headaches. Heritability was estimated to be 58% (95% confidence interval: 48-67%) for depression and 44% (95% confidence interval: 32-56%) for migraine. Bivariate structural equation modeling estimated that 20% of the variability in depression and migraine headaches was due to shared genes and 4% was due to shared unique environmental factors. CONCLUSIONS The comorbidity of depression and migraine headache may be due in part to shared genetic risk factors. Research should focus attention on shared pathways, thereby making progress on 2 disease fronts simultaneously and perhaps providing clinicians with unified treatment strategies.
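For readers unfamiliar with how twin data yield heritability estimates, the classic Falconer approximation gives the intuition. It is a simpler technique than the bivariate structural equation modeling used in the study and is shown here only as a stand-in; the twin correlations below are made-up illustrative values, not results from the paper.

```python
def falconer(r_mz, r_dz):
    """Return (h2, c2, e2) from twin correlations via Falconer's formulas."""
    h2 = 2 * (r_mz - r_dz)  # additive genetic share of variance
    c2 = 2 * r_dz - r_mz    # shared (common) environment share
    e2 = 1 - r_mz           # unique environment share
    return h2, c2, e2

# Hypothetical monozygotic and dizygotic twin correlations for one trait.
h2, c2, e2 = falconer(r_mz=0.6, r_dz=0.35)
```

The three components sum to 1 by construction, which mirrors the genetic, common environmental, and unique environmental decomposition named in the abstract.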
Affiliation(s)
- Ellen A Schur
- Department of Medicine, University of Washington School of Medicine, Harborview Medical Center, 325 Ninth Avenue, Seattle, WA 98104, USA
|
90
|
Mapping gene associations in human mitochondria using clinical disease phenotypes. PLoS Comput Biol 2009; 5:e1000374. [PMID: 19390613 PMCID: PMC2668170 DOI: 10.1371/journal.pcbi.1000374] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2008] [Accepted: 03/24/2009] [Indexed: 01/11/2023] Open
Abstract
Nuclear genes encode most mitochondrial proteins, and their mutations cause diverse and debilitating clinical disorders. To date, 1,200 of these mitochondrial genes have been recorded, while no standardized catalog exists of the associated clinical phenotypes. Such a catalog would be useful to develop methods to analyze human phenotypic data, to determine genotype-phenotype relations among many genes and diseases, and to support the clinical diagnosis of mitochondrial disorders. Here we establish a clinical phenotype catalog of 174 mitochondrial disease genes and study associations of diseases and genes. Phenotypic features such as clinical signs and symptoms were manually annotated from full-text medical articles and classified based on the hierarchical MeSH ontology. This classification of phenotypic features of each gene allowed for the comparison of diseases between different genes. In turn, we were then able to measure the phenotypic associations of disease genes for which we calculated a quantitative value that is based on their shared phenotypic features. The results showed that genes sharing more similar phenotypes have a stronger tendency for functional interactions, proving the usefulness of phenotype similarity values in disease gene network analysis. We then constructed a functional network of mitochondrial genes and discovered a higher connectivity for non-disease than for disease genes, and a tendency of disease genes to interact with each other. Utilizing these differences, we propose 168 candidate genes that resemble the characteristic interaction patterns of mitochondrial disease genes. Through their network associations, the candidates are further prioritized for the study of specific disorders such as optic neuropathies and Parkinson disease. Most mitochondrial disease phenotypes involve several clinical categories including neurologic, metabolic, and gastrointestinal disorders, which might indicate the effects of gene defects within the mitochondrial system. The accompanying knowledgebase (http://www.mitophenome.org/) supports the study of clinical diseases and associated genes.
An important prerequisite for successful disease gene identification is the assessment, with minimal ambiguity, of a particular clinical trait or phenotype. Even with years of experience, recognizing and diagnosing mitochondrial diseases is still a major hurdle in clinical medicine. Computational tools supporting clinicians not only help identify affected individuals, but also guide studies of the genetic and biological causes of these disorders. In this study we dissect and categorize individual clinical features, signs, and symptoms of 174 disease genes and then identify gene similarities based on their shared phenotypic features. We demonstrate that genes sharing more similar phenotypes have a stronger tendency for functional interactions, proving the usefulness of phenotype similarity values in disease gene network analysis. Our study of a large functional network of mitochondrial genes revealed distinct properties that differentiate disease and non-disease genes. Disease genes showed a lower average total connectivity but a tendency to interact with each other; a finding that we used to predict 168 high-probability disease candidates. The accompanying knowledgebase allows for easy navigation between disease and gene information. We believe the open source format will support and encourage further research that will benefit this and other human phenome projects.
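A quantitative value based on shared phenotypic features can take many forms; the abstract does not say which one was used. As a minimal sketch, assuming a plain Jaccard index over sets of annotated features (the gene names, feature labels, and the function `phenotype_similarity` are placeholders introduced here), the comparison of two genes might look like this:

```python
def phenotype_similarity(features_a, features_b):
    """Jaccard index of two collections of phenotypic features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # no annotations on either side: treat as dissimilar
    return len(a & b) / len(a | b)

# Hypothetical annotations; gene names and feature labels are placeholders.
GENE_FEATURES = {
    "GENE_A": {"optic atrophy", "ataxia", "deafness"},
    "GENE_B": {"optic atrophy", "ataxia", "cardiomyopathy"},
    "GENE_C": {"lactic acidosis"},
}

sim_ab = phenotype_similarity(GENE_FEATURES["GENE_A"], GENE_FEATURES["GENE_B"])
sim_ac = phenotype_similarity(GENE_FEATURES["GENE_A"], GENE_FEATURES["GENE_C"])
```

A hierarchy-aware measure would also credit features that share a parent category in the ontology; the flat set overlap above ignores that structure.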
|
91
|
Affiliation(s)
- Dian Donnai
- University of Manchester and Central Manchester Foundation Hospitals NHS Trust.
|
92
|
Ferguson-Smith MA. Testing and screening for chromosome abnormalities. Clin Med (Lond) 2009; 9:153-4. [PMID: 19435123 PMCID: PMC4952669 DOI: 10.7861/clinmedicine.9-2-153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
93
|
Cuccato G, Gatta GD, di Bernardo D. Systems and Synthetic biology: tackling genetic networks and complex diseases. Heredity (Edinb) 2009; 102:527-32. [DOI: 10.1038/hdy.2009.18] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
|
94
|
Care M, Bradford J, Needham C, Bulpitt A, Westhead D. Combining the interactome and deleterious SNP predictions to improve disease gene identification. Hum Mutat 2009; 30:485-92. [DOI: 10.1002/humu.20917] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
95
|
Girirajan S, Truong HT, Blanchard CL, Elsea SH. A functional network module for Smith-Magenis syndrome. Clin Genet 2009; 75:364-74. [PMID: 19236431 DOI: 10.1111/j.1399-0004.2008.01135.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Disorders with overlapping diagnostic features are grouped into a network module. Based on phenotypic similarities or differential diagnoses, it is possible to identify functional pathways leading to individual features. We generated a Smith-Magenis syndrome (SMS)-specific network module utilizing patient clinical data, text mining from the Online Mendelian Inheritance in Man database, and in vitro functional analysis. We tested our module by functional studies based on a hypothesis that RAI1 acts through phenotype-specific pathways involving several downstream genes, which are altered due to RAI1 haploinsufficiency. A preliminary genome-wide gene expression study was performed using microarrays on RAI1 haploinsufficient cells created by RNAi-based approximately 50% knockdown of RAI1 in HEK293T cells. The top dysregulated genes were involved in growth signaling and insulin sensitivity, neuronal differentiation, lipid biosynthesis and fat mobilization, circadian activity, behavior, renal, cardiovascular and skeletal development, gene expression, and cell-cycle regulation and recombination, reflecting the spectrum of clinical features observed in SMS. Validation using real-time quantitative reverse transcriptase polymerase chain reaction confirmed the gene expression profile of 75% of the selected genes analyzed in both HEK293T RAI1 knockdown cells and SMS lymphoblastoid cell lines. Overall, these data support a method for identifying genes and pathways responsible for individual clinical features in a complex disorder such as SMS.
Affiliation(s)
- S Girirajan
- Department of Human and Molecular Genetics, Medical College of Virginia Campus, Virginia Commonwealth University, Richmond, VA 23298, USA
|
96
|
Robinson PN, Köhler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet 2008; 83:610-5. [PMID: 18950739 DOI: 10.1016/j.ajhg.2008.09.017] [Citation(s) in RCA: 631] [Impact Index Per Article: 39.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2008] [Revised: 09/24/2008] [Accepted: 09/30/2008] [Indexed: 10/21/2022] Open
Abstract
There are many thousands of hereditary diseases in humans, each of which has a specific combination of phenotypic features, but computational analysis of phenotypic data has been hampered by lack of adequate computational data structures. Therefore, we have developed a Human Phenotype Ontology (HPO) with over 8000 terms representing individual phenotypic anomalies and have annotated all clinical entries in Online Mendelian Inheritance in Man with the terms of the HPO. We show that the HPO is able to capture phenotypic similarities between diseases in a useful and highly significant fashion.
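One common way to measure phenotypic similarity over an ontology is to score two terms by the information content (IC) of their most informative common ancestor, a Resnik-style measure. The sketch below assumes that approach for illustration; the tiny hierarchy, the disease names, and all function names are hypothetical and do not use real HPO identifiers.

```python
from math import log

# Hypothetical toy is-a hierarchy: term -> list of parent terms.
PARENTS = {
    "abnormality": [],
    "limb abnormality": ["abnormality"],
    "polydactyly": ["limb abnormality"],
    "syndactyly": ["limb abnormality"],
    "eye abnormality": ["abnormality"],
}

def ancestors(term):
    """Return the term plus all of its ancestors in the is-a hierarchy."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content(annotations):
    """IC(t) = -log(fraction of diseases annotated with t or a descendant)."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        # An annotation to a term implies annotation to all its ancestors.
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] = counts.get(t, 0) + 1
    return {t: -log(c / n) for t, c in counts.items()}

def resnik(t1, t2, ic):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic.get(t, 0.0) for t in common)

# Hypothetical disease annotations.
ANNOTATIONS = {
    "disease A": ["polydactyly"],
    "disease B": ["syndactyly"],
    "disease C": ["eye abnormality"],
    "disease D": ["polydactyly", "eye abnormality"],
}
ic = information_content(ANNOTATIONS)
limb_pair = resnik("polydactyly", "syndactyly", ic)
distant_pair = resnik("polydactyly", "eye abnormality", ic)
```

Terms that meet only at the root score zero because the root annotates every disease, while terms that share a more specific ancestor score higher.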
|
97
|
Jiang X, Liu B, Jiang J, Zhao H, Fan M, Zhang J, Fan Z, Jiang T. Modularity in the genetic disease-phenotype network. FEBS Lett 2008; 582:2549-54. [PMID: 18582463 DOI: 10.1016/j.febslet.2008.06.023] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2008] [Revised: 05/23/2008] [Accepted: 06/13/2008] [Indexed: 11/16/2022]
Abstract
Similar disease phenotypes are engendered as a result of the modular nature of gene networks; thus we hypothesized that all human genetic disease phenotypes appear in similar modular styles. Network representations of phenotypes make it possible to explore this hypothesis. We investigated the modularity of a network of genetic disease phenotypes. We computationally extracted phenotype modules and found that the modularity is well correlated with a physiological classification of human diseases. We also found correlations between the modularity and functional genomics as well as its connection to drug-target associations.
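The abstract does not state how modularity was quantified. A widely used definition is Newman's modularity Q, which compares the fraction of edges inside each module with the fraction expected from node degrees alone. The sketch below assumes that definition for an unweighted, undirected network; the toy phenotype network and module labels are invented for illustration.

```python
def modularity(edges, community):
    """Newman modularity Q = sum over modules of e_c/m - (d_c/(2m))^2.

    edges:     list of (u, v) pairs, undirected, no duplicates.
    community: dict node -> module label.
    """
    m = len(edges)
    degree, inside = {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            inside[c] = inside.get(c, 0) + 1  # edge lies within module c
    total = {}
    for node, k in degree.items():
        c = community[node]
        total[c] = total.get(c, 0) + k  # summed degree of module c
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total.items())

# Hypothetical phenotype network: two triangles joined by a single edge.
EDGES = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
         ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),
         ("p3", "p4")]
MODULES = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B", "p6": "B"}
q_split = modularity(EDGES, MODULES)
q_single = modularity(EDGES, {p: "A" for p in MODULES})
```

Placing every node in one module gives Q = 0, so a clearly positive value for the two-module split indicates structure beyond what degrees alone would produce.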
Affiliation(s)
- Xingpeng Jiang
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, PR China
|
98
|
Network-based global inference of human disease genes. Mol Syst Biol 2008; 4:189. [PMID: 18463613 PMCID: PMC2424293 DOI: 10.1038/msb.2008.27] [Citation(s) in RCA: 430] [Impact Index Per Article: 26.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2007] [Accepted: 03/17/2008] [Indexed: 01/04/2023] Open
Abstract
Deciphering the genetic basis of human diseases is an important goal of biomedical research. On the basis of the assumption that phenotypically similar diseases are caused by functionally related genes, we propose a computational framework that integrates human protein–protein interactions, disease phenotype similarities, and known gene–phenotype associations to capture the complex relationships between phenotypes and genotypes. We develop a tool named CIPHER to predict and prioritize disease genes, and we show that the global concordance between the human protein network and the phenotype network reliably predicts disease genes. Our method is applicable to genetically uncharacterized phenotypes, effective in the genome-wide scan of disease genes, and also extendable to explore gene cooperativity in complex diseases. The predicted genetic landscape of over 1000 human phenotypes reveals the global modular organization of phenotype–genotype relationships. The genome-wide prioritization of candidate genes for over 5000 human phenotypes, including those with under-characterized disease loci or even those lacking known association, is publicly released to facilitate future discovery of disease genes.
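One way to read "concordance between the protein network and the phenotype network" is as a correlation: a candidate gene scores well for a query phenotype if its closeness to the genes of other phenotypes rises and falls with the similarity of those phenotypes to the query. The sketch below assumes a Pearson correlation for this purpose; the similarity and closeness values, the gene names, and the helper `pearson` are all invented for illustration and are not taken from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical similarity of the query phenotype to four reference phenotypes.
phenotype_similarity = [0.9, 0.7, 0.2, 0.1]

# Hypothetical network closeness of each candidate gene to the known genes
# of the same four reference phenotypes (same order as above).
gene_closeness = {
    "gene X": [0.8, 0.6, 0.3, 0.1],
    "gene Y": [0.1, 0.2, 0.7, 0.9],
}

scores = {g: pearson(phenotype_similarity, c) for g, c in gene_closeness.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here "gene X" ranks first because it sits close to the genes of phenotypes that resemble the query, which is the pattern the stated assumption about functionally related genes predicts.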
|
99
|
Oti M, van Reeuwijk J, Huynen MA, Brunner HG. Conserved co-expression for candidate disease gene prioritization. BMC Bioinformatics 2008; 9:208. [PMID: 18433471 PMCID: PMC2383918 DOI: 10.1186/1471-2105-9-208] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2008] [Accepted: 04/23/2008] [Indexed: 11/16/2022] Open
Abstract
Background: Genes that are co-expressed tend to be involved in the same biological process. However, co-expression is not a very reliable predictor of functional links between genes. The evolutionary conservation of co-expression between species can be used to predict protein function more reliably than co-expression in a single species. Here we examine whether co-expression across multiple species is also a better prioritizer of disease genes than is co-expression between human genes alone. Results: We use co-expression data from yeast (S. cerevisiae), nematode worm (C. elegans), fruit fly (D. melanogaster), mouse and human and find that the use of evolutionary conservation can indeed improve the predictive value of co-expression. The effect that genes causing the same disease have higher co-expression than do other genes from their associated disease loci is significantly enhanced when co-expression data are combined across evolutionarily distant species. We also find that performance can vary significantly depending on the co-expression datasets used, and just using more data does not necessarily lead to better prioritization. Instead, we find that dataset quality is more important than quantity, and using a consistent microarray platform per species leads to better performance than using more inclusive datasets pooled from various platforms. Conclusion: We find that evolutionarily conserved gene co-expression prioritizes disease candidate genes better than human gene co-expression alone, and provide the integrated data as a new resource for disease gene prioritization tools.
Affiliation(s)
- Martin Oti
- Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26-28, 6525 GA, Nijmegen, The Netherlands.
|
100
|
Abstract
Lack of adipose tissue, either complete or partial, is the hallmark of disorders known as lipodystrophies. Patients with lipodystrophies suffer from metabolic complications similar to those associated with obesity, including insulin resistance, type 2 diabetes, hypertriglyceridemia, and hepatic steatosis. The loss of body fat in inherited lipodystrophies can be caused by defects in the development and/or differentiation of adipose tissue as a consequence of mutations in a number of genes, including PPARG (encoding a nuclear hormone receptor), AGPAT2 (encoding an enzyme involved in the biosynthesis of triglyceride and phospholipids), AKT2 (encoding a protein involved in insulin signal transduction), and BSCL2 (encoding seipin, whose role in the adipocyte biology remains unclear). The loss of body fat can also be caused by the premature death of adipocytes due to mutations in lamin A/C, nuclear lamina proteins, and ZMPSTE24, which modifies the prelamin A post-translationally. In this review, we focus on the molecular basis of inherited lipodystrophies as they relate to adipocyte biology and their associated phenotypic manifestations.
Affiliation(s)
- Anil K Agarwal
- Division of Nutrition and Metabolic Diseases, Department of Internal Medicine and the Center for Human Nutrition, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9052, USA
|