301
Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol 2010; 6:e1000641. [PMID: 20090828 PMCID: PMC2797085 DOI: 10.1371/journal.pcbi.1000641] [Citation(s) in RCA: 544] [Impact Index Per Article: 38.9] [Received: 08/06/2009] [Accepted: 12/14/2009] [Indexed: 11/18/2022] Open
Abstract
A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases highly match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
Understanding the genetic background of diseases is crucial to medical research, with implications in diagnosis, treatment and drug development. As molecular approaches to this challenge are time consuming and costly, computational approaches offer an efficient alternative. Such approaches aim at prioritizing genes in a genomic interval of interest according to their predicted strength-of-association with a given disease. State-of-the-art prioritization methods are based on the observation that genes causing similar diseases tend to lie close to one another in a network of protein-protein interactions. Here we develop a novel prioritization approach that uses the network data in a global manner and can tie not only single genes but also whole protein machineries with a given disease. Our method, PRINCE, is shown to outperform previous methods in both the gene prioritization task and the protein complex task. Applying PRINCE to prostate cancer, Alzheimer's disease and type 2 diabetes, we are able to infer new causal genes and related protein complexes with high confidence.
Affiliation(s)
- Oron Vanunu
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Oded Magger
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Eytan Ruppin
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Tomer Shlomi
- Department of Computer Science, Technion, Haifa, Israel
- Roded Sharan
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
302
Atias N, Sharan R. An Algorithmic Framework for Predicting Side-Effects of Drugs. LECTURE NOTES IN COMPUTER SCIENCE 2010. [DOI: 10.1007/978-3-642-12683-3_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/29/2023]
303
Kann MG. Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief Bioinform 2010; 11:96-110. [PMID: 20007728 PMCID: PMC2810112 DOI: 10.1093/bib/bbp048] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 08/11/2009] [Revised: 09/15/2009] [Indexed: 12/29/2022] Open
Abstract
Over 100 years ago, William Bateson provided, through his observations of the transmission of alkaptonuria in first cousin offspring, evidence of the application of Mendelian genetics to certain human traits and diseases. His work was corroborated by Archibald Garrod (Garrod AE. The incidence of alkaptonuria: a study in chemical individuality. Lancet 1902;ii:1616-20) and William Farabee (Farabee WC. Inheritance of digital malformations in man. In: Papers of the Peabody Museum of American Archaeology and Ethnology. Cambridge, Mass: Harvard University, 1905; 65-78), who recorded the familial tendencies of inheritance of malformations of human hands and feet. These were the pioneers of the hunt for disease genes that would continue through the century and result in the discovery of hundreds of genes that can be associated with different diseases. Despite many ground-breaking discoveries during the last century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases, and we are still searching for the cures for most complex diseases. In the last few years, new genome sequencing and other high-throughput experimental techniques have generated vast amounts of molecular and clinical data that contain crucial information with the potential of leading to the next major biomedical discoveries. The need to mine, visualize and integrate these data has motivated the development of several informatics approaches that can broadly be grouped in the research area of 'translational bioinformatics'. This review highlights the latest advances in the field of translational bioinformatics, focusing on the advances of computational techniques to search for and classify disease genes.
Affiliation(s)
- Maricel G Kann
- University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA.
304
Exploring the Differences in Evolutionary Rates between Monogenic and Polygenic Disease Genes in Human. Mol Biol Evol 2009; 27:934-41. [DOI: 10.1093/molbev/msp297] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/24/2022] Open
305
Huynen MA, de Hollander M, Szklarczyk R. Mitochondrial proteome evolution and genetic disease. Biochim Biophys Acta Mol Basis Dis 2009; 1792:1122-9. [DOI: 10.1016/j.bbadis.2009.03.005] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 12/30/2008] [Revised: 03/04/2009] [Accepted: 03/20/2009] [Indexed: 11/16/2022]
306
Barrenas F, Chavali S, Holme P, Mobini R, Benson M. Network properties of complex human disease genes identified through genome-wide association studies. PLoS One 2009; 4:e8090. [PMID: 19956617 PMCID: PMC2779513 DOI: 10.1371/journal.pone.0008090] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.7] [Received: 09/15/2009] [Accepted: 11/03/2009] [Indexed: 11/21/2022] Open
Abstract
Background: Previous studies of network properties of human disease genes have mainly focused on monogenic diseases or cancers and have suffered from discovery bias. Here we investigated the network properties of complex disease genes identified by genome-wide association studies (GWAs), thereby eliminating discovery bias. Principal findings: We derived a network of complex diseases (n = 54) and complex disease genes (n = 349) to explore the shared genetic architecture of complex diseases. We evaluated the centrality measures of complex disease genes in comparison with essential and monogenic disease genes in the human interactome. The complex disease network showed that diseases belonging to the same disease class do not always share common disease genes. A possible explanation could be that the variants with higher minor allele frequency and larger effect size identified using GWAs constitute disjoint parts of the allelic spectra of similar complex diseases. The complex disease gene network showed high modularity with the size of the largest component being smaller than expected from a randomized null-model. This is consistent with limited sharing of genes between diseases. Complex disease genes are less central than the essential and monogenic disease genes in the human interactome. Genes associated with the same disease, compared to genes associated with different diseases, more often tend to share a protein-protein interaction and a Gene Ontology Biological Process. Conclusions: This indicates that network neighbors of known disease genes form an important class of candidates for identifying novel genes for the same disease.
Affiliation(s)
- Fredrik Barrenas
- The Unit for Clinical Systems Biology, University of Gothenburg, Gothenburg, Sweden.
307
ZHAO Y, CHEN LN, ZHANG LC, WANG Q, SHANG YK, WANG H, LI W. Predicting Disease Genes of Coronary Artery Disease Based on Functional Consistency and Network Topological Features*. PROG BIOCHEM BIOPHYS 2009. [DOI: 10.3724/sp.j.1206.2008.00623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
308
Lee E, Jung H, Radivojac P, Kim JW, Lee D. Analysis of AML genes in dysregulated molecular networks. BMC Bioinformatics 2009; 10 Suppl 9:S2. [PMID: 19761572 PMCID: PMC2745689 DOI: 10.1186/1471-2105-10-s9-s2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identifying disease causing genes and understanding their molecular mechanisms are essential to developing effective therapeutics. Thus, several computational methods have been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. Recently, molecular interaction networks have been incorporated to predict disease genes, but most of those methods do not utilize invaluable disease-specific information available in mRNA expression profiles of patient samples. RESULTS: Through the integration of protein-protein interaction networks and gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins dysregulated in AML and characterized known mutation genes causally implicated to AML embedded in the subnetworks. The analysis shows that the set of extracted subnetworks is a reservoir rich in AML genes reflecting key leukemogenic processes such as myeloid differentiation. CONCLUSION: We showed that the integrative approach both utilizing gene expression profiles and molecular networks could identify AML causing genes most of which were not detectable with gene expression analysis alone due to the minor changes in mRNA level.
Affiliation(s)
- Eunjung Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea.
309
Bhavnani SK, Eichinger F, Martini S, Saxman P, Jagadish HV, Kretzler M. Network analysis of genes regulated in renal diseases: implications for a molecular-based classification. BMC Bioinformatics 2009; 10 Suppl 9:S3. [PMID: 19761573 PMCID: PMC2745690 DOI: 10.1186/1471-2105-10-s9-s3] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022] Open
Abstract
Background: Chronic renal diseases are currently classified based on morphological similarities such as whether they produce predominantly inflammatory or non-inflammatory responses. However, such classifications do not reliably predict the course of the disease and its response to therapy. In contrast, recent studies in diseases such as breast cancer suggest that a classification which includes molecular information could lead to more accurate diagnoses and prediction of treatment response. This article describes how we extracted gene expression profiles from biopsies of patients with chronic renal diseases, and used network visualizations and associated quantitative measures to rapidly analyze similarities and differences between the diseases. Results: The analysis revealed three main regularities: (1) Many genes associated with a single disease, and fewer genes associated with many diseases. (2) Unexpected combinations of renal diseases that share relatively large numbers of genes. (3) Uniform concordance in the regulation of all genes in the network. Conclusion: The overall results suggest the need to define a molecular-based classification of renal diseases, in addition to hypotheses for the unexpected patterns of shared genes and the uniformity in gene concordance. Furthermore, the results demonstrate the utility of network analyses to rapidly understand complex relationships between diseases and regulated genes.
Affiliation(s)
- Suresh K Bhavnani
- Center for Computational Medicine & Bioinformatics, 24 Frank Lloyd Wright Dr, Domino's Farm, Lobby L, Ann Arbor, MI 48109-0738, USA.
310
Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C. Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol 2009; 10:R91. [PMID: 19728866 PMCID: PMC2768980 DOI: 10.1186/gb-2009-10-9-r91] [Citation(s) in RCA: 180] [Impact Index Per Article: 12.0] [Received: 05/02/2009] [Revised: 07/09/2009] [Accepted: 09/03/2009] [Indexed: 11/16/2022] Open
Abstract
An evidence-weighted functional-linkage network of human genes reveals associations among diseases that share no known disease genes and have dissimilar phenotypes
We integrate 16 genomic features to construct an evidence-weighted functional-linkage network comprising 21,657 human genes. The functional-linkage network is used to prioritize candidate genes for 110 diseases, and to reliably disclose hidden associations between disease pairs having dissimilar phenotypes, such as hypercholesterolemia and Alzheimer's disease. Many of these disease-disease associations are supported by epidemiology, but with no previous genetic basis. Such associations can drive novel hypotheses on molecular mechanisms of diseases and therapies.
Affiliation(s)
- Bolan Linghu
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA.
311
Li J, Zimmerman LJ, Park BH, Tabb DL, Liebler DC, Zhang B. Network-assisted protein identification and data interpretation in shotgun proteomics. Mol Syst Biol 2009; 5:303. [PMID: 19690572 PMCID: PMC2736651 DOI: 10.1038/msb.2009.54] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 02/18/2009] [Accepted: 07/07/2009] [Indexed: 11/30/2022] Open
Abstract
Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence, that is, confident proteins, are reported, whereas many possible proteins of biological interest are eliminated. We have developed a clique-enrichment approach (CEA) to rescue eliminated proteins by incorporating the relationship among proteins as embedded in a protein interaction network. In several data sets tested, CEA increased protein identification by 8–23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at similar levels as confident proteins and at a significantly higher level than abandoned ones. When CEA was applied to a breast cancer data set, the rescued proteins included ones coded by well-known breast cancer genes. In addition, CEA generated a network view of the proteins and helped show the modular organization of proteins that may underpin the molecular mechanisms of the disease.
Affiliation(s)
- Jing Li
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232-8340, USA
312
Tiffin N, Andrade-Navarro MA, Perez-Iratxeta C. Linking genes to diseases: it's all in the data. Genome Med 2009; 1:77. [PMID: 19678910 PMCID: PMC2768963 DOI: 10.1186/gm77] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022] Open
Abstract
Genome-wide association analyses on large patient cohorts are generating large sets of candidate disease genes. This is coupled with the availability of ever-increasing genomic databases and a rapidly expanding repository of biomedical literature. Computational approaches to disease-gene association attempt to harness these data sources to identify the most likely disease gene candidates for further empirical analysis by translational researchers, resulting in efficient identification of genes of diagnostic, prognostic and therapeutic value. Existing computational methods analyze gene structure and sequence, functional annotation of candidate genes, characteristics of known disease genes, gene regulatory networks, protein-protein interactions, data from animal models and disease phenotype. To date, a few studies have successfully applied computational analysis of clinical phenotype data for specific diseases and shown genetic associations. In the near future, computational strategies will be facilitated by improved integration of clinical and computational research, and by increased availability of clinical phenotype data in a format accessible to computational approaches.
Affiliation(s)
- Nicki Tiffin
- MRC/UWC/SANBI Bioinformatics Capacity Development Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa.
313
Abstract
The integration of medical information into gene and protein networks could lead to a better understanding of complex diseases. Molecular networks are being used to reconcile genotypes and phenotypes by integrating medical information. In this context, networks will be instrumental for the interpretation of disease at the personalized medicine level.
Affiliation(s)
- Anaïs Baudot
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, E-28029 Madrid, Spain
314
Affiliation(s)
- Dian Donnai
- University of Manchester and Central Manchester Foundation Hospitals NHS Trust.
315
Ferguson-Smith MA. Testing and screening for chromosome abnormalities. Clin Med (Lond) 2009; 9:153-4. [PMID: 19435123 PMCID: PMC4952669 DOI: 10.7861/clinmedicine.9-2-153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
316
Lee E, Jung H, Radivojac P, Kim JW, Lee D. Analysis of AML Genes in Dysregulated Molecular Networks. SUMMIT ON TRANSLATIONAL BIOINFORMATICS 2009; 2009:1-18. [PMID: 21347161 PMCID: PMC3041561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
BACKGROUND: Identifying disease causing genes and understanding their molecular mechanisms are essential to developing effective therapeutics. Thus, several computational methods have been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. Recently, molecular interaction networks have been incorporated to predict disease genes, but most of those methods do not utilize invaluable disease-specific information available in mRNA expression profiles of patient samples. RESULTS: Through the integration of protein-protein interaction networks and gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins dysregulated in AML and characterized known mutation genes causally implicated to AML embedded in the subnetworks. The analysis shows that the set of extracted subnetworks is a reservoir rich in AML genes reflecting key leukemogenic processes such as myeloid differentiation. CONCLUSION: We showed that the integrative approach both utilizing gene expression profiles and molecular networks could identify AML causing genes most of which were not detectable with gene expression analysis alone due to their minor changes in mRNA.
Affiliation(s)
- Eunjung Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea; Biomedical Research Center, KAIST, Daejeon 305-701, South Korea
- Hyunchul Jung
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
- Predrag Radivojac
- School of Informatics, Indiana University, Bloomington, IN 47408, USA
- Jong-Won Kim
- Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University, School of Medicine, Seoul 135-710, South Korea
- Doheon Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea (corresponding author)
317
Care M, Bradford J, Needham C, Bulpitt A, Westhead D. Combining the interactome and deleterious SNP predictions to improve disease gene identification. Hum Mutat 2009; 30:485-92. [DOI: 10.1002/humu.20917] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
318
Chen J, Aronow BJ, Jegga AG. Disease candidate gene identification and prioritization using protein interaction networks. BMC Bioinformatics 2009; 10:73. [PMID: 19245720 PMCID: PMC2657789 DOI: 10.1186/1471-2105-10-73] [Citation(s) in RCA: 226] [Impact Index Per Article: 15.1] [Received: 06/02/2008] [Accepted: 02/27/2009] [Indexed: 12/22/2022] Open
Abstract
Background: Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses. Results: For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also evaluate and compare the performance of our approach. Under appropriate settings – for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and step size 6 for K-Step Markov method – the three methods achieved a comparable AUC value, suggesting a similar performance. Conclusion: Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.
Affiliation(s)
- Jing Chen
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
319
Teber ET, Liu JY, Ballouz S, Fatkin D, Wouters MA. Comparison of automated candidate gene prediction systems using genes implicated in type 2 diabetes by genome-wide association studies. BMC Bioinformatics 2009; 10 Suppl 1:S69. [PMID: 19208173 PMCID: PMC2648789 DOI: 10.1186/1471-2105-10-s1-s69] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open
Abstract
Background: Automated candidate gene prediction systems allow geneticists to home in on disease genes more rapidly by identifying the most probable candidate genes linked to the disease phenotypes under investigation. Here we assessed the ability of eight different candidate gene prediction systems to predict disease genes in intervals previously associated with type 2 diabetes by benchmarking their performance against genes implicated by recent genome-wide association studies. Results: Using a search space of 9556 genes, all but one of the systems pruned the genome in favour of genes associated with moderate to highly significant SNPs. Of the 11 genes associated with highly significant SNPs identified by the genome-wide association studies, eight were flagged as likely candidates by at least one of the prediction systems. A list of candidates produced by a previous consensus approach did not match any of the genes implicated by 706 moderate to highly significant SNPs flagged by the genome-wide association studies. We prioritized genes associated with medium significance SNPs. Conclusion: The study appraises the relative success of several candidate gene prediction systems against independent genetic data. Even when confronted with challengingly large intervals, the candidate gene prediction systems can successfully select likely disease genes. Furthermore, they can be used to filter statistically less-well-supported genetic data to select more likely candidates. We suggest consensus approaches fail because they penalize novel predictions made from independent underlying databases. To realize their full potential further work needs to be done on prioritization and annotation of genes.
Affiliation(s)
- Erdahl T Teber
- Victor Chang Cardiac Research Institute, 384 Victoria St, Darlinghurst, 2010, NSW, Australia.
320
Hutz JE, Kraja AT, McLeod HL, Province MA. CANDID: a flexible method for prioritizing candidate genes for complex human traits. Genet Epidemiol 2009; 32:779-90. [PMID: 18613097 DOI: 10.1002/gepi.20346] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 01/16/2023]
Abstract
Genomewide studies and localized candidate gene approaches have become everyday study designs for identifying polymorphisms in genes that influence complex human traits. Yet, in general, the number of significant findings and the need to focus on smaller regions require a prioritization of genes for further study. Some candidate gene identification algorithms have been proposed in recent years to attempt to streamline this prioritization, but many suffer from limitations imposed by the source data or are difficult to use and understand. CANDID is a prioritization algorithm designed to produce impartial, accurate rankings of candidate genes that influence complex human traits. CANDID can use information from publications, protein domain descriptions, cross-species conservation measures, gene expression profiles and protein-protein interactions in its analysis. Additionally, users may supplement these data sources with results from linkage, association and other studies. CANDID was tested on well-known complex trait genes using data from the Online Mendelian Inheritance in Man database. Additionally, CANDID was evaluated in a modeled gene discovery environment, where it ranked genes whose trait associations were published after CANDID's databases were compiled. In all settings, CANDID exhibited high sensitivity and specificity, indicating an improvement upon previously published algorithms. Its accuracy and ease of use make CANDID a highly useful tool in study design and analysis for complex human traits.
Affiliation(s)
- Janna E Hutz
- Division of Statistical Genomics, Washington University School of Medicine, Saint Louis, Missouri, USA.
321
Functional organization of the yeast proteome by a yeast interactome map. Proc Natl Acad Sci U S A 2009; 106:1490-5. [PMID: 19164585 DOI: 10.1073/pnas.0808624106] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
It is hoped that comprehensive mapping of protein physical interactions will facilitate insights regarding both fundamental cell biology processes and the pathology of diseases. To fulfill this hope, good solutions to 2 issues will be essential: (i) how to obtain reliable interaction data in a high-throughput setting and (ii) how to structure interaction data in a meaningful form, amenable to and valuable for further biological research. In this article, we structure an interactome in terms of predicted permanent protein complexes and predicted transient, nongeneric interactions between these complexes. The interactome is generated by means of an associated computational algorithm, from raw high-throughput affinity purification/mass spectrometric interaction data. We apply our technique to the construction of an interactome for Saccharomyces cerevisiae, showing that it yields reliability typical of low-throughput experiments from high-throughput data. We discuss biological insights raised by this interactome including, via homology, a few related to human disease.
322
Kremer H, Cremers FPM. Positional cloning of deafness genes. Methods Mol Biol 2009; 493:215-238. [PMID: 18839350 DOI: 10.1007/978-1-59745-523-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The identification of the majority of the known causative genes involved in nonsyndromic sensorineural hearing loss (NSHL) started with linkage analysis as part of a positional cloning procedure. The human and mouse genome projects in combination with technical developments on genotyping, transcriptomics, proteomics, and the creation of animal models have greatly enhanced the speed of disease gene identification. In the present chapter, we first discuss the possibilities for exclusion of known NSHL loci and genes. Subsequently, methods are described to determine the genomic regions that contain the genetic defects. These include linkage analysis with genotyping and statistical evaluation and the determination of copy number variations. In the case of a large genomic region, candidate genes are selected and prioritized using gene expression analysis, protein network data, and phenotypes of animal models. A number of algorithms are described to automate the process of candidate gene selection. The novel high-throughput sequencing techniques might make gene selection and prioritization unnecessary in the near future. Once genetic variants are identified, questions on pathogenicity need to be addressed, which is the topic of the last section of this chapter.
Affiliation(s)
- Hannie Kremer
- Department of Otorhinolaryngology, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
|
323
|
Aerts S, Vilain S, Hu S, Tranchevent LC, Barriot R, Yan J, Moreau Y, Hassan BA, Quan XJ. Integrating computational biology and forward genetics in Drosophila. PLoS Genet 2009; 5:e1000351. [PMID: 19165344 PMCID: PMC2628282 DOI: 10.1371/journal.pgen.1000351] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 09/10/2008] [Accepted: 12/19/2008] [Indexed: 11/18/2022] Open
Abstract
Genetic screens are powerful methods for the discovery of gene-phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of "omics" data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, as well as a web application for functional prioritization in Drosophila, called Endeavour-HighFly, and the Atonal network, are publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the process of gene function and gene-gene association discovery.
Affiliation(s)
- Stein Aerts
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
- Department of Human Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Sven Vilain
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
- Department of Human Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Doctoral Program in Molecular and Developmental Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Shu Hu
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
- Department of Human Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Doctoral Program in Molecular and Developmental Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Roland Barriot
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
- Jiekun Yan
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
- Department of Human Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Yves Moreau
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
- Bassem A. Hassan
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
- Department of Human Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Doctoral Program in Molecular and Developmental Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
- Xiao-Jiang Quan
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
- Department of Human Genetics, Katholieke Universiteit Leuven School of Medicine, Leuven, Belgium
|
324
|
Gao S, Wang X. Predicting Type 1 Diabetes Candidate Genes using Human Protein-Protein Interaction Networks. J Comput Sci Syst Biol 2009; 2:133. [PMID: 20148193 PMCID: PMC2818071 DOI: 10.4172/jcsb.1000025] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 01/14/2023]
Abstract
Background Proteins directly interacting with each other tend to have similar functions and be involved in the same cellular processes. Mutations in genes that code for them often lead to the same family of disease phenotypes. Efforts have been made to prioritize positional candidate genes for complex diseases using protein-protein interaction (PPI) information. But such an approach is often considered too general to be practically useful for specific diseases. Results In this study we investigate the efficacy of this approach in type 1 diabetes (T1D). 266 known disease genes, and 983 positional candidate genes from the 18 established linkage loci of T1D, are compiled from the T1Dbase (http://t1dbase.org). We found that the PPI network of known T1D genes has distinct topological features from others, with a significantly higher number of interactions among themselves even after adjusting for their high network degrees (p<1e-5). We then define those positional candidates that are first degree PPI neighbours of the 266 known disease genes to be new candidate disease genes. This leads to a list of 68 genes for further study. Cross validation using the known disease genes as benchmark reveals that the enrichment is ~17.1 fold over random selection, and ~4 fold better than using the linkage information alone. We find that the citations of the new candidates in T1D-related publications are significantly (p<1e-7) more than random, even after excluding the co-citation with the known disease genes; they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. These findings provide indirect validation of the newly predicted candidates. Conclusion Our study demonstrates the potential of the PPI information in prioritizing positional candidate genes for T1D.
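The neighbour-based selection step described in this abstract can be illustrated with a minimal sketch. The function name, the edge-list input format and the toy gene identifiers are assumptions made for illustration only; the abstract does not describe the authors' implementation.

```python
def first_degree_candidates(ppi_edges, known_genes, positional_candidates):
    """Return positional candidates that directly interact with a known disease gene.

    ppi_edges: iterable of (gene_a, gene_b) pairs, treated as undirected (assumed format).
    known_genes: genes already associated with the disease.
    positional_candidates: genes located in the linkage loci.
    """
    known = set(known_genes)
    neighbours = set()
    for a, b in ppi_edges:
        # An edge contributes a neighbour whenever one endpoint is a known gene.
        if a in known:
            neighbours.add(b)
        if b in known:
            neighbours.add(a)
    # Keep only positional candidates, and drop genes that are already known.
    return (neighbours & set(positional_candidates)) - known


# Hypothetical toy data: K* are known disease genes, C* are positional candidates.
edges = [("K1", "C1"), ("C2", "C3"), ("K2", "C3"), ("K1", "K2")]
print(sorted(first_degree_candidates(edges, {"K1", "K2"}, {"C1", "C2", "C3", "C4"})))
# ['C1', 'C3']
```

C2 is excluded because it only reaches a known gene through C3, and C4 because it has no interactions at all.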
Affiliation(s)
- Shouguo Gao
- Department of Physics & the Comprehensive Diabetes Center, University of Alabama at Birmingham, 1300 University Blvd, Birmingham, AL 35294, USA
|
325
|
Ortutay C, Vihinen M. Identification of candidate disease genes by integrating Gene Ontologies and protein-interaction networks: case study of primary immunodeficiencies. Nucleic Acids Res 2008; 37:622-8. [PMID: 19073697 PMCID: PMC2632920 DOI: 10.1093/nar/gkn982] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Indexed: 12/15/2022] Open
Abstract
Disease gene identification is still a challenge despite modern high-throughput methods. Many diseases are very rare or lethal and thus cannot be investigated with traditional methods. Several in silico methods have been developed but they have some limitations. We introduce a new method that combines information about protein-interaction network properties and Gene Ontology terms. Genes with high calculated network scores and statistically significant gene ontology terms based on known diseases are prioritized as candidate genes. The method was applied to identify novel primary immunodeficiency-related genes, 26 of which were found. The investigation uses the protein-interaction network for all essential immunome human genes available in the Immunome Knowledge Base and an analysis of their enriched gene ontology annotations. The identified disease gene candidates are mainly involved in cellular signaling including receptors, protein kinases and adaptor and binding proteins as well as enzymes. The method can be generalized for any disease group with sufficient information.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, FI-33014 University of Tampere and Tampere University Hospital, FI-33520 Tampere, Finland
|
326
|
Chen R, Morgan AA, Dudley J, Deshpande T, Li L, Kodama K, Chiang AP, Butte AJ. FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease. Genome Biol 2008; 9:R170. [PMID: 19061490 PMCID: PMC2646274 DOI: 10.1186/gb-2008-9-12-r170] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 06/17/2008] [Revised: 09/26/2008] [Accepted: 12/05/2008] [Indexed: 12/18/2022] Open
Abstract
Differentially expressed genes are more likely to have variants associated with disease. A new tool, fitSNP, prioritizes candidate SNPs from association studies. Background Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs. Results We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and proposed candidate genes to sequence for their association with 597 syndromes with unknown molecular basis. Conclusions Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs.
Affiliation(s)
- Rong Chen
- Stanford Center for Biomedical Informatics Research, 251 Campus Drive, Stanford, CA 94305, USA.
|
327
|
GeneDistiller--distilling candidate genes from linkage intervals. PLoS One 2008; 3:e3874. [PMID: 19057649 PMCID: PMC2587712 DOI: 10.1371/journal.pone.0003874] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.6] [Received: 07/24/2008] [Accepted: 11/10/2008] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Linkage studies often yield intervals containing several hundred positional candidate genes. Different manual or automatic approaches exist for the determination of the gene most likely to cause the disease. While the manual search is very flexible and takes advantage of the researchers' background knowledge and intuition, it may be very cumbersome to collect and study the relevant data. Automatic solutions on the other hand usually focus on certain models, remain "black boxes" and do not offer the same degree of flexibility. METHODOLOGY We have developed a web-based application that combines the advantages of both approaches. Information from various data sources such as gene-phenotype associations, gene expression patterns and protein-protein interactions was integrated into a central database. Researchers can select which information for the genes within a candidate interval or for single genes shall be displayed. Genes can also interactively be filtered, sorted and prioritised according to criteria derived from the background knowledge and preconception of the disease under scrutiny. CONCLUSIONS GeneDistiller provides knowledge-driven, fully interactive and intuitive access to multiple data sources. It displays maximum relevant information, while saving the user from drowning in the flood of data. A typical query takes less than two seconds, thus allowing an interactive and explorative approach to the hunt for the candidate gene. ACCESS GeneDistiller can be freely accessed at http://www.genedistiller.org.
|
328
|
Yilmaz S, Jonveaux P, Bicep C, Pierron L, Smaïl-Tabbone M, Devignes MD. Gene-disease relationship discovery based on model-driven data integration and database view definition. Bioinformatics 2008; 25:230-6. [PMID: 19042916 PMCID: PMC2639000 DOI: 10.1093/bioinformatics/btn612] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 01/26/2023]
Abstract
Motivation: Computational methods are widely used to discover gene–disease relationships hidden in vast masses of available genomic and post-genomic data. In most current methods, a similarity measure is calculated between gene annotations and known disease genes or disease descriptions. However, more explicit gene–disease relationships are required for better insights into the molecular bases of diseases, especially for complex multi-gene diseases. Results: Explicit relationships between genes and diseases are formulated as candidate gene definitions that may include intermediary genes, e.g. orthologous or interacting genes. These definitions guide data modelling in our database approach for gene–disease relationship discovery and are expressed as views which ultimately lead to the retrieval of documented sets of candidate genes. A system called ACGR (Approach for Candidate Gene Retrieval) has been implemented and tested with three case studies including a rare orphan gene disease. Availability: The ACGR sources are freely available at http://bioinfo.loria.fr/projects/acgr/acgr-software/. See especially the file ‘disease_description’ and the folders ‘Xcollect_scenarios’ and ‘ACGR_views’. Contact: devignes@loria.fr Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- S Yilmaz
- Laboratory for Human Genetics, Nancy Medical Faculty, Vandoeuvre-les-Nancy, France
|
329
|
Wu X, Liu Q, Jiang R. Align human interactome with phenome to identify causative genes and networks underlying disease families. Bioinformatics 2008; 25:98-104. [PMID: 19010805 DOI: 10.1093/bioinformatics/btn593] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Understanding the complexity in gene-phenotype relationship is vital for revealing the genetic basis of common diseases. Recent studies on the basis of human interactome and phenome not only uncover prevalent phenotypic overlap and genetic overlap between diseases, but also reveal a modular organization of the genetic landscape of human diseases, providing new opportunities to reduce the complexity in dissecting the gene-phenotype association. RESULTS We provide systematic and quantitative evidence that phenotypic overlap implies genetic overlap. With these results, we perform the first heterogeneous alignment of human interactome and phenome via a network alignment technique and identify 39 disease families with corresponding causative gene networks. Finally, we propose AlignPI, an alignment-based framework to predict disease genes, and identify plausible candidates for 70 diseases. Our method scales well to the whole genome, as demonstrated by prioritizing 6154 genes across 37 chromosome regions for Crohn's disease (CD). Results are consistent with a recent meta-analysis of genome-wide association studies for CD. AVAILABILITY Bi-modules and disease gene predictions are freely available at the URL http://bioinfo.au.tsinghua.edu.cn/alignpi/
Affiliation(s)
- Xuebing Wu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
|
330
|
Radivojac P, Peng K, Clark WT, Peters BJ, Mohan A, Boyle SM, Mooney SD. An integrated approach to inferring gene-disease associations in humans. Proteins 2008; 72:1030-7. [PMID: 18300252 DOI: 10.1002/prot.21989] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.3] [Indexed: 11/10/2022]
Abstract
One of the most important tasks of modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored and algorithms for candidate gene prioritization are gaining in their usefulness. Here, we propose an algorithm for detecting gene-disease associations based on the human protein-protein interaction network, known gene-disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred, is supervised: first, we mapped each gene/protein onto the spaces of disease and functional terms based on distance to all annotated proteins in the protein interaction network. We also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility. We then trained support vector machines to detect gene-disease associations for a number of terms in Disease Ontology and provided evidence that, despite the noise/incompleteness of experimental data and unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted on simultaneously. AVAILABILITY www.phenopred.org.
Affiliation(s)
- Predrag Radivojac
- School of Informatics, Indiana University, Bloomington, Indiana 47408, USA.
|
331
|
Pan W. Network-based model weighting to detect multiple loci influencing complex diseases. Hum Genet 2008; 124:225-34. [PMID: 18719944 DOI: 10.1007/s00439-008-0545-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 07/04/2008] [Accepted: 08/12/2008] [Indexed: 01/20/2023]
Abstract
For genome-wide association studies, it has been increasingly recognized that the popular locus-by-locus search for DNA variants associated with disease susceptibility may not be effective, especially when there are interactions between or among multiple loci, for which a multi-loci search strategy may be more productive. However, even if computationally feasible, a genome-wide search over all possible multiple loci requires exploring a huge model space and making costly adjustment for multiple testing, leading to reduced statistical power. On the other hand, there are accumulating data suggesting that protein products of many disease-causing genes tend to interact with each other, or cluster in the same biological pathway. To incorporate this prior knowledge and existing data on gene networks, we propose a gene network-based method to improve statistical power over that of the exhaustive search by giving higher weights to models involving genes nearby in a network. We use simulated data under realistic scenarios, including a large-scale human protein-protein interaction network and 23 known ataxia-causing genes, to demonstrate potential gain by our proposed method when disease-genes are clustered in a network.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, MMC 303, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0392, USA.
|
332
|
Fraser HB, Plotkin JB. Using protein complexes to predict phenotypic effects of gene mutation. Genome Biol 2008; 8:R252. [PMID: 18042286 PMCID: PMC2258176 DOI: 10.1186/gb-2007-8-11-r252] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 06/13/2007] [Revised: 09/25/2007] [Accepted: 11/27/2007] [Indexed: 11/21/2022] Open
Abstract
The best predictor of a protein's knockout phenotype is shown to be the knockout phenotype of other proteins that are present in a protein complex with it. Background Predicting the phenotypic effects of mutations is a central goal of genetics research; it has important applications in elucidating how genotype determines phenotype and in identifying human disease genes. Results Using a wide range of functional genomic data from the yeast Saccharomyces cerevisiae, we show that the best predictor of a protein's knockout phenotype is the knockout phenotype of other proteins that are present in a protein complex with it. Even the addition of multiple datasets does not improve upon the predictions made from protein complex membership. Similarly, we find that a proxy for protein complexes is a powerful predictor of disease phenotypes in humans. Conclusion We propose that identifying human protein complexes containing known disease genes will be an efficient method for large-scale disease gene discovery, and that yeast may prove to be an informative model system for investigating, and even predicting, the genetic basis of both Mendelian and complex disease phenotypes.
Affiliation(s)
- Hunter B Fraser
- Broad Institute of Harvard and MIT, 320 Charles St, Cambridge, Massachusetts 02142, USA.
|
333
|
Jiang X, Liu B, Jiang J, Zhao H, Fan M, Zhang J, Fan Z, Jiang T. Modularity in the genetic disease-phenotype network. FEBS Lett 2008; 582:2549-54. [PMID: 18582463 DOI: 10.1016/j.febslet.2008.06.023] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 03/17/2008] [Revised: 05/23/2008] [Accepted: 06/13/2008] [Indexed: 11/16/2022]
Abstract
Similar disease phenotypes are engendered as a result of the modular nature of gene networks; thus we hypothesized that all human genetic disease phenotypes appear in similar modular styles. Network representations of phenotypes make it possible to explore this hypothesis. We investigated the modularity of a network of genetic disease phenotypes. We computationally extracted phenotype modules and found that the modularity is well correlated with a physiological classification of human diseases. We also found correlations between the modularity and functional genomics as well as its connection to drug-target associations.
Affiliation(s)
- Xingpeng Jiang
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, PR China
|
334
|
Abstract
During a decade of proof-of-principle analysis in model organisms, protein networks have been used to further the study of molecular evolution, to gain insight into the robustness of cells to perturbation, and for assignment of new protein functions. Following these analyses, and with the recent rise of protein interaction measurements in mammals, protein networks are increasingly serving as tools to unravel the molecular basis of disease. We review promising applications of protein networks to disease in four major areas: identifying new disease genes; the study of their network properties; identifying disease-related subnetworks; and network-based disease classification. Applications in infectious disease, personalized medicine, and pharmacology are also forthcoming as the available protein network information improves in quality and coverage.
Affiliation(s)
- Trey Ideker
- Department of Bioengineering, University of California at San Diego, La Jolla, California 92093, USA
|
335
|
Network-based global inference of human disease genes. Mol Syst Biol 2008; 4:189. [PMID: 18463613 PMCID: PMC2424293 DOI: 10.1038/msb.2008.27] [Citation(s) in RCA: 428] [Impact Index Per Article: 26.8] [Received: 09/18/2007] [Accepted: 03/17/2008] [Indexed: 01/04/2023] Open
Abstract
Deciphering the genetic basis of human diseases is an important goal of biomedical research. On the basis of the assumption that phenotypically similar diseases are caused by functionally related genes, we propose a computational framework that integrates human protein–protein interactions, disease phenotype similarities, and known gene–phenotype associations to capture the complex relationships between phenotypes and genotypes. We develop a tool named CIPHER to predict and prioritize disease genes, and we show that the global concordance between the human protein network and the phenotype network reliably predicts disease genes. Our method is applicable to genetically uncharacterized phenotypes, effective in the genome-wide scan of disease genes, and also extendable to explore gene cooperativity in complex diseases. The predicted genetic landscape of over 1000 human phenotypes reveals the global modular organization of phenotype–genotype relationships. The genome-wide prioritization of candidate genes for over 5000 human phenotypes, including those with under-characterized disease loci or even those lacking known association, is publicly released to facilitate future discovery of disease genes.
|
336
|
Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet 2008; 82:949-58. [PMID: 18371930 DOI: 10.1016/j.ajhg.2008.02.013] [Citation(s) in RCA: 781] [Impact Index Per Article: 48.8] [Received: 11/26/2007] [Revised: 01/18/2008] [Accepted: 02/19/2008] [Indexed: 11/21/2022] Open
Abstract
The identification of genes associated with hereditary disorders has contributed to improving medical care and to a better understanding of gene functions, interactions, and pathways. However, there are well over 1500 Mendelian disorders whose molecular basis remains unknown. At present, methods such as linkage analysis can identify the chromosomal region in which unknown disease genes are located, but the regions could contain up to hundreds of candidate genes. In this work, we present a method for prioritization of candidate genes by use of a global network distance measure, random walk analysis, for definition of similarities in protein-protein interaction networks. We tested our method on 110 disease-gene families with a total of 783 genes and achieved an area under the ROC curve of up to 98% on simulated linkage intervals of 100 genes surrounding the disease gene, significantly outperforming previous methods based on local distance measures. Our results not only provide an improved tool for positional-cloning projects but also add weight to the assumption that phenotypically similar diseases are associated with disturbances of subnetworks within the larger protein interactome that extend beyond the disease proteins themselves.
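The abstract does not state the update rule behind its random walk analysis. One commonly used formulation of a global network distance is the random walk with restart, and the sketch below assumes that formulation. The function name, the restart probability of 0.3 and the toy network are hypothetical and are not taken from the paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    seeds: known disease genes where the walker starts and restarts.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    # Start distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Probability flowing in from each neighbour, split evenly over its edges.
            inflow = sum(p[m] / degree[m] for m in adjacency[n])
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:  # stop once the distribution has stabilised
            break
    return p


# Hypothetical toy interactome: A is the known disease gene, the rest are candidates.
toy_network = {
    "A": {"B"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"B"},
}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted((g for g in toy_network if g != "A"), key=scores.get, reverse=True)
print(ranking)
```

Unlike a shortest-path or direct-neighbour measure, the score of a candidate here depends on all paths connecting it to the seed genes, which is the sense in which such a measure is global.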
|
337
|
Ala U, Piro RM, Grassi E, Damasco C, Silengo L, Oti M, Provero P, Di Cunto F. Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS Comput Biol 2008; 4:e1000043. [PMID: 18369433 PMCID: PMC2268251 DOI: 10.1371/journal.pcbi.1000043] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.5] [Received: 10/16/2007] [Accepted: 02/20/2008] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Even in the post-genomic era, the identification of candidate genes within loci associated with human genetic diseases is a very demanding task, because the critical region may typically contain hundreds of positional candidates. Since genes implicated in similar phenotypes tend to share very similar expression profiles, high throughput gene expression data may represent a very important resource to identify the best candidates for sequencing. However, so far, gene coexpression has not been used very successfully to prioritize positional candidates. METHODOLOGY/PRINCIPAL FINDINGS We show that it is possible to reliably identify disease-relevant relationships among genes from massive microarray datasets by concentrating only on genes sharing similar expression profiles in both human and mouse. Moreover, we show systematically that the integration of human-mouse conserved coexpression with a phenotype similarity map allows the efficient identification of disease genes in large genomic regions. Finally, using this approach on 850 OMIM loci characterized by an unknown molecular basis, we propose high-probability candidates for 81 genetic diseases. CONCLUSION Our results demonstrate that conserved coexpression, even at the human-mouse phylogenetic distance, represents a very strong criterion to predict disease-relevant relationships among human genes.
Affiliation(s)
- Ugo Ala
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
- Rosario Michael Piro
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
- Elena Grassi
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
- Christian Damasco
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
- Lorenzo Silengo
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
- Martin Oti
- Department of Human Genetics and Centre for Molecular and Biomolecular Informatics, University Medical Centre Nijmegen, Nijmegen, The Netherlands
- Paolo Provero
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
- Ferdinando Di Cunto
- Molecular Biotechnology Center, Department of Genetics, Biology and Biochemistry, University of Turin, Turin, Italy
|
338
|
McGary KL, Lee I, Marcotte EM. Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes. Genome Biol 2008; 8:R258. [PMID: 18053250 PMCID: PMC2246260 DOI: 10.1186/gb-2007-8-12-r258] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Received: 07/24/2007] [Revised: 10/16/2007] [Accepted: 12/05/2007] [Indexed: 11/10/2022] Open
Abstract
Loss-of-function phenotypes of yeast genes can be predicted from the loss-of-function phenotypes of their neighbours in functional gene networks. This could potentially be applied to the prediction of human disease genes. We demonstrate that loss-of-function yeast phenotypes are predictable by guilt-by-association in functional gene networks. Testing 1,102 loss-of-function phenotypes from genome-wide assays of yeast reveals predictability of diverse phenotypes, spanning cellular morphology, growth, metabolism, and quantitative cell shape features. We apply the method to extend a genome-wide screen by predicting, then verifying, genes whose disruption elongates yeast cells, and to predict human disease genes. To facilitate network-guided screens, a web server is available.
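Guilt-by-association in a weighted gene network can be sketched in a few lines. The scoring rule below (sum of edge weights linking a candidate to genes already known to show the phenotype) is a generic illustration and is not claimed to be the exact scoring used in the paper. The network, weights, gene names, and function name are hypothetical.

```python
from collections import defaultdict


def guilt_by_association(edges, seeds):
    """Rank non-seed genes by the summed weight of their links to seed genes.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    seeds: set of genes already known to show the phenotype.
    """
    scores = defaultdict(float)
    for a, b, w in edges:
        if a in seeds and b not in seeds:
            scores[b] += w
        elif b in seeds and a not in seeds:
            scores[a] += w
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: g1 and g2 are the known phenotype genes.
edges = [
    ("g1", "g2", 2.0),
    ("g1", "g3", 1.0),
    ("g2", "g3", 0.5),
    ("g3", "g4", 3.0),
    ("g2", "g5", 1.0),
]
seeds = {"g1", "g2"}

ranking = guilt_by_association(edges, seeds)
```

Here g3 is linked to both seeds and ranks first, g5 is linked to one seed, and g4 receives no score because it has no direct link to a seed gene.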
Affiliation(s)
- Kriston L McGary
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, Texas 78712, USA.
|
339
|
Phenome connections. Trends Genet 2008; 24:103-6. [DOI: 10.1016/j.tig.2007.12.005] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Received: 11/07/2007] [Revised: 12/18/2007] [Accepted: 12/18/2007] [Indexed: 11/23/2022]
|
340
|
Bader S, Kühner S, Gavin AC. Interaction networks for systems biology. FEBS Lett 2008; 582:1220-4. [PMID: 18282471 DOI: 10.1016/j.febslet.2008.02.015] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Received: 01/28/2008] [Accepted: 02/08/2008] [Indexed: 01/01/2023]
Abstract
Cellular functions are almost always the result of the coordinated action of several proteins, interacting in protein complexes, pathways or networks. Progress in devising suitable tools for the analysis of protein-protein interactions has recently made it possible to chart interaction networks on a large scale. The aim of this review is to provide a short overview of the most promising contributions of interaction networks to human biology, structural biology and human genetics.
Affiliation(s)
- Samuel Bader
- EMBL, Structural and Computational Biology Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
341
|
Wong P, Frishman D. Designability and disease. Methods Mol Biol 2008; 484:491-504. [PMID: 18592197 DOI: 10.1007/978-1-59745-398-1_29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Structural designability is the number of ways it is possible to encode for structure. A protein's designability has been equated with the size of sequence space encoding for the protein's structure, a measure that reflects the structure's robustness to mutation. Current evidence suggests that designability is fundamental to our understanding of the evolvability and distribution of structures in nature and is a significant factor associated with human disease. Here, we describe definitions and principles underlying the concept of designability and discuss its relation to disease.
Affiliation(s)
- Philip Wong
- Institute for Bioinformatics, GSF-National Research Center for Environment and Health, Neuherberg, Germany
|
342
|
Camargo A, Azuaje F. Linking gene expression and functional network data in human heart failure. PLoS One 2007; 2:e1347. [PMID: 18094754 PMCID: PMC2147076 DOI: 10.1371/journal.pone.0001347] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 11/07/2007] [Accepted: 11/26/2007] [Indexed: 01/26/2023] Open
Abstract
Background: Gene expression profiling and the analysis of protein-protein interaction (PPI) networks may support the identification of disease bio-markers and potential drug targets. Thus, a step forward in the development of systems approaches to medicine is the integrative analysis of these data sources in specific pathological conditions. We report such an integrative bioinformatics analysis in human heart failure (HF). A global PPI network in HF was assembled, which by itself represents a useful compendium of the current status of human HF-relevant interactions. This provided the basis for the analysis of interaction connectivity patterns in relation to a HF gene expression data set.
Results: Relationships between the significance of the differentiation of gene expression and connectivity degrees in the PPI network were established. In addition, relationships between gene co-expression and PPI network connectivity were analysed. Highly-connected proteins are not necessarily encoded by genes significantly differentially expressed. Genes that are not significantly differentially expressed may encode proteins that exhibit diverse network connectivity patterns. Furthermore, genes that were not defined as significantly differentially expressed may encode proteins with many interacting partners. Genes encoding network hubs may exhibit weak co-expression with the genes encoding their interacting protein partners. We also found that hubs and superhubs display a significant diversity of co-expression patterns in comparison to peripheral nodes. Gene Ontology (GO) analysis established that highly-connected proteins are likely to be engaged in higher level GO biological process terms, while low-connectivity proteins tend to be engaged in more specific disease-related processes.
Conclusion: This investigation supports the hypothesis that the integrative analysis of differential gene expression and PPI network analysis may facilitate a better understanding of functional roles and the identification of potential drug targets in human heart failure.
Affiliation(s)
- Anyela Camargo
- School of Computing and Mathematics, University of Ulster at Jordanstown, Newtownabbey, Northern Ireland, United Kingdom
- Francisco Azuaje
- School of Computing and Mathematics, University of Ulster at Jordanstown, Newtownabbey, Northern Ireland, United Kingdom
- * To whom correspondence should be addressed. E-mail:
|
343
|
Kim HS, Yim SV, Jung KH, Zheng LT, Kim YH, Lee KH, Chung SY, Rha HK. Altered DNA copy number in patients with different seizure disorder type: by array-CGH. Brain Dev 2007; 29:639-43. [PMID: 17573221 DOI: 10.1016/j.braindev.2007.04.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 11/23/2006] [Revised: 03/30/2007] [Accepted: 04/23/2007] [Indexed: 10/23/2022]
Abstract
Epilepsy is one of the most common but genetically complex neurological disorders in children. Previous studies have shown that chromosomal abnormalities confer susceptibility to epilepsy. To identify new chromosomal abnormalities associated with epilepsy, DNA samples from patients with idiopathic generalized epilepsy (IGE), partial epilepsy (PE), and febrile seizures (FS) were analyzed using the array comparative genome hybridization technique (array-CGH). Genomic aberrations were detected across whole chromosomes. The most frequently altered loci were gains in 1p (60%), 5p (55%), 8q (55%), and 10q (55%), and losses in 7q (55%). The most frequent chromosomal aberrations for each seizure type were: IGE: 1p (60%), 5p (55%), and 10q (55%); PE: 11p (45%) and 21q (45%); FS: 8q (55%), and losses in 7q (55%). To validate the array-CGH results, real time PCR was performed for several genes (EPM2AIP1, OSM, AFP, CYP19A1, SLC6A13, and COL6A2). The results from the real time PCR were consistent with those from the array-CGH. Therefore, we found that the three types of seizure disorder studied have different chromosomal aberrations. These results might be used for further investigation of the pathogenesis of epilepsy.
Affiliation(s)
- Hye Sung Kim
- Catholic Neuroscience Center, The Catholic University of Korea, Seoul 137-701, Republic of Korea
|
344
|
Kiemer L, Cesareni G. Comparative interactomics: comparing apples and pears? Trends Biotechnol 2007; 25:448-54. [PMID: 17825444 DOI: 10.1016/j.tibtech.2007.08.002] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Received: 05/08/2007] [Revised: 06/21/2007] [Accepted: 08/22/2007] [Indexed: 11/23/2022]
Abstract
The study of the complex web of interactions that link biological molecules in a cell is the subject of interactomics, currently one of the fastest moving fields in molecular biology. The recent completion of high-throughput studies to investigate systematically all the possible interactions in a variety of model organisms has provided unique opportunities to compare interaction networks and ask questions about their conservation during evolution. It is expected that this approach will yield a scientific return as rich as that obtained in the past decade from comparing genomes and proteomes from different organisms.
Affiliation(s)
- Lars Kiemer
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, Rome, Italy
|
345
|
Van Vooren S, Coessens B, De Moor B, Moreau Y, Vermeesch JR. Array comparative genomic hybridization and computational genome annotation in constitutional cytogenetics: suggesting candidate genes for novel submicroscopic chromosomal imbalance syndromes. Genet Med 2007; 9:642-9. [PMID: 17873653 DOI: 10.1097/gim.0b013e318145b27b] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Abstract
Genome-wide array comparative genomic hybridization screening is uncovering pathogenic submicroscopic chromosomal imbalances in patients with developmental disorders. In those patients, imbalances appear now to be scattered across the whole genome, and most patients carry different chromosomal anomalies. Screening patients with developmental disorders can be considered a forward functional genome screen. The imbalances pinpoint the location of genes that are involved in human development. Because most imbalances encompass regions harboring multiple genes, the challenge is to (1) identify those genes responsible for the specific phenotype and (2) disentangle the role of the different genes located in an imbalanced region. In this review, we discuss novel tools and relevant databases that have recently been developed to aid this gene discovery process. Identification of the functional relevance of genes will not only deepen our understanding of human development but will, in addition, aid in the data interpretation and improve genetic counseling.
Affiliation(s)
- Steven Van Vooren
- Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
|
346
|
Perez-Iratxeta C, Bork P, Andrade-Navarro MA. Update of the G2D tool for prioritization of gene candidates to inherited diseases. Nucleic Acids Res 2007; 35:W212-6. [PMID: 17478516 PMCID: PMC1933178 DOI: 10.1093/nar/gkm223] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022] Open
Abstract
G2D (genes to diseases) is a web resource for prioritizing genes as candidates for inherited diseases. It uses three algorithms based on different prioritization strategies. The input to the server is the genomic region where the user is looking for the disease-causing mutation, plus an additional piece of information depending on the algorithm used. This information can either be the disease phenotype (described as an Online Mendelian Inheritance in Man (OMIM) identifier), one or several genes known or suspected to be associated with the disease (defined by their Entrez Gene identifiers), or a second genomic region that has been linked as well to the disease. In the latter case, the tool uses known or predicted interactions between genes in the two regions extracted from the STRING database. The output in every case is an ordered list of candidate genes in the region of interest. For the first two of the three methods, the candidate genes are first retrieved through sequence homology search, then scored according to the corresponding method. This means that some of them will correspond to well-known characterized genes, and others will overlap with predicted genes, thus providing a wider analysis. G2D is publicly available at http://www.ogic.ca/projects/g2d_2/
Affiliation(s)
- Carolina Perez-Iratxeta
- Ontario Genomics Innovation Centre, Ottawa Health Research Institute, 501 Smyth, Ottawa, ON, Canada K1H 8L6.
|
347
|
Lage K, Karlberg EO, Størling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tümer Z, Pociot F, Tommerup N, Moreau Y, Brunak S. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol 2007; 25:309-16. [PMID: 17344885 DOI: 10.1038/nbt1295] [Citation(s) in RCA: 747] [Impact Index Per Article: 43.9] [Indexed: 12/13/2022]
Abstract
We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease-promoting genes that will inform future experimentation.
Affiliation(s)
- Kasper Lage
- Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark
|
348
|
Abstract
Evidence from many sources suggests that similar phenotypes are begotten by functionally related genes. This is most obvious in the case of genetically heterogeneous diseases such as Fanconi anemia, Bardet-Biedl or Usher syndrome, where the various genes work together in a single biological module. Such modules can be a multiprotein complex, a pathway, or a single cellular or subcellular organelle. This observation suggests a number of hypotheses about the human phenome that are now beginning to be explored. First, there is now good evidence from bioinformatic analyses that human genetic diseases can be clustered on the basis of their phenotypic similarities and that such a clustering represents true biological relationships of the genes involved. Second, one may use such phenotypic similarity to predict and then test for the contribution of apparently unrelated genes to the same functional module. This concept is now being systematically tested for several diseases. Most recently, a systematic yeast two-hybrid screen of all known genes for inherited ataxias indicated that they all form part of a single extended protein-protein interaction network. Third, one can use bioinformatics to make predictions about new genes for diseases that form part of the same phenotype cluster. This is done by starting from the known disease genes and then searching for genes that share one or more functional attributes such as gene expression pattern, coevolution, or gene ontology. Ultimately, one may expect that a modular view of disease genes should help the rapid identification of additional disease genes for multifactorial diseases once the first few contributing genes (or environmental factors) have been reliably identified.
Affiliation(s)
- M Oti
- Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen, The Netherlands
|
349
|
George RA, Liu JY, Feng LL, Bryson-Richardson RJ, Fatkin D, Wouters MA. Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res 2006; 34:e130. [PMID: 17020920 PMCID: PMC1636487 DOI: 10.1093/nar/gkl707] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.7] [Indexed: 12/26/2022] Open
Abstract
Linkage analysis is a successful procedure to associate diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes the experimental methods employed to identify the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein–protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms use two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.
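For readers unfamiliar with the metrics quoted in this abstract, the sketch below shows how sensitivity, specificity, and fold reduction of a candidate list are conventionally computed from a confusion matrix. The counts are hypothetical toy values, not data from the paper, and the fold-reduction definition (all candidates divided by those retained) is an assumption about how such a figure is usually derived.

```python
def sensitivity(tp, fn):
    """Fraction of true disease genes that the method retains."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-disease genes that the method discards."""
    return tn / (tn + fp)


def fold_reduction(n_candidates, n_retained):
    """How many times shorter the candidate list becomes (assumed definition)."""
    return n_candidates / n_retained


# Hypothetical counts for 200 positional candidates (not taken from the paper).
tp, fn = 8, 2      # disease genes retained / missed
fp, tn = 12, 178   # non-disease genes retained / discarded

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
fold = fold_reduction(tp + fn + fp + tn, tp + fp)
```

With these toy counts, 20 of 200 candidates are retained, which is a 10-fold reduction. High specificity is what drives the reduction, while sensitivity measures how many true disease genes survive it.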
Affiliation(s)
- Richard A. George
- Computational Biology & Bioinformatics Program, Sydney, NSW, Australia
- Jason Y. Liu
- Computational Biology & Bioinformatics Program, Sydney, NSW, Australia
- Lina L. Feng
- Computational Biology & Bioinformatics Program, Sydney, NSW, Australia
- Diane Fatkin
- Sr. Bernice Research Program in Inherited Heart Diseases, Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Biotechnology & Biomolecular Sciences, Sydney, NSW, Australia
- School of Medicine, University of New South Wales, Sydney, NSW, Australia
- Cardiology Department, St. Vincent's Hospital, Sydney, NSW, Australia
- Merridee A. Wouters
- Computational Biology & Bioinformatics Program, Sydney, NSW, Australia
- School of Biotechnology & Biomolecular Sciences, Sydney, NSW, Australia
- School of Medicine, University of New South Wales, Sydney, NSW, Australia
- To whom correspondence should be addressed. Tel: +61 2 92958508; Fax: +61 2 9295 8501;
|
350
|
Coevolution, modularity and human disease. Curr Opin Genet Dev 2006; 16:637-44. [PMID: 17005391 DOI: 10.1016/j.gde.2006.09.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 06/13/2006] [Accepted: 09/15/2006] [Indexed: 01/08/2023]
Abstract
The concepts of coevolution and modularity have been studied separately for decades. Recent advances in genomics have led to the first systematic studies in each of these fields at the molecular level, resulting in several important discoveries. Both coevolution and modularity appear to be pervasive features of genomic data from all species studied to date, and their presence can be detected in many types of datasets, including genome sequences, gene expression data, and protein-protein interaction data. Moreover, the combination of these two ideas might have implications for our understanding of many aspects of biology, ranging from the general architecture of living systems to the causes of various human diseases.
|