401
|
Abstract
Summary: Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), a database of computationally predicted functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, with human being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. Availability: http://www.rostlab.org/services/snpdbe Contact: schaefer@rostlab.org; snpdbe@rostlab.org
Affiliation(s)
- Christian Schaefer
- Technische Universitaet Muenchen, Bioinformatics - I12, Informatik, Boltzmannstrasse 3, Muenchen, Germany.
|
402
|
Mills MB, Hudgins L, Balise RR, Abramson DH, Kleinerman RA. Mutation risk associated with paternal and maternal age in a cohort of retinoblastoma survivors. Hum Genet 2011; 131:1115-22. [PMID: 22203219 DOI: 10.1007/s00439-011-1126-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/07/2011] [Accepted: 12/11/2011] [Indexed: 11/25/2022]
Abstract
Autosomal dominant conditions are known to be associated with advanced paternal age, and it has been suggested that retinoblastoma (Rb) also exhibits a paternal age effect due to the paternal origin of most new germline RB1 mutations. To further our understanding of the association of parental age and risk of de novo germline RB1 mutations, we evaluated the effect of parental age in a cohort of Rb survivors in the United States. A cohort of 262 Rb patients was retrospectively identified at one institution, and telephone interviews were conducted with parents of 160 survivors (65.3%). We classified Rb survivors into three groups: those with unilateral Rb were classified as sporadic if they had no or unknown family history of Rb, those with bilateral Rb were classified as having a de novo germline mutation if they had no or unknown family history of Rb, and those with unilateral or bilateral Rb, who had a family history of Rb, were classified as familial. We built two sets of nested logistic regression models to detect an increased odds of the de novo germline mutation classification related to older parental age compared to sporadic and familial Rb classifications. The modeling strategy evaluated effects of continuous increasing maternal and paternal age and 5-year age increases adjusted for the age of the other parent. Mean maternal ages for survivors classified as having de novo germline mutations and sporadic Rb were similar (28.3 and 28.5, respectively) as were mean paternal ages (31.9 and 31.2, respectively), and all were significantly higher than the weighted general US population means. In contrast, maternal and paternal ages for familial Rb did not differ significantly from the weighted US general population means. 
Although we noted no significant differences in mean maternal or paternal age among the three Rb classification groups, we found increased odds of a survivor being in the de novo germline mutation group for each 5-year increase in paternal age, but these findings were not statistically significant (de novo vs. sporadic ORs 30-34 = 1.7 [0.7-4], ≥ 35 = 1.3 [0.5-3.3]; de novo vs. familial ORs 30-34 = 2.8 [1.0-8.4], ≥ 35 = 1.6 [0.6-4.6]). Our study suggests a weak paternal age effect for Rb resulting from de novo germline mutations, consistent with the paternal origin of most of these mutations.
Affiliation(s)
- Melissa B Mills
- Department of Genetics, Stanford University School of Medicine/Lucile Packard Children's Hospital, 300 Pasteur Drive, Boswell Building A097, Stanford, CA 94304, USA.
|
403
|
Han Y, Lee H, Park JC, Yi GS. E3Net: a system for exploring E3-mediated regulatory networks of cellular functions. Mol Cell Proteomics 2011; 11:O111.014076. [PMID: 22199232 DOI: 10.1074/mcp.o111.014076] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022] Open
Abstract
Ubiquitin-protein ligase (E3) is a key enzyme targeting specific substrates in diverse cellular processes for ubiquitination and degradation. The existing findings of substrate specificity of E3 are, however, scattered over a number of resources, making it difficult to study them together with an integrative view. Here we present E3Net, a web-based system that provides a comprehensive collection of available E3-substrate specificities and a systematic framework for the analysis of E3-mediated regulatory networks of diverse cellular functions. Currently, E3Net contains 2201 E3s and 4896 substrates in 427 organisms and 1671 E3-substrate specific relations between 493 E3s and 1277 substrates in 42 organisms, extracted mainly from MEDLINE abstracts and UniProt comments with an automatic text mining method and additional manual inspection and partly from high throughput experiment data and public ubiquitination databases. The significant functions and pathways of the extracted E3-specific substrate groups were identified from a functional enrichment analysis with 12 functional category resources for molecular functions, protein families, protein complexes, pathways, cellular processes, cellular localization, and diseases. E3Net includes interactive analysis and navigation tools that make it possible to build an integrative view of E3-substrate networks and their correlated functions with graphical illustrations and summarized descriptions. As a result, E3Net provides a comprehensive resource of E3s, substrates, and their functional implications summarized from the regulatory network structures of E3-specific substrate groups and their correlated functions. This resource will facilitate further in-depth investigation of ubiquitination-dependent regulatory mechanisms. E3Net is freely available online at http://pnet.kaist.ac.kr/e3net.
Affiliation(s)
- Youngwoong Han
- Department of Information and Communications Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea
|
404
|
Ramírez F, Lawyer G, Albrecht M. Novel search method for the discovery of functional relationships. Bioinformatics 2011; 28:269-76. [PMID: 22180409 PMCID: PMC3259435 DOI: 10.1093/bioinformatics/btr631] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Abstract
Motivation: Numerous annotations are available that functionally characterize genes and proteins with regard to molecular process, cellular localization, tissue expression, protein domain composition, protein interaction, disease association and other properties. Searching this steadily growing amount of information can lead to the discovery of new biological relationships between genes and proteins. To facilitate the searches, methods are required that measure the annotation similarity of genes and proteins. However, most current similarity methods are focused only on annotations from the Gene Ontology (GO) and do not take other annotation sources into account. Results: We introduce the new method BioSim that incorporates multiple sources of annotations to quantify the functional similarity of genes and proteins. We compared the performance of our method with four other well-known methods adapted to use multiple annotation sources. We evaluated the methods by searching for known functional relationships using annotations based only on GO or on our large data warehouse BioMyn. This warehouse integrates many diverse annotation sources of human genes and proteins. We observed that the search performance improved substantially for almost all methods when multiple annotation sources were included. In particular, our method outperformed the other methods in terms of recall and average precision. Contact: mario.albrecht@mpi-inf.mpg.de Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fidel Ramírez
- Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany
|
405
|
Jiang R, Gan M, He P. Constructing a gene semantic similarity network for the inference of disease genes. BMC Systems Biology 2011; 5 Suppl 2:S2. [PMID: 22784573 PMCID: PMC3287482 DOI: 10.1186/1752-0509-5-s2-s2] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 11/17/2022]
Abstract
Motivation: The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods can usually cover less than half of known human genes. Results: We propose to rely on the biological process domain of the gene ontology to construct a gene semantic similarity network and then use the network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By analyzing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores if the corresponding proteins are closer to each other in the PPI network. By analyzing the gene semantic similarity network with a phenotype similarity network, we show that semantic similarity scores of genes associated with similar diseases are significantly different from those of genes selected at random, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes. Contact: ruijiang@tsinghua.edu.cn
Affiliation(s)
- Rui Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China.
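The abstract of Jiang et al. above names a random walk with restart model for ranking candidate disease genes. The following is a minimal sketch of that general technique, not the authors' implementation: the function name, the dictionary-based network format, the restart probability of 0.3 and the toy gene names are all assumptions made for illustration. It iterates p_next = (1 - r) * W * p + r * p0, where W is the column-normalised weight matrix and p0 puts equal mass on the seed genes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict node -> dict neighbour -> non-negative edge weight
               (hypothetical input format; every neighbour must also be a key).
    seeds:     known disease genes that the walker restarts from.
    restart:   restart probability r (0.3 is an assumed, illustrative value).
    """
    nodes = sorted(adjacency)
    seed_set = {s for s in seeds if s in adjacency}
    if not seed_set:
        raise ValueError("no seed node is present in the network")

    # Initial distribution p0: equal probability on each seed, zero elsewhere.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    # Total outgoing weight per node, used to normalise transition probabilities.
    out_weight = {n: sum(adjacency[n].values()) for n in nodes}

    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: with probability r the walker jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: return its mass to the seeds so probability is conserved.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            share = (1.0 - restart) * p[u] / out_weight[u]
            for v, w in adjacency[u].items():
                nxt[v] += share * w
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network G1 - G2 - G3 - G4 with hypothetical similarity weights of 1.
network = {
    "G1": {"G2": 1.0},
    "G2": {"G1": 1.0, "G3": 1.0},
    "G3": {"G2": 1.0, "G4": 1.0},
    "G4": {"G3": 1.0},
}
scores = random_walk_with_restart(network, seeds=["G1"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch, candidate genes are ranked by their steady-state probability, so genes that are close to the seeds in the network score higher. Edge weights could be semantic similarity scores or PPI links, which is the comparison the abstract describes.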
|
406
|
Park YK, Bang OS, Cha MH, Kim J, Cole JW, Lee D, Kim YJ. SigCS base: an integrated genetic information resource for human cerebral stroke. BMC Systems Biology 2011; 5 Suppl 2:S10. [PMID: 22784567 PMCID: PMC3287476 DOI: 10.1186/1752-0509-5-s2-s10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/23/2022]
Abstract
Background: To understand how stroke risk factors mechanistically contribute to stroke, the genetic components regulating each risk factor need to be integrated and evaluated with respect to biological function and through pathway-based algorithms. This resource will provide information to researchers studying the molecular and genetic causes of stroke in terms of genomic variants, genes, and pathways. Methods: Reported genetic variants, gene structure, phenotypes, and literature information regarding stroke were collected and extracted from publicly available databases describing variants, genome, proteome, functional annotation, and disease subtypes. Stroke related candidate pathways and etiologic genes that participate significantly in risk were analyzed in terms of canonical pathways in public biological pathway databases. These efforts resulted in a relational database of genetic signals of cerebral stroke, SigCS base, which implements an effective web retrieval system. Results: The current version of SigCS base documents 1943 non-redundant genes with 11472 genetic variants and 165 non-redundant pathways. The web retrieval system of SigCS base consists of two principal search flows: 1) a gene-based variant search using gene table browsing or a keyword search, and 2) a pathway-based variant search using pathway table browsing. SigCS base is freely accessible at http://sysbio.kribb.re.kr/sigcs. Conclusions: SigCS base is an effective tool that can assist researchers in the identification of the genetic factors associated with stroke by utilizing existing literature information, selecting candidate genes and variants for experimental studies, and examining the pathways that contribute to the pathophysiological mechanisms of stroke.
Affiliation(s)
- Young-Kyu Park
- Medical Genome Research Center, KRIBB, Daejeon 305-806, Korea
|
407
|
Krzywinski M, Birol I, Jones SJM, Marra MA. Hive plots--rational approach to visualizing networks. Brief Bioinform 2011; 13:627-44. [PMID: 22155641 DOI: 10.1093/bib/bbr069] [Citation(s) in RCA: 164] [Impact Index Per Article: 11.7] [Indexed: 12/17/2022] Open
Abstract
Networks are typically visualized with force-based or spectral layouts. These algorithms lack reproducibility and perceptual uniformity because they do not use a node coordinate system. The layouts can be difficult to interpret and are unsuitable for assessing differences in networks. To address these issues, we introduce hive plots (http://www.hiveplot.com) for generating informative, quantitative and comparable network layouts. Hive plots depict network structure transparently, are simple to understand and can be easily tuned to identify patterns of interest. The method is computationally straightforward, scales well and is amenable to a plugin for existing tools.
|
408
|
Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Krasnov S, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Karsch-Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, Wilbur WJ, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2011; 40:D13-25. [PMID: 22140104 PMCID: PMC3245031 DOI: 10.1093/nar/gkr1184] [Citation(s) in RCA: 415] [Impact Index Per Article: 29.6] [Indexed: 01/30/2023] Open
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
409
|
Khare SP, Habib F, Sharma R, Gadewal N, Gupta S, Galande S. HIstome--a relational knowledgebase of human histone proteins and histone modifying enzymes. Nucleic Acids Res 2011; 40:D337-42. [PMID: 22140112 PMCID: PMC3245077 DOI: 10.1093/nar/gkr1125] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.3] [Indexed: 01/20/2023] Open
Abstract
Histones are abundant nuclear proteins that are essential for the packaging of eukaryotic DNA into chromosomes. Different histone variants, in combination with their modification ‘code’, control regulation of gene expression in diverse cellular processes. Several enzymes that catalyze the addition and removal of multiple histone modifications have been discovered in the past decade, enabling investigations of their role(s) in normal cellular processes and diverse pathological conditions. This sudden influx of data, however, has created a need for an updated knowledgebase that compiles, organizes and presents curated scientific information to the user in an easily accessible format. Here, we present HIstome, a browsable, manually curated, relational database that provides information about human histone proteins, their sites of modifications, variants and modifying enzymes. HIstome is a knowledgebase of 55 human histone proteins, 106 distinct sites of their post-translational modifications (PTMs) and 152 histone-modifying enzymes. Entries have been grouped into 5 types of histones, 8 types of post-translational modifications and 14 types of enzymes that catalyze addition and removal of these modifications. The resource will be useful for epigeneticists, pharmacologists and clinicians. HIstome: The Histone Infobase is available online at http://www.iiserpune.ac.in/~coee/histome/ and http://www.actrec.gov.in/histome/.
Affiliation(s)
- Satyajeet P Khare
- Cancer Research Institute, Advanced Centre for Treatment, Research and Education in Cancer, Kharghar, Navi Mumbai 410210, India
|
410
|
Kwofie SK, Schaefer U, Sundararajan VS, Bajic VB, Christoffels A. HCVpro: Hepatitis C virus protein interaction database. Infection, Genetics and Evolution 2011; 11:1971-7. [PMID: 21930248 DOI: 10.1016/j.meegid.2011.09.001] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 05/13/2011] [Revised: 08/24/2011] [Accepted: 09/02/2011] [Indexed: 02/07/2023]
|
411
|
Huynh T, Khan JM, Ranganathan S. A comparative structural bioinformatics analysis of inherited mutations in β-D-Mannosidase across multiple species reveals a genotype-phenotype correlation. BMC Genomics 2011; 12 Suppl 3:S22. [PMID: 22369051 PMCID: PMC3333182 DOI: 10.1186/1471-2164-12-s3-s22] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Abstract
Background: Lysosomal β-D-mannosidase is a glycosyl hydrolase that breaks down the glycosidic bonds at the non-reducing end of N-linked glycoproteins. Hence, it is a crucial enzyme in the polysaccharide degradation pathway. Mutations in the MANBA gene, which codes for lysosomal β-mannosidase, result in improper coding and malfunctioning of the protein, leading to β-mannosidosis. Studying the location of mutations on the enzyme structure is a rational approach to understanding the functional consequences of these mutations. Accordingly, the pathology and clinical manifestations of the disease could be correlated to the genotypic modifications. Results: The wild-type and inherited mutations of β-mannosidase were studied across four different species (human, cow, goat and mouse), employing a previously demonstrated comprehensive homology modeling and mutational mapping technique, which reveals a correlation between the variation of genotype and the severity of phenotype in β-mannosidosis. The X-ray crystallographic structure of β-mannosidase from Bacteroides thetaiotaomicron was used as the template for 3D structural modeling of the wild-type enzymes containing all the associated ligands. These wild-type models subsequently served as templates for building mutational structures. Truncations account for approximately 70% of the mutational cases. In general, the proximity of mutations to the active site determines the severity of phenotypic expressions. Mapping mutations to the MANBA gene sequence has identified five mutational hot-spots. Conclusion: Although restrained by a limited dataset, our comprehensive study suggests a genotype-phenotype correlation in β-mannosidosis. A predictive approach for detecting likely β-mannosidosis is also demonstrated where we have extrapolated observed mutations from one species to homologous positions in other organisms based on the proximity of the mutations to the enzyme active site and their co-location from different organisms.
Apart from aiding the detection of mutational hotspots in the gene, where novel mutations could be disease-implicated, this approach also provides a way to predict new disease mutations. Higher expression of the exoglycosidase chitobiase is said to play a vital role in determining disease phenotypes in human and mouse. A bigger dataset of inherited mutations as well as a parallel study of β-mannosidase and chitobiase activities in prospective patients would be interesting to better understand the underlying reasons for β-mannosidosis.
Affiliation(s)
- Thi Huynh
- Department of Chemistry and Biomolecular Sciences and ARC center of excellence in Bioinformatics, Macquarie University, NSW 2109, Australia
|
412
|
Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, de la Cruz N, Duong A, Fang R, Ganesan U, Grove C, Howe K, Kadam S, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Nash B, Ozersky P, Paulini M, Raciti D, Rangarajan A, Schindelman G, Shi X, Schwarz EM, Ann Tuli M, Van Auken K, Wang D, Wang X, Williams G, Hodgkin J, Berriman M, Durbin R, Kersey P, Spieth J, Stein L, Sternberg PW. WormBase 2012: more genomes, more data, new website. Nucleic Acids Res 2011; 40:D735-41. [PMID: 22067452 PMCID: PMC3245152 DOI: 10.1093/nar/gkr954] [Citation(s) in RCA: 164] [Impact Index Per Article: 11.7] [Indexed: 01/05/2023] Open
Abstract
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.
Affiliation(s)
- Karen Yook
- Division of Biology 156-29, California Institute of Technology, Pasadena, CA 91125, USA.
|
413
|
Karolchik D, Hinrichs AS, Kent WJ. The UCSC Genome Browser. Current Protocols in Human Genetics 2011; Chapter 18:18.6.1-18.6.33. [PMID: 21975940 PMCID: PMC3222792 DOI: 10.1002/0471142905.hg1806s71] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 01/07/2023]
Abstract
The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks." The annotations generated by the UCSC Genome Bioinformatics Group and external collaborators include gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload personal datasets in a wide variety of formats as custom annotation tracks in both browsers for research or educational purposes. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.
Affiliation(s)
- Donna Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz
- Angie S. Hinrichs
- Center for Biomolecular Science and Engineering, University of California Santa Cruz
- W. James Kent
- Center for Biomolecular Science and Engineering, University of California Santa Cruz
|
414
|
Oakley DJ, Iyer V, Skarnes WC, Smedley D. BioMart as an integration solution for the International Knockout Mouse Consortium. Database: The Journal of Biological Databases and Curation 2011; 2011:bar028. [PMID: 21930503 PMCID: PMC3263594 DOI: 10.1093/database/bar028] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
In this article, we describe the use of the BioMart data management system to provide integrated access to International Knockout Mouse Consortium (IKMC) data and other related mouse resources. The IKMC is currently mutating all mouse protein-coding genes in embryonic stem (ES) cells using gene targeting and gene trapping approaches. The BioMart portal allows researchers to identify and obtain IKMC knockout vectors, ES cells and mice for genes of interest. Gene annotation, expression, phenotype and disease data is also integrated from external BioMarts, allowing selection of IKMC products by a wide variety of criteria. These products are invaluable for researchers involved in the elucidation of gene function and the role of individual genes in human disease. Here, we describe these datasets in more detail and illustrate the functionality of the portal using several examples. Database URL: http://www.knockoutmouse.org/mart
Affiliation(s)
- Darren J Oakley
- The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH
|
415
|
Path to facilitate the prediction of functional amino acid substitutions in red blood cell disorders--a computational approach. PLoS One 2011; 6:e24607. [PMID: 21931771 PMCID: PMC3172254 DOI: 10.1371/journal.pone.0024607] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 05/24/2011] [Accepted: 08/14/2011] [Indexed: 02/06/2023] Open
Abstract
Background: A major area of effort in current genomics is to distinguish mutations that are functionally neutral from those that contribute to disease. Non-synonymous single nucleotide polymorphisms (nsSNPs), which cause amino acid substitutions, currently account for approximately half of the known gene lesions responsible for human inherited diseases. As a result, the prediction of nsSNPs that affect protein functions and relate to disease is an important task. Principal Findings: In this study, we performed a comprehensive analysis of deleterious SNPs at both the functional and structural level in the genes associated with red blood cell metabolism disorders using bioinformatics tools. We analyzed the variants in the Glucose-6-phosphate dehydrogenase (G6PD) gene and the Pyruvate Kinase isoform genes (PKLR & PKM2) responsible for major red blood cell disorders. Deleterious nsSNPs were categorized using empirical rule based and support vector machine based methods to predict the impact on protein functions. Furthermore, we modeled mutant proteins and compared them with the native protein to evaluate protein structure stability. Significance: We argue here that bioinformatics tools can play an important role in addressing the complexity of the underlying genetic basis of red blood cell disorders. Based on our investigation, we report potential candidate SNPs for future studies in human red blood cell disorders. The current study also demonstrates the presence of other deleterious mutations and is consistent with in vivo experimental studies. Our approach illustrates the application of computational tools in understanding functional variation from the perspective of structure, expression, evolution and phenotype.
|
416
|
Hwang PI, Wu HB, Wang CD, Lin BL, Chen CT, Yuan S, Wu G, Li KC. Tissue-specific gene expression templates for accurate molecular characterization of the normal physiological states of multiple human tissues with implication in development and cancer studies. BMC Genomics 2011; 12:439. [PMID: 21880155 PMCID: PMC3178546 DOI: 10.1186/1471-2164-12-439] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 02/22/2011] [Accepted: 09/01/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND To elucidate the molecular complications in many complex diseases, we argue that constructing a model representing the normal physiological state of a cell/tissue should be a priority. RESULTS By analyzing three independent microarray datasets on normal human tissues, we established a quantitative molecular model, GET, which consists of 24 tissue-specific Gene Expression Templates constructed from a set of 56 genes, for predicting 24 distinct tissue types under disease-free conditions. A correctness of 99.2% was reached when a large-scale validation was performed on 61 new datasets to test the tissue-prediction power of GET. Network analysis based on molecular interactions suggests a potential role of these 56 genes in tissue differentiation and carcinogenesis. When GET was applied to transcriptomic datasets produced from tissue development studies, the results correlated well with developmental stages. Cancerous tissues and cell lines yielded significantly lower correlation with GET than the normal tissues. GET distinguished melanoma from normal skin tissue or benign skin tumor with 96% sensitivity and 89% specificity. CONCLUSIONS These results strongly suggest that a normal tissue or cell may uphold its normal functioning and morphology by maintaining specific chemical stoichiometry among genes. The state of stoichiometry can be depicted by a compact set of representative genes such as the 56 genes obtained here. A significant deviation from normal stoichiometry may result in malfunction or abnormal growth of the cells.
Affiliation(s)
- Pei-Ing Hwang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan 115, Republic of China
|
417
|
Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, Mohr SE. An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics 2011; 12:357. [PMID: 21880147 PMCID: PMC3179972 DOI: 10.1186/1471-2105-12-357] [Citation(s) in RCA: 528] [Impact Index Per Article: 37.7] [Received: 05/11/2011] [Accepted: 08/31/2011] [Indexed: 12/12/2022] Open
Abstract
Background Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. Results We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). Conclusions DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
Affiliation(s)
- Yanhui Hu
- Drosophila RNAi Screening Center, Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
|
418
|
Flatscher-Bader T, Foldi CJ, Chong S, Whitelaw E, Moser RJ, Burne THJ, Eyles DW, McGrath JJ. Increased de novo copy number variants in the offspring of older males. Transl Psychiatry 2011; 1:e34. [PMID: 22832608 PMCID: PMC3309504 DOI: 10.1038/tp.2011.30] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 06/23/2011] [Accepted: 07/08/2011] [Indexed: 01/26/2023] Open
Abstract
The offspring of older fathers have an increased risk of neurodevelopmental disorders, such as schizophrenia and autism. In light of the evidence implicating copy number variants (CNVs) in schizophrenia and autism, we used a mouse model to explore the hypothesis that the offspring of older males have an increased risk of de novo CNVs. C57BL/6J sires that were 3 and 12-16 months old were mated with 3-month-old dams to create control offspring and offspring of old sires, respectively. Applying genome-wide microarray screening technology, 7 distinct CNVs were identified in a set of 12 offspring and their parents. Competitive quantitative PCR confirmed these CNVs in the original set and also established their frequency in an independent set of 77 offspring and their parents. On the basis of the combined samples, six de novo CNVs were detected in the offspring of older sires, whereas none were detected in the control group. Two of the CNVs were associated with behavioral and/or neuroanatomical phenotypic features. One of the de novo CNVs involved Auts2 (autism susceptibility candidate 2), and other CNVs included genes linked to schizophrenia, autism and brain development. This is the first experimental demonstration that the offspring of older males have an increased risk of de novo CNVs. Our results support the hypothesis that the offspring of older fathers have an increased risk of neurodevelopmental disorders such as schizophrenia and autism by generation of de novo CNVs in the male germline.
Affiliation(s)
- T Flatscher-Bader
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- The Queensland Institute of Medical Research, Herston, QLD, Australia
- C J Foldi
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- S Chong
- The Queensland Institute of Medical Research, Herston, QLD, Australia
- E Whitelaw
- The Queensland Institute of Medical Research, Herston, QLD, Australia
- T H J Burne
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- D W Eyles
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- J J McGrath
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- Discipline of Psychiatry, The University of Queensland, St Lucia, QLD, Australia
|
419
|
Harding SD, Armit C, Armstrong J, Brennan J, Cheng Y, Haggarty B, Houghton D, Lloyd-MacGilp S, Pi X, Roochun Y, Sharghi M, Tindal C, McMahon AP, Gottesman B, Little MH, Georgas K, Aronow BJ, Potter SS, Brunskill EW, Southard-Smith EM, Mendelsohn C, Baldock RA, Davies JA, Davidson D. The GUDMAP database--an online resource for genitourinary research. Development 2011; 138:2845-53. [PMID: 21652655 PMCID: PMC3188593 DOI: 10.1242/dev.063594] [Citation(s) in RCA: 183] [Impact Index Per Article: 13.1] [Indexed: 01/01/2023]
Abstract
The GenitoUrinary Development Molecular Anatomy Project (GUDMAP) is an international consortium working to generate gene expression data and transgenic mice. GUDMAP includes data from large-scale in situ hybridisation screens (wholemount and section) and microarray gene expression data of microdissected, laser-captured and FACS-sorted components of the developing mouse genitourinary (GU) system. These expression data are annotated using a high-resolution anatomy ontology specific to the developing murine GU system. GUDMAP data are freely accessible at www.gudmap.org via easy-to-use interfaces. This curated, high-resolution dataset serves as a powerful resource for biologists, clinicians and bioinformaticians interested in the developing urogenital system. This paper gives examples of how the data have been used to address problems in developmental biology and provides a primer for those wishing to use the database in their own research.
Affiliation(s)
- Simon D Harding
- MRC Human Genetics Unit, Western General Hospital, Edinburgh EH4 2XU, UK.
|
420
|
Piro RM, Molineris I, Ala U, Di Cunto F. Evaluation of candidate genes from orphan FEB and GEFS+ loci by analysis of human brain gene expression atlases. PLoS One 2011; 6:e23149. [PMID: 21858011 PMCID: PMC3157479 DOI: 10.1371/journal.pone.0023149] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 04/20/2011] [Accepted: 07/07/2011] [Indexed: 12/19/2022] Open
Abstract
Febrile seizures, or febrile convulsions (FEB), represent the most common form of childhood seizures and are believed to be influenced by variations in several susceptibility genes. Most of the associated loci, however, remain ‘orphan’, i.e. the susceptibility genes they contain still remain to be identified. Further orphan loci have been mapped for a related disorder, genetic (generalized) epilepsy with febrile seizures plus (GEFS+). We show that both spatially mapped and ‘traditional’ gene expression data from the human brain can be successfully employed to predict the most promising candidate genes for FEB and GEFS+, apply our prediction method to the remaining orphan loci and discuss the validity of the predictions. For several of the orphan FEB/GEFS+ loci we propose excellent, and not always obvious, candidates for mutation screening in order to aid in gaining a better understanding of the genetic origin of the susceptibility to seizures.
Affiliation(s)
- Rosario M Piro
- Molecular Biotechnology Center and Department of Genetics, Biology and Biochemistry, University of Torino, Torino, Italy.
|
421
|
Katayama T, Wilkinson MD, Vos R, Kawashima T, Kawashima S, Nakao M, Yamamoto Y, Chun HW, Yamaguchi A, Kawano S, Aerts J, Aoki-Kinoshita KF, Arakawa K, Aranda B, Bonnal RJ, Fernández JM, Fujisawa T, Gordon PM, Goto N, Haider S, Harris T, Hatakeyama T, Ho I, Itoh M, Kasprzyk A, Kido N, Kim YJ, Kinjo AR, Konishi F, Kovarskaya Y, von Kuster G, Labarga A, Limviphuvadh V, McCarthy L, Nakamura Y, Nam Y, Nishida K, Nishimura K, Nishizawa T, Ogishima S, Oinn T, Okamoto S, Okuda S, Ono K, Oshita K, Park KJ, Putnam N, Senger M, Severin J, Shigemoto Y, Sugawara H, Taylor J, Trelles O, Yamasaki C, Yamashita R, Satoh N, Takagi T. The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications. J Biomed Semantics 2011; 2:4. [PMID: 21806842 PMCID: PMC3170566 DOI: 10.1186/2041-1480-2-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 04/01/2011] [Accepted: 08/02/2011] [Indexed: 01/19/2023] Open
Abstract
Background The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. 
We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.
Affiliation(s)
- Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.
|
422
|
Alvarez-Ponce D, McInerney JO. The human genome retains relics of its prokaryotic ancestry: human genes of archaebacterial and eubacterial origin exhibit remarkable differences. Genome Biol Evol 2011; 3:782-90. [PMID: 21795752 PMCID: PMC3163467 DOI: 10.1093/gbe/evr073] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023] Open
Abstract
Eukaryotes are generally thought to stem from a fusion event involving an archaebacterium and a eubacterium. As a result of this event, contemporaneous eukaryotic genomes are chimeras of genes inherited from both endosymbiotic partners. These two coexisting gene repertoires have been shown to differ in a number of ways in yeast. Here we combine genomic and functional data in order to determine if and how human genes that have been inherited from both prokaryotic ancestors remain distinguishable. We show that, despite being fewer in number, human genes of archaebacterial origin are more highly and broadly expressed across tissues, are more likely to have lethal mouse orthologs, tend to be involved in informational processes, are more selectively constrained, and encode shorter and more central proteins in the protein–protein interaction network than eubacterium-like genes. Furthermore, consistent with endosymbiotic theory, we show that proteins tend to interact with those encoded by genes of the same ancestry. Most interestingly from a human health perspective, archaebacterial genes are less likely to be involved in heritable human disease. Taken together, these results show that more than 2 billion years after eukaryogenesis, the human genome retains at least two somewhat distinct communities of genes.
Affiliation(s)
- David Alvarez-Ponce
- Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
|
423
|
Samuels ME. Saturation of the human phenome. Curr Genomics 2011; 11:482-99. [PMID: 21532833 PMCID: PMC3048311 DOI: 10.2174/138920210793175886] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 05/08/2010] [Revised: 06/22/2010] [Accepted: 06/22/2010] [Indexed: 12/26/2022] Open
Abstract
The phenome is the complete set of phenotypes resulting from genetic variation in populations of an organism. Saturation of a phenome implies the identification and phenotypic description of mutations in all genes in an organism, potentially constrained to those encoding proteins. The human genome is believed to contain 20,000-25,000 protein-coding genes, but only a small fraction of these have documented mutant phenotypes; thus the human phenome is far from complete. In model organisms, genetic saturation entails the identification of multiple mutant alleles of a gene or locus, allowing a consistent description of mutational phenotypes for that gene. Saturation of several model organisms has been attempted, usually by targeting annotated coding genes with insertional transposons (Drosophila melanogaster, Mus musculus), by sequence-directed deletion (Saccharomyces cerevisiae), or by using libraries of antisense oligonucleotide probes injected directly into animals (Caenorhabditis elegans, Danio rerio). This paper reviews the general state of the human phenome, and discusses theoretical and practical considerations toward a saturation analysis in humans. Throughout, emphasis is placed on high penetrance genetic variation, of the kind typically associated with monogenic versus complex traits.
Affiliation(s)
- Mark E Samuels
- Centre de Recherche de Ste-Justine, 3175, Côte Ste-Catherine, Montréal QC H3T 1C5, Canada
|
424
|
Bordbar A, Jamshidi N, Palsson BO. iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC Syst Biol 2011; 5:110. [PMID: 21749716 PMCID: PMC3158119 DOI: 10.1186/1752-0509-5-110] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 04/20/2011] [Accepted: 07/12/2011] [Indexed: 02/06/2023]
Abstract
BACKGROUND The development of high-throughput technologies capable of whole cell measurements of genes, proteins, and metabolites has led to the emergence of systems biology. Integrated analysis of the resulting omic data sets has proved to be hard to achieve. Metabolic network reconstructions enable complex relationships amongst molecular components to be represented formally in a biologically relevant manner while respecting physical constraints. In silico models derived from such reconstructions can then be queried or interrogated through mathematical simulations. Proteomic profiling studies of the mature human erythrocyte have shown that more proteins related to metabolic function are present than previously thought; however, the significance and the causal consequences of these findings have not been explored. RESULTS Erythrocyte proteomic data were used to reconstruct the most expansive description of erythrocyte metabolism to date, following extensive manual curation, assessment of the literature, and functional testing. The reconstruction contains 281 enzymes representing functions from glycolysis to cofactor and amino acid metabolism. Such a comprehensive view of erythrocyte metabolism implicates the erythrocyte as a potential biomarker for different diseases as well as a 'cell-based' drug-screening tool. The analysis shows that 94 erythrocyte enzymes are implicated in morbid single nucleotide polymorphisms, representing 142 pathologies. In addition, over 230 FDA-approved and experimental pharmaceuticals have enzymatic targets in the erythrocyte. CONCLUSION The advancement of proteomic technologies and increased generation of high-throughput proteomic data have created the need for a means to analyze these data in a coherent manner. Network reconstructions provide a systematic means to integrate and analyze proteomic data in a biologically meaningful manner.
Analysis of the red cell proteome has revealed an unexpected level of complexity in the functional capabilities of human erythrocyte metabolism.
Affiliation(s)
- Aarash Bordbar
- Department of Bioengineering, University of California San Diego, La Jolla, 92093-0412, USA
|
425
|
Liekens AML, De Knijf J, Daelemans W, Goethals B, De Rijk P, Del-Favero J. BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation. Genome Biol 2011; 12:R57. [PMID: 21696594 PMCID: PMC3218845 DOI: 10.1186/gb-2011-12-6-r57] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Received: 01/24/2011] [Revised: 03/24/2011] [Accepted: 06/22/2011] [Indexed: 01/09/2023] Open
Abstract
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be.
Affiliation(s)
- Anthony M L Liekens
- Applied Molecular Genomics group, VIB Department of Molecular Genetics, Universiteit Antwerpen, Universiteitsplein 1, Wilrijk, Belgium.
|
426
|
Auffray C, Chen Z, Hood L. Systems medicine: the future of medical genomics and healthcare. Genome Med 2009; 1:2. [PMID: 19348689 PMCID: PMC2651587 DOI: 10.1186/gm2] [Citation(s) in RCA: 234] [Impact Index Per Article: 16.7] [Indexed: 02/08/2023] Open
Abstract
High-throughput technologies for DNA sequencing and for analyses of transcriptomes, proteomes and metabolomes have provided the foundations for deciphering the structure, variation and function of the human genome and relating them to health and disease states. The increased efficiency of DNA sequencing opens up the possibility of analyzing a large number of individual genomes and transcriptomes, and complete reference proteomes and metabolomes are within reach using powerful analytical techniques based on chromatography, mass spectrometry and nuclear magnetic resonance. Computational and mathematical tools have enabled the development of systems approaches for deciphering the functional and regulatory networks underlying the behavior of complex biological systems. Further conceptual and methodological developments of these tools are needed for the integration of various data types across the multiple levels of organization and time frames that are characteristic of human development, physiology and disease. Medical genomics has attempted to overcome the initial limitations of genome-wide association studies and has identified a limited number of susceptibility loci for many complex and common diseases. Iterative systems approaches are starting to provide deeper insights into the mechanisms of human diseases, and to facilitate the development of better diagnostic and prognostic biomarkers for cancer and many other diseases. Systems approaches will transform the way drugs are developed through academy-industry partnerships that will target multiple components of networks and pathways perturbed in diseases. They will enable medicine to become predictive, personalized, preventive and participatory, and, in the process, concepts and methods from Western and oriental cultures can be combined. 
We recommend that systems medicine should be developed through an international network of systems biology and medicine centers dedicated to inter-disciplinary training and education, to help reduce the gap in healthcare between developed and developing countries.
|
427
|
Abstract
Despite the common assumption that orthologs usually share the same function, there have been various reports of divergence between orthologs, even among species as close as mammals. The comparison of mouse and human is of special interest, because mouse is often used as a model organism to understand human biology. We review the literature on evidence for divergence between human and mouse orthologous genes, and discuss it in the context of biomedical research.
Affiliation(s)
- Walid H Gharib
- Department of Ecology and Evolution, Biophore, Swiss Institute of Bioinformatics, Lausanne University, CH-1015 Lausanne, Switzerland
|
428
|
Vidal M, Cusick ME, Barabási AL. Interactome networks and human disease. Cell 2011; 144:986-98. [PMID: 21414488 DOI: 10.1016/j.cell.2011.02.016] [Citation(s) in RCA: 1165] [Impact Index Per Article: 83.2] [Received: 12/08/2010] [Revised: 02/07/2011] [Accepted: 02/09/2011] [Indexed: 02/06/2023]
Abstract
Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.
Affiliation(s)
- Marc Vidal
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
|
429
|
Stewart A, Gaikwad S, Hart P, Kyzar E, Roth A, Kalueff AV. Experimental models for anxiolytic drug discovery in the era of omes and omics. Expert Opin Drug Discov 2011; 6:755-69. [PMID: 22650981 DOI: 10.1517/17460441.2011.586028] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/30/2023]
Abstract
INTRODUCTION Animal behavioral models have become an indispensable tool for studying anxiety disorders and testing anxiety-modulating drugs. However, significant methodological and conceptual challenges affect the translational validity and accurate behavioral dissection in such models. They are also often limited to individual behavioral domains and fail to target the disorder's real clinical picture (its spectrum or overlap with other disorders), which hinders screening and development of novel anxiolytic drugs. AREAS COVERED In this article, the authors discuss and emphasize the importance of high-throughput multi-domain neurophenotyping based on the latest developments in video-tracking and bioinformatics. The authors also explain how bioinformatics can provide new insight into the neural substrates of brain disorders and its benefit for drug discovery. EXPERT OPINION The throughput and utility of animal models of anxiety and other brain disorders can be markedly increased in a number of ways: i) analyzing systems of several domains and their interplay in a wider spectrum of model species; ii) using a larger number of end points generated by video-tracking tools; iii) correlating behavioral data with genomic, proteomic and other physiologically relevant markers using online databases and iv) creating molecular network-based models of anxiety to identify new targets for drug design and discovery. Experimental models utilizing bioinformatics tools and online databases will not only improve our understanding of both gene-behavior interactions and complex trait interconnectivity but also highlight new targets for novel anxiolytic drugs.
Affiliation(s)
- Adam Stewart
- Tulane University Medical School, Department of Pharmacology and Neuroscience Program, Tulane Neurophenotyping Platform, SL-83, 1430 Tulane Ave, New Orleans, LA 70112, USA; Tel: +1 504 988 3354
|
430
|
Tabarés-Seisdedos R, Dumont N, Baudot A, Valderas JM, Climent J, Valencia A, Crespo-Facorro B, Vieta E, Gómez-Beneyto M, Martínez S, Rubenstein JL. No paradox, no progress: inverse cancer comorbidity in people with other complex diseases. Lancet Oncol 2011; 12:604-8. [DOI: 10.1016/s1470-2045(11)70041-9] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Indexed: 10/18/2022]
|
431
|
Kim JR, Kim J, Kwon YK, Lee HY, Heslop-Harrison P, Cho KH. Reduction of complex signaling networks to a representative kernel. Sci Signal 2011; 4:ra35. [PMID: 21632468 DOI: 10.1126/scisignal.2001390] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 11/02/2022]
Abstract
The network of biomolecular interactions that occurs within cells is large and complex. When such a network is analyzed, it can be helpful to reduce the complexity of the network to a "kernel" that maintains the essential regulatory functions for the output under consideration. We developed an algorithm to identify such a kernel and showed that the resultant kernel preserves the network dynamics. Using an integrated network of all of the human signaling pathways retrieved from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database, we identified this network's kernel and compared the properties of the kernel to those of the original network. We found that the percentage of essential genes among the genes encoding nodes outside of the kernel was about 10%, whereas ~32% of the genes encoding nodes within the kernel were essential. In addition, we found that 95% of the kernel nodes corresponded to Mendelian disease genes and that 93% of synthetic lethal pairs associated with the network were contained in the kernel. Genes corresponding to nodes in the kernel had low evolutionary rates, were ubiquitously expressed in various tissues, and were well conserved between species. Furthermore, kernel genes included many drug targets, suggesting that other kernel nodes may be potential drug targets. Owing to the simplification of the entire network, the efficient modeling of a large-scale signaling network and an understanding of the core structure within a complex framework become possible.
Affiliation(s)
- Jeong-Rae Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea
|
432
|
Ainali C, Simon M, Freilich S, Espinosa O, Hazelwood L, Tsoka S, Ouzounis CA, Hancock JM. Protein coalitions in a core mammalian biochemical network linked by rapidly evolving proteins. BMC Evol Biol 2011; 11:142. [PMID: 21612628 PMCID: PMC3112093 DOI: 10.1186/1471-2148-11-142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/16/2010] [Accepted: 05/25/2011] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Cellular ATP levels are generated by glucose-stimulated mitochondrial metabolism and determine metabolic responses, such as glucose-stimulated insulin secretion (GSIS) from the β-cells of pancreatic islets. We describe an analysis of the evolutionary processes affecting the core enzymes involved in glucose-stimulated insulin secretion in mammals. The proteins involved in this system belong to ancient enzymatic pathways: glycolysis, the TCA cycle and oxidative phosphorylation. RESULTS: We identify two sets of proteins, or protein coalitions, in this group of 77 enzymes with distinct evolutionary patterns. Members of the glycolysis, TCA cycle, metabolite transport, pyruvate and NADH shuttles have low rates of protein sequence evolution, as inferred from a human-mouse comparison, and relatively high rates of evolutionary gene duplication. Respiratory chain and glutathione pathway proteins evolve faster, exhibiting lower rates of gene duplication. A small number of proteins in the system evolve significantly faster than co-pathway members and may serve as rapidly evolving adapters, linking groups of co-evolving genes. CONCLUSIONS: Our results provide insights into the evolution of the involved proteins. We find evidence for two coalitions of proteins, and we identify a role for co-adaptation in protein evolution that could be explored in future research within a functional context.
Affiliation(s)
- Chrysanthi Ainali
- Centre for Bioinformatics, Department of Informatics, School of Natural and Mathematical Sciences, King's College London, Strand, UK
|
433
|
Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics 2011; 27:1741-8. [PMID: 21596790 PMCID: PMC3117361 DOI: 10.1093/bioinformatics/btr295] [Citation(s) in RCA: 134] [Impact Index Per Article: 9.6] [Indexed: 02/02/2023]
Abstract
MOTIVATION: Widespread availability of low-cost, full genome sequencing will introduce new challenges for bioinformatics. RESULTS: This review outlines recent developments in sequencing technologies and genome analysis methods for application in personalized medicine. New methods are needed in four areas to realize the potential of personalized medicine: (i) processing large-scale robust genomic data; (ii) interpreting the functional effect and the impact of genomic variation; (iii) integrating systems data to relate complex genetic interactions with phenotypes; and (iv) translating these discoveries into medical practice. CONTACT: russ.altman@stanford.edu
Affiliation(s)
- Guy Haskin Fernald
- Biomedical Informatics Training Program, Stanford University School of Medicine, Department of Bioengineering, Stanford University, Stanford, CA, USA
|
434
|
Lees JG, Heriche JK, Morilla I, Ranea JA, Orengo CA. Systematic computational prediction of protein interaction networks. Phys Biol 2011; 8:035008. [PMID: 21572181 DOI: 10.1088/1478-3975/8/3/035008] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023]
Abstract
Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in the field of high throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher quality predictions and we compare the major publicly available resources for experimentalists.
Affiliation(s)
- J G Lees
- Research Department of Structural & Molecular Biology, University College London, London, UK.
|
435
|
Abstract
Locus-specific databases are the most useful repositories of the sequence information underlying medical genetic conditions and, for this reason, they need our continued support.
Affiliation(s)
- Mark E Samuels
- Ste-Justine Hospital Research Center and Department of Medicine, University of Montreal, 3175 Cote Ste-Catherine, Montreal, Quebec, Canada
|
436
|
Korcsmáros T, Szalay MS, Rovó P, Palotai R, Fazekas D, Lenti K, Farkas IJ, Csermely P, Vellai T. Signalogs: orthology-based identification of novel signaling pathway components in three metazoans. PLoS One 2011; 6:e19240. [PMID: 21559328 PMCID: PMC3086880 DOI: 10.1371/journal.pone.0019240] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 11/27/2010] [Accepted: 03/29/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Uncovering novel components of signal transduction pathways and their interactions within species is a central task in current biological research. Orthology alignment and functional genomics approaches allow the effective identification of signaling proteins by cross-species data integration. Recently, functional annotation of orthologs was transferred across organisms to predict novel roles for proteins. Despite the wide use of these methods, annotation of complete signaling pathways has not yet been transferred systematically between species. PRINCIPAL FINDINGS: Here we introduce the concept of 'signalog' to describe potential novel signaling function of a protein on the basis of the known signaling role(s) of its ortholog(s). To identify signalogs on genomic scale, we systematically transferred signaling pathway annotations among three animal species, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and humans. Using orthology data from InParanoid and signaling pathway information from the SignaLink database, we predict 88 worm, 92 fly, and 73 human novel signaling components. Furthermore, we developed an on-line tool and an interactive orthology network viewer to allow users to predict and visualize components of orthologous pathways. We verified the novelty of the predicted signalogs by literature search and comparison to known pathway annotations. In C. elegans, 6 out of the predicted novel Notch pathway members were validated experimentally. Our approach predicts signaling roles for 19 human orthodisease proteins and 5 known drug targets, and suggests 14 novel drug target candidates. CONCLUSIONS: Orthology-based pathway membership prediction between species enables the identification of novel signaling pathway components that we referred to as signalogs. Signalogs can be used to build a comprehensive signaling network in a given species. Such networks may increase the biomedical utilization of C. elegans and D. melanogaster. In humans, signalogs may identify novel drug targets and new signaling mechanisms for approved drugs.
Affiliation(s)
- Tamás Korcsmáros
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
|
437
|
Overgaard M, Mogensen J. A framework for the study of multiple realizations: the importance of levels of analysis. Front Physiol 2011. [PMID: 21772823 PMCID: PMC3222887 DOI: 10.3389/fphys.2011.00079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/13/2023] Open
Abstract
The brain may undergo functional reorganizations. Selective loss of sensory input or training within a restricted part of a modality cause "shifts" within for instance somatotopic or tonotopic maps. Cross-modal plasticity occurs when input within a modality is absent - e.g., in the congenitally blind. Reorganizations are also found in functional recovery after brain injury. Focusing on such reorganizations, it may be studied whether a cognitive or conscious process can exclusively be mediated by one neural substrate - or may be associated with multiple neural representations. This is typically known as the problem of multiple realization - an essentially empirical issue with wide theoretical implications. This issue may appear to have a simple solution. When, for instance, the symptoms associated with brain injury disappear and the recovery is associated with increased activities within spared regions of the brain, it is tempting to conclude that the processes originally associated with the injured part of the brain are now mediated by an alternative neural substrate. Such a conclusion is, however, not a simple matter. Without a more thorough analysis, it cannot be concluded that a functional recovery of for instance language or attention is necessarily associated with a novel representation of the processes lost to injury. Alternatively, for instance, the recovery may reflect that apparently similar surface phenomena are obtained via dissimilar cognitive mechanisms. In this paper we propose a theoretical framework, which we believe can guide the design and interpretations of studies of post-traumatic recovery. It is essential to distinguish between a number of levels of analysis - including a differentiation between the surface phenomena and the underlying information processing - when addressing, for instance, whether a pre-traumatic and post-traumatically recovered cognitive or conscious process are actually the same. 
We propose a (somewhat preliminary) system of levels of analysis, which can be applied to such studies.
Affiliation(s)
- Morten Overgaard
- CNRU, Department of Psychology and Communication, Aalborg University, Aalborg, Denmark
|
438
|
Winslow RL. A framework for the study of multiple realizations: the importance of levels of analysis. Front Physiol 2011; 2:79. [PMID: 21772823 PMCID: PMC3222887 DOI: 10.3389/fpsyg.2011.00079] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 10/04/2010] [Accepted: 04/13/2011] [Indexed: 12/12/2022] Open
Affiliation(s)
- Raimond L. Winslow
- Department of Biomedical Engineering, Institute for Computational Medicine, The Johns Hopkins University School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
|
439
|
HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition. PLoS One 2011; 6:e17568. [PMID: 21423752 PMCID: PMC3053371 DOI: 10.1371/journal.pone.0017568] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/16/2010] [Accepted: 02/03/2011] [Indexed: 12/14/2022] Open
Abstract
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.
|
440
|
Loscalzo J, Barabasi AL. Systems biology and the future of medicine. Wiley Interdiscip Rev Syst Biol Med 2011; 3:619-27. [PMID: 21928407 DOI: 10.1002/wsbm.144] [Citation(s) in RCA: 176] [Impact Index Per Article: 12.6] [Indexed: 12/21/2022]
Abstract
Contemporary views of human disease are based on simple correlation between clinical syndromes and pathological analysis dating from the late 19th century. Although this approach to disease diagnosis, prognosis, and treatment has served the medical establishment and society well for many years, it has serious shortcomings for the modern era of the genomic medicine that stem from its reliance on reductionist principles of experimentation and analysis. Quantitative, holistic systems biology applied to human disease offers a unique approach for diagnosing established disease, defining disease predilection, and developing individualized (personalized) treatment strategies that can take full advantage of modern molecular pathobiology and the comprehensive data sets that are rapidly becoming available for populations and individuals. In this way, systems pathobiology offers the promise of redefining our approach to disease and the field of medicine.
Affiliation(s)
- Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
|
441
|
Gillis J, Pavlidis P. The impact of multifunctional genes on "guilt by association" analysis. PLoS One 2011; 6:e17258. [PMID: 21364756 PMCID: PMC3041792 DOI: 10.1371/journal.pone.0017258] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Received: 08/05/2010] [Accepted: 01/27/2011] [Indexed: 02/02/2023] Open
Abstract
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
Affiliation(s)
- Jesse Gillis
- Centre for High-Throughput Biology, Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Paul Pavlidis
- Centre for High-Throughput Biology, Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
|
442
|
Revisiting Mendelian disorders through exome sequencing. Hum Genet 2011; 129:351-70. [PMID: 21331778 DOI: 10.1007/s00439-011-0964-2] [Citation(s) in RCA: 147] [Impact Index Per Article: 10.5] [Received: 11/29/2010] [Accepted: 02/03/2011] [Indexed: 12/25/2022]
Abstract
Over the past several years, more focus has been placed on dissecting the genetic basis of complex diseases and traits through genome-wide association studies. In contrast, Mendelian disorders have received little attention, mainly due to the lack of newer and more powerful methods to study these disorders. Linkage studies have previously been the main tool to elucidate the genetics of Mendelian disorders; however, extremely rare disorders or sporadic cases caused by de novo variants are not amenable to this study design. Exome sequencing has now become technically feasible and more cost-effective due to the recent advances in high-throughput sequence capture methods and next-generation sequencing technologies, which have offered new opportunities for Mendelian disorder research. Exome sequencing has been swiftly applied to the discovery of new causal variants and candidate genes for a number of Mendelian disorders such as Kabuki syndrome, Miller syndrome and Fowler syndrome. In addition, de novo variants were also identified for sporadic cases, which would not have been possible without exome sequencing. Although exome sequencing has proven to be a promising approach to study Mendelian disorders, several shortcomings of this method must be noted, such as the inability to capture regulatory or evolutionarily conserved sequences in non-coding regions and the incomplete capturing of all exons.
|
443
|
Rho K, Kim B, Jang Y, Lee S, Bae T, Seo J, Seo C, Lee J, Kang H, Yu U, Kim S, Lee S, Kim WK. GARNET--gene set analysis with exploration of annotation relations. BMC Bioinformatics 2011; 12 Suppl 1:S25. [PMID: 21342555 PMCID: PMC3044280 DOI: 10.1186/1471-2105-12-s1-s25] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND: Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. RESULTS: GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis and visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules - gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. CONCLUSIONS: GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
Affiliation(s)
- Kyoohyoung Rho
- Information Center for Bio-Pharmacological Network, Seoul National University, Suwon 443-270, Korea.
|
444
|
Mitropoulou C, Webb AJ, Mitropoulos K, Brookes AJ, Patrinos GP. Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use. Hum Mutat 2011; 31:1109-16. [PMID: 20672379 DOI: 10.1002/humu.21332] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
Genetic variation databases have become indispensable in many areas of health care. In addition, more and more experts are depositing published and unpublished disease-causing variants of particular genes into locus-specific databases (LSDBs). Some of these databases contain such extensive information that they have become known as knowledge bases. Here, we analyzed 1,188 LSDBs and their content for the presence or absence of 44 content criteria related to database features (general presentation, locus-specific information, database structure) and data content (data collection, summary table of variants, database querying). Our analyses revealed that several elements have helped to advance the field and reduce data heterogeneity, such as the development of specialized database management systems and the creation of data querying tools. We also identified a number of deficiencies, namely, the lack of detailed disease and phenotypic descriptions for each genetic variant and links to relevant patient organizations, which, if addressed, would allow LSDBs to better serve the clinical genetics community. We propose a structure, based on LSDBs and closely related repositories (namely, clinical genetics databases), which would contribute to a federated genetic variation browser and also allow the maintenance of variation data.
Affiliation(s)
- Christina Mitropoulou
- Erasmus MC, Faculty of Medicine and Health Sciences, MGC-Department of Cell Biology and Genetics, Rotterdam, The Netherlands
|
445
|
Schindelman G, Fernandes JS, Bastiani CA, Yook K, Sternberg PW. Worm Phenotype Ontology: integrating phenotype data within and beyond the C. elegans community. BMC Bioinformatics 2011; 12:32. [PMID: 21261995 PMCID: PMC3039574 DOI: 10.1186/1471-2105-12-32] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 08/12/2010] [Accepted: 01/24/2011] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND: Caenorhabditis elegans gene-based phenotype information dates back to the 1970s, beginning with Sydney Brenner and the characterization of behavioral and morphological mutant alleles via classical genetics in order to understand nervous system function. Since then C. elegans has become an important genetic model system for the study of basic biological and biomedical principles, largely through the use of phenotype analysis. Because of the growth of C. elegans as a genetically tractable model organism and the development of large-scale analyses, there has been a significant increase of phenotype data that needs to be managed and made accessible to the research community. To do so, a standardized vocabulary is necessary to integrate phenotype data from diverse sources, permit integration with other data types and render the data in a computable form. RESULTS: We describe a hierarchically structured, controlled vocabulary of terms that can be used to standardize phenotype descriptions in C. elegans, namely the Worm Phenotype Ontology (WPO). The WPO currently comprises 1,880 phenotype terms, 74% of which have been used in the annotation of phenotypes associated with more than 18,000 C. elegans genes. The scope of the WPO is not exclusively limited to C. elegans biology; rather, it is devised to also incorporate phenotypes observed in related nematode species. We have enriched the value of the WPO by integrating it with other ontologies, thereby increasing the accessibility of worm phenotypes to non-nematode biologists. We are actively developing the WPO to continue to fulfill the evolving needs of the scientific community and hope to engage researchers in this crucial endeavor. CONCLUSIONS: We provide a phenotype ontology (WPO) that will help to facilitate data retrieval and cross-species comparisons within the nematode community. In the larger scientific community, the WPO will permit data integration and interoperability across the different Model Organism Databases (MODs) and other biological databases. This standardized phenotype ontology will therefore allow for more complex data queries and enhance bioinformatic analyses.
Affiliation(s)
- Gary Schindelman
- Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA
|
446
|
Tong MY, Cassa CA, Kohane IS. Automated validation of genetic variants from large databases: ensuring that variant references refer to the same genomic locations. Bioinformatics 2011; 27:891-3. [PMID: 21258063 PMCID: PMC3051330 DOI: 10.1093/bioinformatics/btr029] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
SUMMARY: Accurate annotations of genomic variants are necessary to achieve full-genome clinical interpretations that are scientifically sound and medically relevant. Many disease associations, especially those reported before the completion of the HGP, are limited in applicability because of potential inconsistencies with our current standards for genomic coordinates, nomenclature and gene structure. In an effort to validate and link variants from the medical genetics literature to an unambiguous reference for each variant, we developed a software pipeline and reviewed 68 641 single amino acid mutations from Online Mendelian Inheritance in Man (OMIM), Human Gene Mutation Database (HGMD) and dbSNP. The frequency of unresolved mutation annotations varied widely among the databases, ranging from 4 to 23%. A taxonomy of primary causes for unresolved mutations was produced. AVAILABILITY: This program is freely available from the web site (http://safegene.hms.harvard.edu/aa2nt/).
Affiliation(s)
- Mark Y Tong
- Harvard Medical School, Boston, MA 02115, USA.
|
447
|
Workman TE, Fiszman M, Hurdle JF, Rindflesch TC. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information. J Med Libr Assoc 2011; 98:273-81. [PMID: 20936065 DOI: 10.3163/1536-5050.98.4.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE: This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. METHODS: An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. RESULTS: The system achieved recall of 46% and precision of 88% (F-measure=0.61) by taking Gene References into Function (GeneRIFs) into account. CONCLUSION: The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators.
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, 26 S 2000 E, HSEB 5700, Salt Lake City, UT 84112, USA.
448
Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011; 12:56-68. [PMID: 21164525 DOI: 10.1038/nrg2918] [Citation(s) in RCA: 2950] [Impact Index Per Article: 210.7] [Indexed: 12/13/2022]
Abstract
Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular and intercellular network that links tissue and organ systems. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships among apparently distinct (patho)phenotypes. Advances in this direction are essential for identifying new disease genes, for uncovering the biological significance of disease-associated mutations identified by genome-wide association studies and full-genome sequencing, and for identifying drug targets and biomarkers for complex diseases.
Affiliation(s)
- Albert-László Barabási
- Center for Complex Networks Research and Department of Physics, Northeastern University, 110 Forsyth Street, 111 Dana Research Center, Boston, Massachusetts 02115, USA.
449
Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, Wilbur WJ, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2011; 39:D38-51. [PMID: 21097890 PMCID: PMC3013733 DOI: 10.1093/nar/gkq1172] [Citation(s) in RCA: 485] [Impact Index Per Article: 34.6] [Received: 09/16/2010] [Revised: 10/29/2010] [Accepted: 11/01/2010] [Indexed: 12/03/2022]
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
450
Bellazzi R, Diomidous M, Sarkar IN, Takabayashi K, Ziegler A, McCray AT. Data analysis and data mining: current issues in biomedical informatics. Methods Inf Med 2011; 50:536-44. [PMID: 22146916 PMCID: PMC3233983 DOI: 10.3414/me11-06-0002] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 12/23/2022]
Abstract
BACKGROUND Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. OBJECTIVES To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. METHODS On the occasion of the 50th year of Methods of Information in Medicine a symposium was organized, which reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. RESULTS The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology. CONCLUSIONS Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers.
Affiliation(s)
- R Bellazzi
- University of Pavia, Dipartimento di Informatica e Sistemistica, Via Ferrata 1, 27100 Pavia (PV), Italy.