451
Ringwald M, Iyer V, Mason JC, Stone KR, Tadepally HD, Kadin JA, Bult CJ, Eppig JT, Oakley DJ, Briois S, Stupka E, Maselli V, Smedley D, Liu S, Hansen J, Baldock R, Hicks GG, Skarnes WC. The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res 2010; 39:D849-55. [PMID: 20929875 PMCID: PMC3013768 DOI: 10.1093/nar/gkq879] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 01/17/2023]
Abstract
The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serves as the central public web site for IKMC data and facilitates the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.
Affiliation(s)
- Martin Ringwald
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.
452
Seal RL, Gordon SM, Lush MJ, Wright MW, Bruford EA. genenames.org: the HGNC resources in 2011. Nucleic Acids Res 2010; 39:D514-9. [PMID: 20929869 PMCID: PMC3013772 DOI: 10.1093/nar/gkq892] [Citation(s) in RCA: 175] [Impact Index Per Article: 12.5] [Indexed: 11/29/2022]
Abstract
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique gene symbol and name to every human gene. The HGNC database currently contains almost 30 000 approved gene symbols, over 19 000 of which represent protein-coding genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC editors and links to related genomic, phenotypic and proteomic information. Here we describe improvements to our resources, including a new Quick Gene Search, a new List Search, an integrated HGNC BioMart and a new Statistics and Downloads facility.
Affiliation(s)
- Ruth L Seal
- European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
453
Maqungo M, Kaur M, Kwofie SK, Radovanovic A, Schaefer U, Schmeier S, Oppon E, Christoffels A, Bajic VB. DDPC: Dragon Database of Genes associated with Prostate Cancer. Nucleic Acids Res 2010; 39:D980-5. [PMID: 20880996 PMCID: PMC3013759 DOI: 10.1093/nar/gkq849] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 12/02/2022]
Abstract
Prostate cancer (PC) is one of the most commonly diagnosed cancers in men. PC is relatively difficult to diagnose due to a lack of clear early symptoms. Extensive research of PC has led to the availability of a large amount of data on PC. Several hundred genes are implicated in different stages of PC, which may help in developing diagnostic methods or even cures. In spite of this accumulated information, effective diagnostics and treatments remain elusive. We have developed Dragon Database of Genes associated with Prostate Cancer (DDPC) as an integrated knowledgebase of genes experimentally verified as implicated in PC. DDPC is distinctive from other databases in that (i) it provides pre-compiled biomedical text-mining information on PC, which would otherwise require tedious computational analyses, (ii) it integrates data on molecular interactions, pathways, gene ontologies, gene regulation at molecular level, predicted transcription factor binding sites on promoters of PC implicated genes and transcription factors that correspond to these binding sites and (iii) it contains DrugBank data on drugs associated with PC. We believe this resource will serve as a source of useful information for research on PC. DDPC is freely accessible for academic and non-profit users via http://apps.sanbi.ac.za/ddpc/ and http://cbrc.kaust.edu.sa/ddpc/.
Affiliation(s)
- Monique Maqungo
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag-X17, Modderdam Road, Bellville, Cape Town, South Africa
454
Shin YC, Shin SY, So I, Kwon D, Jeon JH. TRIP Database: a manually curated database of protein-protein interactions for mammalian TRP channels. Nucleic Acids Res 2010; 39:D356-61. [PMID: 20851834 PMCID: PMC3013757 DOI: 10.1093/nar/gkq814] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]
Abstract
Transient receptor potential (TRP) channels are a superfamily of Ca2+-permeable cation channels that translate cellular stimuli into electrochemical signals. Aberrant activity of TRP channels has been implicated in a variety of human diseases, such as neurological disorders, cardiovascular disease and cancer. To facilitate the understanding of the molecular network by which TRP channels are associated with biological and disease processes, we have developed the TRIP (TRansient receptor potential channel-Interacting Protein) Database (http://www.trpchannel.org), a manually curated database that aims to offer comprehensive information on protein–protein interactions (PPIs) of mammalian TRP channels. The TRIP Database was created by systematically curating 277 peer-reviewed publications; the current version documents 490 PPI pairs, 28 TRP channels and 297 cellular proteins. The TRIP Database provides a detailed summary of PPI data that fit into four categories: screening, validation, characterization and functional consequence. Users can find in-depth information specified in the literature on relevant analytical methods and experimental resources, such as gene constructs and cell/tissue types. The TRIP Database has user-friendly web interfaces with helpful features, including a search engine, an interaction map and a function for cross-referencing useful external databases. Our TRIP Database will provide a valuable tool to assist in understanding the molecular regulatory network of TRP channels.
Affiliation(s)
- Young-Cheul Shin
- Department of Physiology, Seoul National University College of Medicine, Seoul 110-799, Korea
455
Chautard E, Fatoux-Ardore M, Ballut L, Thierry-Mieg N, Ricard-Blum S. MatrixDB, the extracellular matrix interaction database. Nucleic Acids Res 2010; 39:D235-40. [PMID: 20852260 PMCID: PMC3013758 DOI: 10.1093/nar/gkq830] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.3] [Indexed: 11/15/2022]
Abstract
MatrixDB (http://matrixdb.ibcp.fr) is a freely available database focused on interactions established by extracellular proteins and polysaccharides. Only few databases report protein-polysaccharide interactions and, to the best of our knowledge, there is no other database of extracellular interactions. MatrixDB takes into account the multimeric nature of several extracellular protein families for the curation of interactions, and reports interactions with individual polypeptide chains or with multimers, considered as permanent complexes, when appropriate. MatrixDB is a member of the International Molecular Exchange consortium (IMEx) and has adopted the PSI-MI standards for the curation and the exchange of interaction data. MatrixDB stores experimental data from our laboratory, data from literature curation, data imported from IMEx databases, and data from the Human Protein Reference Database. MatrixDB is focused on mammalian interactions, but aims to integrate interaction datasets of model organisms when available. MatrixDB provides direct links to databases recapitulating mutations in genes encoding extracellular proteins, to UniGene and to the Human Protein Atlas that shows expression and localization of proteins in a large variety of normal human tissues and cells. MatrixDB allows researchers to perform customized queries and to build tissue- and disease-specific interaction networks that can be visualized and analyzed with Cytoscape or Medusa.
Affiliation(s)
- Emilie Chautard
- Institut de Biologie et Chimie des Protéines, UMR 5086 CNRS-Université Lyon 1, IFR 128 Biosciences Gerland-Lyon Sud, 7 passage du Vercors 69367, Lyon Cedex 07, France
456
Wu Z, Zhao XM, Chen L. A systems biology approach to identify effective cocktail drugs. BMC Syst Biol 2010; 4 Suppl 2:S7. [PMID: 20840734 PMCID: PMC2982694 DOI: 10.1186/1752-0509-4-s2-s7] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Complex diseases, such as Type 2 Diabetes, are generally caused by multiple factors, which hampers effective drug discovery. To combat these diseases, combination regimens or combination drugs provide an alternative approach and are becoming the standard of treatment for complex diseases. However, most existing combination drugs are developed based on clinical experience or a test-and-trial strategy, which is not only time-consuming but also expensive. RESULTS: In this paper, we presented a novel network-based systems biology approach to identify effective drug combinations by exploiting high-throughput data. We assumed that a subnetwork or pathway will be affected in the networked cellular system after a drug is administered. Therefore, the affected subnetwork can be used to assess the drug's overall effect, and thereby help to identify effective drug combinations by comparing the subnetworks affected by individual drugs with that affected by the combination drug. In this work, we first constructed a molecular interaction network by integrating protein interactions, protein-DNA interactions, and signaling pathways. A new model was then developed to detect subnetworks affected by drugs. Furthermore, we proposed a new score to evaluate the overall effect of one drug by taking into account both efficacy and side-effects. As a pilot study we applied the proposed method to identify effective combinations of drugs used to treat Type 2 Diabetes. Our method detected the combination of Metformin and Rosiglitazone, which is actually Avandamet, a drug that has been successfully used to treat Type 2 Diabetes. CONCLUSIONS: The results on real biological data demonstrate the effectiveness and efficiency of the proposed method, which can not only detect effective cocktail combinations of drugs in an accurate manner but also significantly reduce expensive and tedious trial-and-error experiments.
Affiliation(s)
- Zikai Wu
- Institute of Systems Biology, Shanghai University, Shanghai, China
- Business School, University of Shanghai for Science and Technology, Shanghai, China
- School of Communication and Information Engineering, Shanghai University, Shanghai, China
- Xing-Ming Zhao
- Institute of Systems Biology, Shanghai University, Shanghai, China
- Luonan Chen
- Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
457
Cause-effect relationships in medicine: a protein network perspective. Trends Pharmacol Sci 2010; 31:547-55. [PMID: 20810173 DOI: 10.1016/j.tips.2010.07.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 04/27/2010] [Revised: 07/21/2010] [Accepted: 07/26/2010] [Indexed: 11/22/2022]
Abstract
Current target-based drug discovery platforms are not able to predict drug efficacy and the full spectrum of drug effects in organisms. Hence, many experimental drugs do not survive the lengthy and costly process of drug development. Understanding how drugs affect cellular network structures and how the resulting signals are translated into drug effects is extremely important for the discovery of new medicines. This requires a greater understanding of cause-effect relationships at the organism, organ, tissue, cellular, and molecular level. There is a growing recognition that this information must be integrated into discovery paradigms, but a 'road map' for obtaining and integrating information about heterogeneous networks into drug-discovery platforms currently does not exist. This review explores recent network-centered approaches developed to investigate the genesis of medicine and disease effects, specifically highlighting protein-protein interaction network models and their use in cause-effect analyses in medicine.
458
Gene therapy, gene targeting and induced pluripotent stem cells: applications in monogenic disease treatment. Biotechnol Adv 2010; 29:1-10. [PMID: 20656005 DOI: 10.1016/j.biotechadv.2010.07.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 03/02/2010] [Revised: 05/05/2010] [Accepted: 05/11/2010] [Indexed: 01/15/2023]
Abstract
Monogenic diseases are often severe, life-threatening disorders for which lifelong palliative treatment is the only option. Over the last two decades, a number of strategies have been devised with the aim to treat these diseases with a genetic approach. Gene therapy has been under development for many years, yet suffers from the lack of an effective and safe vector for the delivery of genetic material into cells. More recently, gene targeting by homologous recombination has been proposed as a safer treatment, by specifically correcting disease-causing mutations. However, low efficiency is a major drawback. The emergence of two technologies could overcome some of these obstacles. Terminally differentiated somatic cells can be reprogrammed, using defined factors, to become induced pluripotent stem cells (iPSCs), which can undergo efficient gene mutation correction with the aid of fusion proteins known as zinc finger nucleases (ZFNs). The amalgamation of these two technologies has the potential to break through the current bottleneck in gene therapy and gene targeting.
459
Sobreira NLM, Cirulli ET, Avramopoulos D, Wohler E, Oswald GL, Stevens EL, Ge D, Shianna KV, Smith JP, Maia JM, Gumbs CE, Pevsner J, Thomas G, Valle D, Hoover-Fong JE, Goldstein DB. Whole-genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet 2010; 6:e1000991. [PMID: 20577567 PMCID: PMC2887469 DOI: 10.1371/journal.pgen.1000991] [Citation(s) in RCA: 168] [Impact Index Per Article: 12.0] [Received: 03/09/2010] [Accepted: 05/18/2010] [Indexed: 11/19/2022]
Abstract
Although more than 2,400 genes have been shown to contain variants that cause Mendelian disease, there are still several thousand such diseases yet to be molecularly defined. The ability of new whole-genome sequencing technologies to rapidly identify most of the genetic variants in any given genome opens an exciting opportunity to identify these disease genes. Here we sequenced the whole genome of a single patient with the dominant Mendelian disease, metachondromatosis (OMIM 156250), and used partial linkage data from her small family to focus our search for the responsible variant. In the proband, we identified an 11 bp deletion in exon four of PTPN11, which alters frame, results in premature translation termination, and co-segregates with the phenotype. In a second metachondromatosis family, we confirmed our result by identifying a nonsense mutation in exon 4 of PTPN11 that also co-segregates with the phenotype. Sequencing PTPN11 exon 4 in 469 controls showed no such protein truncating variants, supporting the pathogenicity of these two mutations. This combination of a new technology and a classical genetic approach provides a powerful strategy to discover the genes responsible for unexplained Mendelian disorders.
Affiliation(s)
- Nara L. M. Sobreira
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Predoctoral Training Program in Human Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Elizabeth T. Cirulli
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
- Dimitrios Avramopoulos
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Department of Psychiatry, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Elizabeth Wohler
- Department of Cytogenetics, Kennedy Krieger Institute, Baltimore, Maryland, United States of America
- Gretchen L. Oswald
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Eric L. Stevens
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Predoctoral Training Program in Human Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Dongliang Ge
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
- Kevin V. Shianna
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
- Jason P. Smith
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
- Jessica M. Maia
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
- Curtis E. Gumbs
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
- Jonathan Pevsner
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Department of Neurology, Kennedy Krieger Institute, Baltimore, Maryland, United States of America
- George Thomas
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Department of Cytogenetics, Kennedy Krieger Institute, Baltimore, Maryland, United States of America
- David Valle
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Julie E. Hoover-Fong
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Greenberg Center for Skeletal Dysplasias, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- David B. Goldstein
- Predoctoral Training Program in Human Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
460
Korcsmáros T, Farkas IJ, Szalay MS, Rovó P, Fazekas D, Spiró Z, Böde C, Lenti K, Vellai T, Csermely P. Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery. Bioinformatics 2010; 26:2042-50. [PMID: 20542890 DOI: 10.1093/bioinformatics/btq310] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Indexed: 01/09/2023]
Abstract
MOTIVATION: Signaling pathways control a large variety of cellular processes. However, currently, even within the same database signaling pathways are often curated at different levels of detail. This makes comparative and cross-talk analyses difficult. RESULTS: We present SignaLink, a database containing eight major signaling pathways from Caenorhabditis elegans, Drosophila melanogaster and humans. Based on 170 review and approximately 800 research articles, we have compiled pathways with semi-automatic searches and uniform, well-documented curation rules. We found that in humans any two of the eight pathways can cross-talk. We quantified the possible tissue- and cancer-specific activity of cross-talks and found pathway-specific expression profiles. In addition, we identified 327 proteins relevant for drug target discovery. CONCLUSIONS: We provide a novel resource for comparative and cross-talk analyses of signaling pathways. The identified multi-pathway and tissue-specific cross-talks contribute to the understanding of the signaling complexity in health and disease, and underscore its importance in network-based drug target selection. AVAILABILITY: http://SignaLink.org.
461
Chavali S, Barrenas F, Kanduri K, Benson M. Network properties of human disease genes with pleiotropic effects. BMC Syst Biol 2010; 4:78. [PMID: 20525321 PMCID: PMC2892460 DOI: 10.1186/1752-0509-4-78] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 03/04/2010] [Accepted: 06/04/2010] [Indexed: 01/28/2023]
Abstract
BACKGROUND: The ability of a gene to cause a disease is known to be associated with the topological position of its protein product in the molecular interaction network. Pleiotropy, in human genetic diseases, refers to the ability of different mutations within the same gene to cause different pathological effects. Here, we hypothesized that the ability of human disease genes to cause pleiotropic effects would be associated with their network properties. RESULTS: Shared genes, with pleiotropic effects, were more central than specific genes that were associated with one disease, in the protein interaction network. Furthermore, shared genes associated with phenotypically divergent diseases (phenodiv genes) were more central than those associated with phenotypically similar diseases. Shared genes had a higher number of disease gene interactors compared to specific genes, implying a higher likelihood of finding a novel disease gene in their network neighborhood. Shared genes had a relatively restricted tissue co-expression with interactors, contrary to specific genes. This could be a function of shared genes leading to pleiotropy. Essential and phenodiv genes had comparable connectivities and hence we investigated differences in network attributes conferring lethality and pleiotropy, respectively. Essential and phenodiv genes were found to be intra-modular and inter-modular hubs, respectively, with the former being highly co-expressed with their interactors contrary to the latter. Essential genes were predominantly nuclear proteins with transcriptional regulation activities while phenodiv genes were cytoplasmic proteins involved in signal transduction. CONCLUSION: The properties of a disease gene in the molecular interaction network determine its role in manifesting different and divergent diseases.
Affiliation(s)
- Sreenivas Chavali
- The Unit for Clinical Systems Biology, University of Gothenburg, Medicinaregatan 5A, Gothenburg SE405 30, Sweden.
462
Sardana D, Vasa S, Vepachedu N, Chen J, Gudivada RC, Aronow BJ, Jegga AG. PhenoHM: human-mouse comparative phenome-genome server. Nucleic Acids Res 2010; 38:W165-74. [PMID: 20507906 PMCID: PMC2896149 DOI: 10.1093/nar/gkq472] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022]
Abstract
PhenoHM is a human–mouse comparative phenome–genome server that facilitates cross-species identification of genes associated with orthologous phenotypes (http://phenome.cchmc.org; full open access, login not required). Combining and extrapolating the knowledge about the roles of individual gene functions in the determination of phenotype across multiple organisms improves our understanding of gene function in normal and perturbed states and offers the opportunity to complement biologically the rapidly expanding strategies in comparative genomics. The Mammalian Phenotype Ontology (MPO), a structured vocabulary of phenotype terms that leverages observations encompassing the consequences of mouse gene knockout studies, is a principal component of mouse phenotype knowledge source. On the other hand, the Unified Medical Language System (UMLS) is a composite collection of various human-centered biomedical terminologies. In the present study, we mapped terms reciprocally from the MPO to human disease concepts such as clinical findings from the UMLS and clinical phenotypes from the Online Mendelian Inheritance in Man knowledgebase. By cross-mapping mouse–human phenotype terms, extracting implicated genes and extrapolating phenotype-gene associations between species PhenoHM provides a resource that enables rapid identification of genes that trigger similar outcomes in human and mouse and facilitates identification of potentially novel disease causal genes. The PhenoHM server can be accessed freely at http://phenome.cchmc.org.
Affiliation(s)
- Divya Sardana
- Department of Computer Science, University of Cincinnati, Cincinnati, OH, USA
463
Bioinformatics services related to diagnosis of primary immunodeficiencies. Curr Opin Allergy Clin Immunol 2010; 9:531-6. [PMID: 19779331 DOI: 10.1097/aci.0b013e3283327dc1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
PURPOSE OF REVIEW: Most primary immunodeficiencies (PIDs) have overlapping signs and symptoms, presenting a challenge for diagnosis. The information available from the Internet for over 200 PIDs is scattered between numerous services and databases. Patient information has been collected in different patient registries. Several software tools have been developed in order to build the databases, expert systems and other information systems useful in diagnosis or prediction. RECENT FINDINGS: Previously released services have been significantly improved and some new bioinformatics tools have been developed to help in diagnosis, prediction, mutation analysis and classification of PIDs. Several national initiatives have been launched for centralized PID information services. The very latest additions are tools and approaches for PID candidate gene prioritization, systematic classification and a medical expert system to help in diagnosis. SUMMARY: Many bioinformatics tools for PIDs are already freely available over the Internet. We expect bioinformatics tools to further help healthcare professionals in diagnosis, analysis and prediction. Currently, most of the resources are stand-alone and thus their integration will be a challenge for the future. Another challenge is to develop terminologies, ontologies and standards to achieve semantic interoperability.
464
Abstract
Connections have been revealed between very different human diseases using phenotype associations in other species. Surprising correlations between human disease phenotypes are emerging. Recent work now reveals startling phenotype connections between species, which could provide new disease models.
Affiliation(s)
- Bolan Linghu
- Translational Sciences Department, Novartis Institutes for BioMedical Research, Cambridge, MA 02139, USA
465
Gkoutos GV, Mungall C, Dolken S, Ashburner M, Lewis S, Hancock J, Schofield P, Kohler S, Robinson PN. Entity/quality-based logical definitions for the human skeletal phenome using PATO. Annu Int Conf IEEE Eng Med Biol Soc 2010; 2009:7069-72. [PMID: 19964203 DOI: 10.1109/iembs.2009.5333362] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 11/08/2022]
Abstract
This paper describes an approach to providing computer-interpretable logical definitions for the terms of the Human Phenotype Ontology (HPO) using PATO, the ontology of phenotypic qualities, to link terms of the HPO to the anatomic and other entities that are affected by abnormal phenotypic qualities. This approach will allow improved computerized reasoning as well as a facility to compare phenotypes between different species. The PATO mapping will also provide direct links from phenotypic abnormalities and underlying anatomic structures encoded using the Foundational Model of Anatomy, which will be a valuable resource for computational investigations of the links between anatomical components and concepts representing diseases with abnormal phenotypes and associated genes.
Affiliation(s)
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, England.
466
Friedrich A, Garnier N, Gagnière N, Nguyen H, Albou LP, Biancalana V, Bettler E, Deléage G, Lecompte O, Muller J, Moras D, Mandel JL, Toursel T, Moulinier L, Poch O. SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases. Hum Mutat 2010; 31:127-35. [PMID: 19921752 DOI: 10.1002/humu.21155] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/19/2023]
Abstract
Understanding how genetic alterations affect gene products at the molecular level represents a first step in the elucidation of the complex relationships between genotypic and phenotypic variations, and is thus a major challenge in the postgenomic era. Here, we present SM2PH-db (http://decrypthon.igbmc.fr/sm2ph), a new database designed to investigate structural and functional impacts of missense mutations and their phenotypic effects in the context of human genetic diseases. A wealth of up-to-date interconnected information is provided for each of the 2,249 disease-related entry proteins (August 2009), including data retrieved from biological databases and data generated from a Sequence-Structure-Evolution Inference in Systems-based approach, such as multiple alignments, three-dimensional structural models, and multidimensional (physicochemical, functional, structural, and evolutionary) characterizations of mutations. SM2PH-db provides a robust infrastructure associated with interactive analysis tools supporting in-depth study and interpretation of the molecular consequences of mutations, with the more long-term goal of elucidating the chain of events leading from a molecular defect to its pathology. The entire content of SM2PH-db is regularly and automatically updated thanks to the computational grid data federation facilities provided in the context of the Decrypthon program.
Affiliation(s)
- Anne Friedrich
- Département de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (UMR7104), Centre National de la Recherche Scientifique/Institut National de la Santé et de la Recherche Médicale/Université de Strasbourg, Illkirch, France
|
467
|
Hijikata A, Raju R, Keerthikumar S, Ramabadran S, Balakrishnan L, Ramadoss SK, Pandey A, Mohan S, Ohara O. Mutation@A Glance: an integrative web application for analysing mutations from human genetic diseases. DNA Res 2010; 17:197-208. [PMID: 20360267 PMCID: PMC2885273 DOI: 10.1093/dnares/dsq010] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023] Open
Abstract
Although mutation analysis serves as a key part in making a definitive diagnosis about a genetic disease, it still remains a time-consuming step to interpret their biological implications through integration of various lines of archived information about genes in question. To expedite this evaluation step of disease-causing genetic variations, here we developed Mutation@A Glance (http://rapid.rcai.riken.jp/mutation/), a highly integrated web-based analysis tool for analysing human disease mutations; it implements a user-friendly graphical interface to visualize about 40 000 known disease-associated mutations and genetic polymorphisms from more than 2600 protein-coding human disease-causing genes. Mutation@A Glance locates already known genetic variation data individually on the nucleotide and the amino acid sequences and makes it possible to cross-reference them with tertiary and/or quaternary protein structures and various functional features associated with specific amino acid residues in the proteins. We showed that the disease-associated missense mutations had a stronger tendency to reside in positions relevant to the structure/function of proteins than neutral genetic variations. From a practical viewpoint, Mutation@A Glance could certainly function as a ‘one-stop’ analysis platform for newly determined DNA sequences, which enables us to readily identify and evaluate new genetic variations by integrating multiple lines of information about the disease-causing candidate genes.
Affiliation(s)
- Atsushi Hijikata
- Laboratory for Immunogenomics, RIKEN Research Center for Allergy and Immunology, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
468
|
Systematic discovery of nonobvious human disease models through orthologous phenotypes. Proc Natl Acad Sci U S A 2010; 107:6544-9. [PMID: 20308572 DOI: 10.1073/pnas.0910200107] [Citation(s) in RCA: 206] [Impact Index Per Article: 14.7] [Indexed: 01/24/2023] Open
Abstract
Biologists have long used model organisms to study human diseases, particularly when the model bears a close resemblance to the disease. We present a method that quantitatively and systematically identifies nonobvious equivalences between mutant phenotypes in different species, based on overlapping sets of orthologous genes from human, mouse, yeast, worm, and plant (212,542 gene-phenotype associations). These orthologous phenotypes, or phenologs, predict unique genes associated with diseases. Our method suggests a yeast model for angiogenesis defects, a worm model for breast cancer, mouse models of autism, and a plant model for the neural crest defects associated with Waardenburg syndrome, among others. Using these models, we show that SOX13 regulates angiogenesis, and that SEC23IP is a likely Waardenburg gene. Phenologs reveal functionally coherent, evolutionarily conserved gene networks-many predating the plant-animal divergence-capable of identifying candidate disease genes.
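To make the idea of scoring "overlapping sets of orthologous genes" concrete, here is a minimal sketch that assumes a hypergeometric test is used to judge whether two phenotype gene sets share more orthologs than chance would predict. The abstract does not state which statistic the authors used, so the test itself, the function name `overlap_pvalue`, and the gene identifiers are illustrative assumptions.

```python
from math import comb


def overlap_pvalue(n_orthologs, genes_a, genes_b):
    """Probability of an overlap at least as large as the observed one.

    Assumes both gene sets are drawn from the same pool of n_orthologs
    orthologous genes and uses a hypergeometric null model (an assumption,
    not necessarily the statistic used in the paper).
    """
    a, b = len(genes_a), len(genes_b)
    observed = len(genes_a & genes_b)
    total = comb(n_orthologs, b)
    # Sum the upper tail: overlaps of size observed .. min(a, b).
    tail = sum(
        comb(a, i) * comb(n_orthologs - a, b - i)
        for i in range(observed, min(a, b) + 1)
    )
    return tail / total


# Hypothetical example: 10 shared orthologs, two phenotypes with 3 genes each.
human_phenotype = {"g1", "g2", "g3"}
yeast_phenotype = {"g1", "g2", "g4"}
p = overlap_pvalue(10, human_phenotype, yeast_phenotype)
```

A small p-value would flag the two phenotypes as a candidate phenolog pair; in practice a correction for the many phenotype pairs tested would also be needed.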
|
469
|
Andersson L, Ståhl F. Distribution of candidate genes for experimentally induced arthritis in rats. BMC Genomics 2010; 11:146. [PMID: 20196835 PMCID: PMC2838850 DOI: 10.1186/1471-2164-11-146] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/07/2008] [Accepted: 03/02/2010] [Indexed: 12/04/2022] Open
Abstract
Background: Rat models are frequently used to link genomic regions to experimentally induced arthritis in quantitative trait locus (QTL) analyses. To facilitate the search for candidate genes within such regions, we have previously developed an application (CGC) that uses weighted keywords to rank genes based on their descriptive text. In this study, CGC is used for analyzing the localization of candidate genes from two viewpoints: distribution over the rat genome and functional connections between arthritis QTLs.
Methods: To investigate if candidate genes identified by CGC are more likely to be found inside QTLs, we ranked 2403 genes genome wide in rat. The number of genes within different ranges of CGC scores localized inside and outside QTLs was then calculated. Furthermore, we investigated if candidate genes within certain QTLs share similar functions, and if these functions could be connected to genes within other QTLs. Based on references between genes in OMIM, we created connections between genes in QTLs identified in two distinct rat crosses. In this way, QTL pairs with one QTL from each cross that share an unexpectedly high number of gene connections were identified. The genes that were found to connect a pair of QTLs were then functionally analysed using a publicly available classification tool.
Results: Out of the 2403 genes ranked by the CGC application, 1160 were localized within QTL regions. No difference was observed between highly and lowly rated genes. Hence, highly rated candidate genes for arthritis seem to be distributed randomly inside and outside QTLs. Furthermore, we found five pairs of QTLs that shared a significantly high number of interconnected genes. When functionally analyzed, most genes connecting two QTLs could be included in a single functional cluster. Thus, the functional connections between these genes could very well be involved in the development of an arthritis phenotype.
Conclusions: From the genome wide CGC search, we conclude that candidate genes for arthritis in rat are randomly distributed between QTL and non-QTL regions. We do however find certain pairs of QTLs that share a large number of functionally connected candidate genes, suggesting that these QTLs contain a number of genes involved in similar functions contributing to the arthritis phenotype.
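The ranking of genes by weighted keywords in their descriptive text can be sketched roughly as follows. This is not the CGC implementation: the scoring rule (sum of the weights of keywords found as substrings), the function names, the keyword weights, and the gene descriptions are all hypothetical and serve only to illustrate the general technique.

```python
def score_gene(description, keyword_weights):
    """Sum the weights of all keywords that occur in the description.

    Case-insensitive substring matching is an assumption made for this
    sketch; the abstract does not describe the actual matching rule.
    """
    text = description.lower()
    return sum(w for kw, w in keyword_weights.items() if kw in text)


def rank_genes(descriptions, keyword_weights):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = [
        (gene, score_gene(text, keyword_weights))
        for gene, text in descriptions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical keyword weights and gene descriptions.
weights = {"inflammation": 2.0, "joint": 1.5, "cytokine": 1.0}
genes = {
    "GeneA": "Cytokine involved in joint inflammation",
    "GeneB": "Structural protein of the lens",
    "GeneC": "Regulates inflammation",
}
ranking = rank_genes(genes, weights)
```

Genes whose scores fall in a given range can then be counted inside and outside QTL intervals, which is the comparison the abstract reports.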
Affiliation(s)
- Lars Andersson
- Department of Cell and Molecular Biology-Genetics, Göteborg University, Box 462, SE 40530 Göteborg, Sweden.
|
470
|
Jeong SK, Lee EY, Cho JY, Lee HJ, Jeong AS, Cho SY, Paik YK. Data management and functional annotation of the Korean reference plasma proteome. Proteomics 2010; 10:1250-5. [DOI: 10.1002/pmic.200900371] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
|
471
|
Lin YR, Wei HY, Tsai TL, Lin TH. HDAPD: a web tool for searching the disease-associated protein structures. BMC Bioinformatics 2010; 11:88. [PMID: 20158919 PMCID: PMC2833151 DOI: 10.1186/1471-2105-11-88] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2009] [Accepted: 02/17/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The structures of disease-associated proteins are important for structure-based drug design against a particular disease. Until now, protein structures have usually been searched through a PDB id or some sequence information. However, in the HDAPD database presented here, the structure of a disease-associated protein can be searched directly by keying in the associated disease name.
DESCRIPTION: A search in HDAPD can be initiated by keying in keywords for a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can view and manipulate a protein structure. Gene ontology data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map, from which the position of the protein and its relationships with other proteins in a metabolic pathway can be found. The latest literature (titles, journals, authors, and abstracts) retrieved from PubMed for the protein is also presented as a length-controllable list.
CONCLUSIONS: Since the HDAPD data content can be routinely updated through a PHP-MySQL web page, the database is useful for searching the structures of disease-associated proteins that may play important roles in disease development, in support of structure-based drug design against those diseases.
Affiliation(s)
- Yi-Ruen Lin
- Institute of Molecular Medicine and Department of Life Science, National Tsing Hua University, HsinChu, 30013, Taiwan, Republic of China
- Hsin-Yuan Wei
- Institute of Molecular Medicine and Department of Life Science, National Tsing Hua University, HsinChu, 30013, Taiwan, Republic of China
- Tsung-Lin Tsai
- Institute of Molecular Medicine and Department of Life Science, National Tsing Hua University, HsinChu, 30013, Taiwan, Republic of China
- Thy-Hou Lin
- Institute of Molecular Medicine and Department of Life Science, National Tsing Hua University, HsinChu, 30013, Taiwan, Republic of China
|
472
|
Stenson PD, Ball EV, Howells K, Phillips AD, Mort M, Cooper DN. The Human Gene Mutation Database: providing a comprehensive central mutation database for molecular diagnostics and personalized genomics. Hum Genomics 2010; 4:69-72. [PMID: 20038494 PMCID: PMC3525207 DOI: 10.1186/1479-7364-4-2-69] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.8] [Indexed: 01/09/2023] Open
|
473
|
Abstract
A standardized, controlled vocabulary allows phenotypic information to be described in an unambiguous fashion in medical publications and databases. The Human Phenotype Ontology (HPO) is being developed in an effort to provide such a vocabulary. The use of an ontology to capture phenotypic information allows the use of computational algorithms that exploit semantic similarity between related phenotypic abnormalities to define phenotypic similarity metrics, which can be used to perform database searches for clinical diagnostics or as a basis for incorporating the human phenome into large-scale computational analysis of gene expression patterns and other cellular phenomena associated with human disease. The HPO is freely available at http://www.human-phenotype-ontology.org.
Affiliation(s)
- P N Robinson
- Institute for Medical Genetics, Augustenburger Platz 1, 13353 Berlin, Germany.
|
474
|
Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ. Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol 2010; 6:e1000662. [PMID: 20140234 PMCID: PMC2816673 DOI: 10.1371/journal.pcbi.1000662] [Citation(s) in RCA: 221] [Impact Index Per Article: 15.8] [Received: 08/11/2009] [Accepted: 12/30/2009] [Indexed: 11/18/2022] Open
Abstract
Current work in elucidating relationships between diseases has largely been based on pre-existing knowledge of disease genes. Consequently, these studies are limited in their discovery of new and unknown disease relationships. We present the first quantitative framework to compare and contrast diseases by an integrated analysis of disease-related mRNA expression data and the human protein interaction network. We identified 4,620 functional modules in the human protein network and provided a quantitative metric to record their responses in 54 diseases leading to 138 significant similarities between diseases. Fourteen of the significant disease correlations also shared common drugs, supporting the hypothesis that similar diseases can be treated by the same drugs, allowing us to make predictions for new uses of existing drugs. Finally, we also identified 59 modules that were dysregulated in at least half of the diseases, representing a common disease-state "signature". These modules were significantly enriched for genes that are known to be drug targets. Interestingly, drugs known to target these genes/proteins are already known to treat significantly more diseases than drugs targeting other genes/proteins, highlighting the importance of these core modules as prime therapeutic opportunities.
Affiliation(s)
- Silpa Suthram
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Joel T. Dudley
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Annie P. Chiang
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Rong Chen
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Trevor J. Hastie
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Atul J. Butte
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
|
475
|
Ruepp A, Kowarsch A, Schmidl D, Buggenthin F, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Theis FJ. PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes. Genome Biol 2010; 11:R6. [PMID: 20089154 PMCID: PMC2847718 DOI: 10.1186/gb-2010-11-1-r6] [Citation(s) in RCA: 218] [Impact Index Per Article: 15.6] [Received: 09/10/2009] [Revised: 12/03/2009] [Accepted: 01/20/2010] [Indexed: 12/19/2022] Open
Abstract
In recent years, microRNAs have been shown to play important roles in physiological as well as malignant processes. The PhenomiR database http://mips.helmholtz-muenchen.de/phenomir provides data from 542 studies that investigate deregulation of microRNA expression in diseases and biological processes as a systematic, manually curated resource. Using the PhenomiR dataset, we could demonstrate that, depending on disease type, independent information from cell culture studies contrasts with conclusions drawn from patient studies.
Affiliation(s)
- Andreas Ruepp
- Institute for Bioinformatics and Systems Biology (MIPS), Helmholtz Center Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.
|
476
|
Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Mizrachi I, Ostell J, Panchenko A, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, John Wilbur W, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2010; 38:D5-16. [PMID: 19910364 PMCID: PMC2808881 DOI: 10.1093/nar/gkp967] [Citation(s) in RCA: 374] [Impact Index Per Article: 26.7] [Received: 09/15/2009] [Revised: 10/06/2009] [Accepted: 10/13/2009] [Indexed: 12/23/2022] Open
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
477
|
Musumeci L, Arthur JW, Cheung FSG, Hoque A, Lippman S, Reichardt JKV. Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies. Hum Mutat 2010; 31:67-73. [PMID: 19877174 PMCID: PMC2797835 DOI: 10.1002/humu.21137] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022]
Abstract
The creation of single nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection has improved to the extent that there are over 17 million human reference (rs) SNPs reported to date (Build 129 of dbSNP). SNP databases are unfortunately not always complete and/or accurate. In fact, half of the reported SNPs are still only candidate SNPs and are not validated in a population. We describe the identification of SNDs (single nucleotide differences) in humans, that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogue (highly similar duplicated) sequence in the genome. Using sequencing we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence/absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs. Our identification of SNDs in the database will allow researchers to not only select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes.
Affiliation(s)
- Lucia Musumeci
- Plunkett Chair of Molecular Biology (Medicine), Bosch Institute, The University of Sydney, Medical Foundation Building (K25), 92–94 Parramatta Road, Camperdown, NSW 2006, Australia
- Jonathan W Arthur
- Discipline of Medicine, Sydney Medical School, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Bioinformatics, The University of Sydney, Camperdown, NSW 2006, Australia
- Florence SG Cheung
- Plunkett Chair of Molecular Biology (Medicine), Bosch Institute, The University of Sydney, Medical Foundation Building (K25), 92–94 Parramatta Road, Camperdown, NSW 2006, Australia
- Ashraful Hoque
- The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030, USA
- Scott Lippman
- The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030, USA
- Juergen KV Reichardt
- Plunkett Chair of Molecular Biology (Medicine), Bosch Institute, The University of Sydney, Medical Foundation Building (K25), 92–94 Parramatta Road, Camperdown, NSW 2006, Australia
|
478
|
Huss JW, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. The Gene Wiki: community intelligence applied to human gene annotation. Nucleic Acids Res 2010; 38:D633-9. [PMID: 19755503 PMCID: PMC2808918 DOI: 10.1093/nar/gkp760] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 08/14/2009] [Revised: 08/26/2009] [Accepted: 08/29/2009] [Indexed: 12/31/2022] Open
Abstract
Annotating the function of all human genes is a critical, yet formidable, challenge. Current gene annotation efforts focus on centralized curation resources, but it is increasingly clear that this approach does not scale with the rapid growth of the biomedical literature. The Gene Wiki utilizes an alternative and complementary model based on the principle of community intelligence. Directly integrated within the online encyclopedia, Wikipedia, the goal of this effort is to build a gene-specific review article for every gene in the human genome, where each article is collaboratively written, continuously updated and community reviewed. Previously, we described the creation of Gene Wiki 'stubs' for approximately 9000 human genes. Here, we describe ongoing systematic improvements to these articles to increase their utility. Moreover, we retrospectively examine the community usage and improvement of the Gene Wiki, providing evidence of a critical mass of users and editors. Gene Wiki articles are freely accessible within the Wikipedia web site, and additional links and information are available at http://en.wikipedia.org/wiki/Portal:Gene_Wiki.
Affiliation(s)
- Jon W. Huss, Pierre Lindenbaum, Michael Martone, Donabel Roberts, Angel Pizarro, Faramarz Valafar, John B. Hogenesch, Andrew I. Su
- Genomics Institute of the Novartis Research Foundation, San Diego, CA 92121, USA, Department of Bioinformatics, CEPH/Fondation Jean-Dausset, Paris, France, Rush University Medical College, Chicago, IL 60612, San Diego State University, Bioinformatics and Medical Informatics Graduate Program, San Diego, CA 92182 and Department of Pharmacology, Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
|
479
|
Maggi N, Arrigo P, Ruggiero C. SNP analysis of Rac1 for personalized ligand interaction. Annu Int Conf IEEE Eng Med Biol Soc 2010; 2010:1779-1782. [PMID: 21096420 DOI: 10.1109/iembs.2010.5626750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
This paper addresses mutational events that give rise to differing responses to drugs, focusing on Rac1, a protein that has been recognized as a target for drug design for cardiovascular disease due to its regulatory role in angiogenesis. Rac1 has been considered with reference to Single Nucleotide Polymorphisms (SNPs), which have become of great value for personalized medicine. We have considered four variations of Rac1 registered in UniProtKB. Two of these variations are due to environmental or population factors and two are mutations that we have selected because they are located near the binding sites of Rac1. Rac1 has been modelled with Rosetta software and with the I-TASSER web server. We have chosen I-TASSER-based modelling because the Rac1 structure obtained more closely resembled crystallography data. The in silico models have been used as receptors for docking with a set of 20 morpholines. The results obtained on SNPs show that a single ligand can react very differently with a mutated structure. Our analysis shows that all mutations considered change Rac1 conformation and increase the accessible surface of Rac1. Our analysis highlights the effect of two sources of genetic variability: single base variation and alternative splicing.
Affiliation(s)
- Norbert Maggi
- Department of Communication, Computer and System Sciences, Nanobiotechnology and Medical Informatics Laboratory University of Genoa, Via all'Opera Pia, 13, 16145, Italy
|
480
|
Syed AS, D’Antonio M, Ciccarelli FD. Network of Cancer Genes: a web resource to analyze duplicability, orthology and network properties of cancer genes. Nucleic Acids Res 2010; 38:D670-5. [PMID: 19906700 PMCID: PMC2808873 DOI: 10.1093/nar/gkp957] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 08/12/2009] [Revised: 10/02/2009] [Accepted: 10/13/2009] [Indexed: 01/19/2023] Open
Abstract
The Network of Cancer Genes (NCG) collects and integrates data on 736 human genes that are mutated in various types of cancer. For each gene, NCG provides information on duplicability, orthology, evolutionary appearance and topological properties of the encoded protein in a comprehensive version of the human protein-protein interaction network. NCG also stores information on all primary interactors of cancer proteins, thus providing a complete overview of 5357 proteins that constitute direct and indirect determinants of human cancer. With the constant delivery of results from the mutational screenings of cancer genomes, NCG represents a versatile resource for retrieving detailed information on particular cancer genes, as well as for identifying common properties of precompiled lists of cancer genes. NCG is freely available at: http://bio.ifom-ieo-campus.it/ncg.
Affiliation(s)
- Francesca D. Ciccarelli
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
|
481
|
Abstract
Biomedical researchers studying gene function should consider the impact of variation, even if genetics is not the primary objective of an investigation. Information on genetic variation can provide a valuable insight into the functional range and critical regions of a gene, protein or regulatory element. Genetic variants may be diverse in nature, ranging from single nucleotide variants, tandem repeats, small insertions or deletions to large copy number variants. Until recently, information on genetic variation was quite limited, but now a range of large scale surveys of variation have made plentiful data on common variation and a picture is beginning to emerge from the driving forces in human evolution and population diversification. Next-generation sequencing technologies are moving knowledge into a new phase focused on the individual genome and complete disclosure of individual variation, including the rarest of variants. The consequences of these advances in medicine are unresolved, but it is clear that biomedical researchers cannot afford to ignore this information. This review presents a broad overview of the in silico methods that will allow a researcher to quickly review known variation in a gene of interest, providing some pointers for further investigation.
|
482
|
[Genetic mutation databases: stakes and perspectives for orphan genetic diseases]. PATHOLOGIE-BIOLOGIE 2009; 58:387-95. [PMID: 19954899 DOI: 10.1016/j.patbio.2009.09.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/07/2009] [Accepted: 09/14/2009] [Indexed: 12/30/2022]
Abstract
New technologies, which constantly become available for mutation detection and gene analysis, have contributed to an exponential rate of discovery of disease genes and variation in the human genome. The task of collecting and documenting this enormous amount of data in genetic databases represents a major challenge for the future of biological and medical science. The Locus Specific Databases (LSDBs) are so far the most efficient mutation databases. This review presents the main types of databases available for the analysis of mutations responsible for genetic disorders, as well as open perspectives for new therapeutic research or challenges for future medicine. Accurate and exhaustive collection of variations in human genomes will be crucial for research and personalized delivery of healthcare.
|
483
|
Kim WY, Kim SY, Kim TH, Ahn SM, Byun HN, Kim D, Kim DS, Lee YS, Ghang H, Park D, Kim BC, Kim C, Lee S, Kim SJ, Bhak J. Gevab: a prototype genome variation analysis browsing server. BMC Bioinformatics 2009; 10 Suppl 15:S3. [PMID: 19958513 PMCID: PMC2788354 DOI: 10.1186/1471-2105-10-s15-s3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Background: The first Korean individual diploid genome sequence data (KOREF) was made public in December 2008. Results: A Korean genome variation analysis and browsing server (Gevab) was constructed as a database and web server for the exploration and downloading of Korean personal genome(s). Information in Gevab includes SNPs, short indels, structural variation (SV), and a comparison analysis between the NCBI human reference and the Korean genome(s). The user can find information on assembled consensus sequences, sequenced short reads, genetic variations, and relationships between genotype and phenotypes. Conclusion: This server is openly and publicly available online at http://koreagenome.org/en/ or directly at http://gevab.org.
Affiliation(s)
- Woo-Yeon Kim
- Korean BioInformation Center (KOBIC), KRIBB, Daejeon, Korea.
|
484
|
Khan JM, Ranganathan S. A multi-species comparative structural bioinformatics analysis of inherited mutations in alpha-D-mannosidase reveals strong genotype-phenotype correlation. BMC Genomics 2009; 10 Suppl 3:S33. [PMID: 19958498 PMCID: PMC2788387 DOI: 10.1186/1471-2164-10-s3-s33] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
Abstract
Background: Lysosomal α-mannosidase is an enzyme that acts to degrade N-linked oligosaccharides and hence plays an important role in mannose metabolism in humans and other mammalian species, especially livestock. Mutations in the gene (MAN2B1) encoding lysosomal α-D-mannosidase cause improper coding, resulting in dysfunctional or non-functional protein, causing the disease α-mannosidosis. Mapping disease mutations to the structure of the protein can help in understanding the functional consequences of these mutations and thus indirectly, the finer aspects of the pathology and clinical manifestations of the disease, including phenotypic severity as a function of the genotype. Results: A comprehensive homology modeling study of all the wild-type and inherited mutations of lysosomal α-mannosidase in four different species, human, cow, cat and guinea pig, reveals a significant correlation between the severity of the genotype and the phenotype in α-mannosidosis. We used the X-ray crystallographic structure of bovine lysosomal α-mannosidase as template, containing only two disulfide bonds and some ligands, to build structural models of wild-type structures with four disulfide linkages and all bound ligands. These wild-type models were then used as templates for disease mutations. All the truncations and substitutions involving the residues in and around the active site and those that destabilize the fold led to severe genotypes resulting in lethal phenotypes, whereas the mutations lying away from the active site were milder in both their genotypic and phenotypic expression. Conclusion: Based on the co-location of mutations from different organisms and their proximity to the enzyme active site, we have extrapolated observed mutations from one species to homologous positions in other organisms, as a predictive approach for detecting likely α-mannosidosis. Besides predicting new disease mutations, this approach also provides a way for detecting mutation hotspots in the gene, where novel mutations could be implicated in disease. The current study has identified five mutational hot-spot regions along the MAN2B1 gene. Structural mapping can thus provide a rational approach for predicting the phenotype of a disease, based on observed genotypic variations.
Affiliation(s)
- Javed Mohammed Khan
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, NSW 2109, Australia.
|
485
|
Abstract
The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks." The annotations, generated by the UCSC Genome Bioinformatics Group and external collaborators, display gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload data as custom annotation tracks in both browsers for research or educational use. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.
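As an illustration of the custom annotation tracks mentioned in this abstract, the following minimal Python sketch builds the text of a small BED-format track. The track name, description, and regions are hypothetical placeholders, not data from the cited unit. BED intervals use a 0-based start and an exclusive end.

```python
# Hypothetical example regions; names and coordinates are illustrative only.
regions = [
    ("chr1", 1000, 2000, "regionA"),
    ("chr1", 5000, 5600, "regionB"),
]


def make_custom_track(name, description, rows):
    """Return custom-track text: one track line, then tab-separated BED rows."""
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in rows:
        # BED uses a 0-based start and an exclusive end, so start must be < end.
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


track_text = make_custom_track("demoTrack", "Demo regions", regions)
print(track_text, end="")
```

The resulting text could be saved to a file or pasted into a browser's custom-track upload form; the unit itself remains the reference for the supported track options.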
Affiliation(s)
- Donna Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, Phone: (831) 459-1571, Fax: (831) 459-1809
- Angie S. Hinrichs
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, Phone: (831) 459-1544, Fax: (831) 459-1809
- W. James Kent
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, Phone: (831) 459-1401, Fax: (831) 459-1809
|
486
|
Yamasaki C, Murakami K, Takeda JI, Sato Y, Noda A, Sakate R, Habara T, Nakaoka H, Todokoro F, Matsuya A, Imanishi T, Gojobori T. H-InvDB in 2009: extended database and data mining resources for human genes and transcripts. Nucleic Acids Res 2009; 38:D626-32. [PMID: 19933760 PMCID: PMC2808976 DOI: 10.1093/nar/gkp1020] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource of human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides the annotation for 219,765 human transcripts in 43,159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources: 'Navigation search', 'H-InvDB Enrichment Analysis Tool (HEAT)' and web service APIs. 'Navigation search' is an extended search system that enables complicated searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire set of H-InvDB representative transcripts. H-InvDB now has SOAP and REST web service APIs that allow H-InvDB data to be used in programs, giving users extended data accessibility.
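The kind of enrichment analysis described for HEAT can be sketched with a one-sided hypergeometric test. This is an assumption made for illustration: the abstract does not state which statistical test HEAT uses, and the function name and all numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` items without replacement from
    `population` items, of which `annotated` carry the annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 50 of 1000 background transcripts carry an annotation,
# and 5 of the 20 transcripts in the user's gene set carry it.
p_value = hypergeom_enrichment_p(population=1000, annotated=50, sample=20, hits=5)
print(p_value)
```

With these made-up numbers only one annotated transcript would be expected by chance, so a small p-value would flag the annotation as enriched in the gene set. A real analysis would also need a multiple-testing correction across annotations.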
Affiliation(s)
- Chisato Yamasaki
- BIRC, NIG Waterfront Bio-IT Research Building, 4-7 Aomi, Tokyo 135-0064, Japan
|
487
|
Schlicker A, Albrecht M. FunSimMat update: new features for exploring functional similarity. Nucleic Acids Res 2009; 38:D244-8. [PMID: 19923227 PMCID: PMC2808991 DOI: 10.1093/nar/gkp979] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
Quantifying the functional similarity of genes and their products based on Gene Ontology annotation is an important tool for diverse applications like the analysis of gene expression data, the prediction and validation of protein functions and interactions, and the prioritization of disease genes. The Functional Similarity Matrix (FunSimMat, http://www.funsimmat.de) is a comprehensive database providing various precomputed functional similarity values for proteins in UniProtKB and for protein families in Pfam and SMART. With this update, we significantly increase the coverage of FunSimMat by adding data from the Gene Ontology Annotation project as well as new functional similarity measures. The applicability of the database is greatly extended by the implementation of a new Gene Ontology-based method for disease gene prioritization. Two new visualization tools allow an interactive analysis of the functional relationships between proteins or protein families. This is enhanced further by the introduction of an automatically derived hierarchy of annotation classes. Additional changes include a revised user front-end and a new RESTlike interface for improving the user-friendliness and online accessibility of FunSimMat.
Affiliation(s)
- Andreas Schlicker
- Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany.
|
488
|
Sadreyev RI, Feramisco JD, Tsao H, Grishin NV. Phenotypic categorization of genetic skin diseases reveals new relations between phenotypes, genes and pathways. Bioinformatics 2009; 25:2891-6. [PMID: 19744994 PMCID: PMC2773259 DOI: 10.1093/bioinformatics/btp538] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Motivation: Systematic analysis of the connection between proteins, their cellular function and phenotypic manifestations in disease is a central problem of biological and clinical research. The solution to this problem requires the development of new approaches to link the rapidly growing dataset of gene–disease associations with the many complex and overlapping phenotypes of human disease. Results: We analyze genetic skin disorders and suggest a manually designed set of elementary phenotypes whose combinations define diseases as points in a multidimensional space, providing a basis for phenotypic disease clustering. Placing the known gene–disease associations in the context of this space reveals new patterns that suggest previously unknown functional links between proteins, signaling pathways and disease phenotypes. For example, analysis of telangiectasias (spider vein diseases) reveals a previously unrecognized interplay between the TGF-β signaling pathway and pentose phosphate pathway. This interaction may mediate glucose-dependent regulation of TGF-β signaling, providing a clue to the known association between angiopathies and diabetes and implying new gene candidates for mutational analysis and drug targeting. Contact: grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics online.
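The idea of representing diseases as points in a space spanned by elementary phenotypes can be sketched as binary vectors compared by Hamming distance. Everything below is a hypothetical illustration: the phenotype list (apart from telangiectasia, which the abstract mentions), the disease names, and the choice of distance are assumptions, not the authors' actual categorization or clustering method.

```python
# Hypothetical set of elementary phenotypes; only "telangiectasia" comes from
# the abstract, the rest are placeholders.
elementary_phenotypes = ["telangiectasia", "blistering", "hyperpigmentation", "scaling"]

# Hypothetical diseases described by the elementary phenotypes they show.
diseases = {
    "disease_A": {"telangiectasia", "hyperpigmentation"},
    "disease_B": {"telangiectasia"},
    "disease_C": {"blistering", "scaling"},
}


def to_vector(present):
    """Encode a disease as a 0/1 vector, one coordinate per elementary phenotype."""
    return tuple(int(p in present) for p in elementary_phenotypes)


def hamming(u, v):
    """Number of coordinates in which two phenotype vectors differ."""
    return sum(a != b for a, b in zip(u, v))


vectors = {name: to_vector(present) for name, present in diseases.items()}
for first, second in [("disease_A", "disease_B"), ("disease_A", "disease_C")]:
    print(first, second, hamming(vectors[first], vectors[second]))
```

Diseases that share most elementary phenotypes end up close together, which is the property any clustering in such a space would rely on.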
Affiliation(s)
- Ruslan I Sadreyev
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
|
489
|
Burgun A, Mougin F, Bodenreider O. Two approaches to integrating phenotype and clinical information. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2009; 2009:75-79. [PMID: 20351826 PMCID: PMC2815427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Linkages between animal models of diseases and human data enable the development of translational research hypotheses. The objective of this study is to investigate two approaches to integrating phenotype and clinical information. On the one hand, we develop a terminology mapping between phenotypes from the Mammalian Phenotype Ontology (MPO) and Online Mendelian Inheritance in Man (OMIM) through the Unified Medical Language System (UMLS). On the other, we associate MPO phenotypes with OMIM manifestations through annotations made to orthologous genes. 1,469 MPO concepts (22%) were mapped successfully to some disease concept in the UMLS, of which 869 were present in OMIM. Among the 16,764 distinct MGI genes associated with human orthologs, 1,968 distinct genes were associated with both MPO and OMIM annotations. The UMLS is a valuable resource for linking phenotype terms to clinical terminologies, and these mappings between terminologies can help enrich gene annotation databases and unify phenotype representation.
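The proportions implied by the counts in this abstract can be checked with a few lines of arithmetic. The variable names are illustrative, and because the 22% figure is rounded in the source, the derived total number of MPO concepts is only approximate.

```python
# Counts as reported in the abstract.
mapped_mpo = 1469            # MPO concepts mapped to a UMLS disease concept
mapped_share = 0.22          # reported as 22% of all MPO concepts (rounded)
mapped_in_omim = 869         # mapped concepts also present in OMIM
genes_with_orthologs = 16764 # distinct MGI genes with human orthologs
genes_both = 1968            # genes with both MPO and OMIM annotations

# Derived figures (approximate, since 22% is a rounded percentage).
approx_total_mpo = round(mapped_mpo / mapped_share)
omim_share = mapped_in_omim / mapped_mpo
gene_share = genes_both / genes_with_orthologs

print(f"approximate MPO concepts in total: {approx_total_mpo}")
print(f"mapped concepts present in OMIM: {omim_share:.1%}")
print(f"orthologous genes annotated in both resources: {gene_share:.1%}")
```

So roughly 59% of the mapped concepts are present in OMIM, and about 12% of the genes with human orthologs carry annotations from both resources.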
Affiliation(s)
- Anita Burgun
- INSERM U936, School of Medicine, University of Rennes 1, IFR 140, Rennes, France.
|
490
|
Dalziel AC, Rogers SM, Schulte PM. Linking genotypes to phenotypes and fitness: how mechanistic biology can inform molecular ecology. Mol Ecol 2009; 18:4997-5017. [PMID: 19912534 DOI: 10.1111/j.1365-294x.2009.04427.x] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Indexed: 11/29/2022]
Abstract
The accessibility of new genomic resources, high-throughput molecular technologies and analytical approaches such as genome scans have made finding genes contributing to fitness variation in natural populations an increasingly feasible task. Once candidate genes are identified, we argue that it is necessary to take a mechanistic approach and work up through the levels of biological organization to fully understand the impacts of genetic variation at these candidate genes. We demonstrate how this approach provides testable hypotheses about the causal links among levels of biological organization, and assists in designing relevant experiments to test the effects of genetic variation on phenotype, whole-organism performance capabilities and fitness. We review some of the research programs that have incorporated mechanistic approaches when examining naturally occurring genetic and phenotypic variation and use these examples to highlight the value of developing a comprehensive understanding of the relationship between genotype and fitness. We give suggestions to guide future research aimed at uncovering and understanding the genetic basis of adaptation and argue that further integration of mechanistic approaches will help molecular ecologists better understand the evolution of natural populations.
Affiliation(s)
- Anne C Dalziel
- Department of Zoology, University of British Columbia, Vancouver, Canada.
|
491
|
Kim P, Yoon S, Kim N, Lee S, Ko M, Lee H, Kang H, Kim J, Lee S. ChimerDB 2.0--a knowledgebase for fusion genes updated. Nucleic Acids Res 2009; 38:D81-5. [PMID: 19906715 PMCID: PMC2808913 DOI: 10.1093/nar/gkp982] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 01/08/2023]
Abstract
Chromosome translocations and gene fusions are frequent events in the human genome and have been found to cause diverse types of tumor. ChimerDB is a knowledgebase of fusion genes identified from bioinformatics analysis of transcript sequences in the GenBank and various other public resources such as the Sanger cancer genome project (CGP), OMIM, PubMed and the Mitelman’s database. In this updated version, we significantly modified the algorithm of identifying fusion transcripts. Specifically, the new algorithm is more sensitive and has detected 2699 fusion transcripts with high confidence. Furthermore, it can identify interchromosomal translocations as well as the intrachromosomal deletions or inversions of large DNA segments. Importantly, results from the analysis of next-generation sequencing data in the short read archives are incorporated as well. We updated and integrated all contents (GenBank, Sanger CGP, OMIM, PubMed publications and the Mitelman’s database), and the user-interface has been improved to support diverse types of searches and to enhance the user convenience especially in browsing PubMed articles. We also developed a new alignment viewer that should facilitate examining reliability of fusion transcripts and inferring functional significance. We expect ChimerDB 2.0, available at http://ercsb.ewha.ac.kr/fusiongene, to be a valuable tool in identifying biomarkers and drug targets.
Affiliation(s)
- Pora Kim
- Division of Life and Pharmaceutical Sciences, Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Korea
|
492
|
Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, Pohl A, Pheasant M, Meyer LR, Learned K, Hsu F, Hillman-Jackson J, Harte RA, Giardine B, Dreszer TR, Clawson H, Barber GP, Haussler D, Kent WJ. The UCSC Genome Browser database: update 2010. Nucleic Acids Res 2009; 38:D613-9. [PMID: 19906737 PMCID: PMC2808870 DOI: 10.1093/nar/gkp939] [Citation(s) in RCA: 476] [Impact Index Per Article: 31.7] [Indexed: 11/14/2022]
Abstract
The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.
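As a quick consistency check, the per-group organism counts listed in this abstract can be summed and compared with the stated total of 47. The dictionary keys simply restate the groups named in the text.

```python
# Organism groups and counts as listed in the abstract (September 2009).
organism_counts = {
    "mammals": 14,
    "non-mammal vertebrates": 10,
    "invertebrate deuterostomes": 3,
    "insects": 13,
    "worms": 6,
    "yeast": 1,
}

total_organisms = sum(organism_counts.values())
print(total_organisms)
```

The sum agrees with the 47 organisms reported.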
Affiliation(s)
- Brooke Rhead
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
|
493
|
Wang L, Xiong Y, Sun Y, Fang Z, Li L, Ji H, Shi T. HLungDB: an integrated database of human lung cancer research. Nucleic Acids Res 2009; 38:D665-9. [PMID: 19900972 PMCID: PMC2808962 DOI: 10.1093/nar/gkp945] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Indexed: 01/27/2023]
Abstract
The human lung cancer database (HLungDB) is a database that integrates lung cancer-related genes, proteins and miRNAs with the corresponding clinical information. The main purpose of this platform is to establish a network of lung cancer-related molecules and to facilitate the mechanistic study of lung carcinogenesis. The entries describing the relationships between molecules and human lung cancer in the current release were extracted manually from the literature. Currently, we have collected 2585 genes and 212 miRNAs with experimental evidence of involvement in the different stages of lung carcinogenesis through text mining. Furthermore, we have incorporated the results from analysis of transcription factor-binding motifs, the promoters and the SNP sites for each gene. Since epigenetic alterations also play an important role in lung carcinogenesis, genes with epigenetic regulation were also included. We hope HLungDB will enrich our knowledge about lung cancer biology and eventually lead to the development of novel therapeutic strategies. HLungDB can be freely accessed at http://www.megabionet.org/bio/hlung.
Affiliation(s)
- Lishan Wang
- Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, College of Life Science, East China Normal University, Shanghai 200241, China
|
494
|
Baresić A, Hopcroft LEM, Rogers HH, Hurst JM, Martin ACR. Compensated pathogenic deviations: analysis of structural effects. J Mol Biol 2009; 396:19-30. [PMID: 19900462 DOI: 10.1016/j.jmb.2009.11.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 06/03/2009] [Revised: 10/29/2009] [Accepted: 11/03/2009] [Indexed: 10/20/2022]
Abstract
Pathogenic deviations (PDs) in humans are disease-causing missense mutations. However, in some cases, these disease-associated residues occur as the wild-type residues in functionally equivalent proteins in other species and these cases are termed 'compensated pathogenic deviations' (CPDs). The lack of pathogenicity in a non-human protein is presumed to be explained in most cases by the presence of compensatory mutations, most commonly within the same protein. Identification of structural features of CPDs and detection of specific compensatory events will help us to understand traversal along fitness landscape valleys in protein evolution. We divided mutations listed in the OMIM (Online Mendelian Inheritance in Man) database into PD and CPD data sets and performed two independent analyses: (i) We searched for potential compensatory mutations spatially close to the CPDs and, (ii) using our SAAPdb database, we examined likely structural effects to try to explain why mutations are pathogenic, comparing PDs and CPDs. Our data sets were obtained from a set of 245 human proteins of known structure and contained a total of 2328 mutations of which 453 (from 85 structures) were seen to be compensated in at least one functionally equivalent protein in another (non-human) species. Structural analysis results confirm previous findings that CPDs are, on average, 'milder' in their likely structural effects than uncompensated PDs and tend to be on the protein surface. We also showed that the residues surrounding the CPD residue in the folded protein are more often mutated than the residues surrounding an uncompensated mutation, supporting the hypothesis that compensation is largely a result of structurally local mutations.
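The first analysis in this abstract, a search for potential compensatory mutations spatially close to a CPD, can be sketched as a simple distance query over residue coordinates. The coordinates, residue numbers, and the 8 Å cutoff below are hypothetical; the abstract does not state which atoms or distance threshold the authors used.

```python
from math import dist

# Hypothetical C-alpha coordinates (in angstroms), keyed by residue number.
coords = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    42: (5.0, 4.0, 0.0),
    90: (30.0, 0.0, 0.0),
}


def residues_near(target, coords, cutoff):
    """Return residue numbers whose coordinates lie within `cutoff` of `target`."""
    centre = coords[target]
    return sorted(
        residue
        for residue, xyz in coords.items()
        if residue != target and dist(centre, xyz) <= cutoff
    )


# Residues within an assumed 8 angstrom cutoff of a hypothetical CPD at position 10.
neighbours = residues_near(10, coords, cutoff=8.0)
print(neighbours)
```

Positions returned by such a query would then be the candidates to inspect for substitutions in the non-human protein that could compensate for the deviation.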
Affiliation(s)
- Anja Baresić
- Institute of Structural and Molecular Biology, Darwin Building, University College London, Gower Street, London WC1E 6BT, UK
|
495
|
Zhang Y, Lv J, Liu H, Zhu J, Su J, Wu Q, Qi Y, Wang F, Li X. HHMD: the human histone modification database. Nucleic Acids Res 2009; 38:D149-54. [PMID: 19892823 PMCID: PMC2808954 DOI: 10.1093/nar/gkp968] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 12/02/2022]
Abstract
Histone modifications play important roles in chromatin remodeling, gene transcriptional regulation, stem cell maintenance and differentiation. Alterations in histone modifications may be linked to human diseases, especially cancer. Data on histone modifications, including methylation, acetylation and ubiquitylation, probed by ChIP-seq, ChIP-chip and qChIP have become widely available. Mining and integration of histone modification data can be beneficial to novel biological discoveries. There has been no comprehensive data repository dedicated exclusively to human histone modifications. Therefore, we developed a relatively comprehensive database for human histone modifications. The Human Histone Modification Database (HHMD, http://bioinfo.hrbmu.edu.cn/hhmd) focuses on the storage and integration of histone modification datasets that were obtained from laboratory experiments. The latest release of HHMD incorporates 43 location-specific histone modifications in humans. To facilitate data extraction, flexible search options are built into HHMD. It can be searched by histone modification, gene ID, functional categories, chromosome location and cancer name. HHMD also includes a user-friendly visualization tool named HisModView, with which a genome-wide histone modification map can be displayed. HisModView facilitates the acquisition and visualization of histone modifications. The database also contains manually curated information on histone modification dysregulation in nine human cancers.
Affiliation(s)
- Yan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
496
|
Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 2009; 38:D355-60. [PMID: 19880382 PMCID: PMC2808910 DOI: 10.1093/nar/gkp896] [Citation(s) in RCA: 1633] [Impact Index Per Article: 108.9] [Indexed: 12/02/2022]
Abstract
Most human diseases are complex multi-factorial diseases resulting from the combination of various genetic and environmental factors. In the KEGG database resource (http://www.genome.jp/kegg/), diseases are viewed as perturbed states of the molecular system, and drugs as perturbants to the molecular system. Disease information is computerized in two forms: pathway maps and gene/molecule lists. The KEGG PATHWAY database contains pathway maps for the molecular systems in both normal and perturbed states. In the KEGG DISEASE database, each disease is represented by a list of known disease genes, any known environmental factors at the molecular level, diagnostic markers and therapeutic drugs, which may reflect the underlying molecular system. The KEGG DRUG database contains chemical structures and/or chemical components of all drugs in Japan, including crude drugs and TCM (Traditional Chinese Medicine) formulas, and drugs in the USA and Europe. This database also captures knowledge about two types of molecular networks: the interaction network with target molecules, metabolizing enzymes, other drugs, etc. and the chemical structure transformation network in the history of drug development. The new disease/drug information resource named KEGG MEDICUS can be used as a reference knowledge base for computational analysis of molecular networks, especially, by integrating large-scale experimental datasets.
Affiliation(s)
- Minoru Kanehisa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
|
497
|
Grillo G, Turi A, Licciulli F, Mignone F, Liuni S, Banfi S, Gennarino VA, Horner DS, Pavesi G, Picardi E, Pesole G. UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 2009; 38:D75-80. [PMID: 19880380 PMCID: PMC2808995 DOI: 10.1093/nar/gkp902] [Citation(s) in RCA: 234] [Impact Index Per Article: 15.6] [Indexed: 11/13/2022]
Abstract
The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated and also collated as the UTRsite database, where more specific information on the functional motifs and cross-links to interacting regulatory proteins are provided. In the current update, the UTR entries have been organized in a gene-centric structure to better visualize and retrieve 5' and 3' UTR variants generated by alternative initiation and termination of transcription and alternative splicing. Experimentally validated miRNA targets and conserved sequence elements are also annotated. The integration of UTRdb with genomic data has allowed the implementation of an efficient annotation system and a powerful retrieval resource for the selection and extraction of specific UTR subsets. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://utrdb.ba.itb.cnr.it/.
Affiliation(s)
- Giorgio Grillo
- Istituto Tecnologie Biomediche del Consiglio Nazionale delle Ricerche, via Amendola 122/D, 70126 Bari, Italy
|
498
|
Abstract
BACKGROUND: One of the most recent and important developments in drug discovery is a new drug development approach of building and analyzing networks that contain relationships among drugs and targets, diseases, genes and other components. These networks and their integrations provide useful information for finding new targets as well as new drugs. OBJECTIVE: This review article aims to review recent developments in various types of networks and suggest the future direction of these network studies for drug discovery. METHODS: Databases and networks are integrated into a more complete network to better present the relationships among drugs, targets, genes, phenotypes and diseases. After discussing the limitations and obstacles of the recent research, we suggest several strategies to build a successful and practical drug-target network. RESULTS/CONCLUSION: A useful, integrated network can be built from various databases and networks by resolving several issues, such as limited coverage and inconsistency. This integrated network can be completed by the prediction of missing links, biological network comparison and drug target identification. Possible applications are multi-target drug development, drug repurposing, estimation of drug effect on target perturbations in the whole system and extraction of the suitable purpose of the drug-target sub-network.
Affiliation(s)
- Soyoung Lee
- KAIST, Department of Bio and Brain Engineering, 335 Gwahak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea. +82 42 350 4317; +82 42 350 4310
|
499
|
Ng KL, Liu HC, Lee SC. ncRNAppi--a tool for identifying disease-related miRNA and siRNA targeting pathways. Bioinformatics 2009; 25:3199-201. [DOI: 10.1093/bioinformatics/btp574] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/07/2023]
|
500
|
Keerthikumar S, Bhadra S, Kandasamy K, Raju R, Ramachandra YL, Bhattacharyya C, Imai K, Ohara O, Mohan S, Pandey A. Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach. DNA Res 2009; 16:345-51. [PMID: 19801557 PMCID: PMC2780952 DOI: 10.1093/dnares/dsp019] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PIDs, which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
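The classification setup described in this abstract (binary features per gene, known PID genes as positives, non-PID genes as negatives) can be illustrated with a minimal sketch. The abstract does not state the kernel, the feature definitions or the software used, so everything below is an assumption made for illustration: a linear kernel, a plain sub-gradient descent on the hinge loss, three made-up binary features and four invented "genes". The names `train_linear_svm` and `predict` are hypothetical helpers, not part of the published method.

```python
# Toy linear SVM trained by sub-gradient descent on the hinge loss.
# ASSUMPTION: linear kernel and hand-written optimiser; the paper's actual
# kernel, features and software are not given in the abstract.
# All data below are invented for illustration and are NOT from the paper.


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b for binary feature rows X and labels y in {+1, -1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Signed margin of this example under the current model.
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights slightly at every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push the boundary away from this example.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (candidate disease gene) or -1 (non-disease gene)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical binary feature vectors, one row per gene.
X_train = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
# +1 = known disease gene, -1 = known non-disease gene (invented labels).
y_train = [1, 1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in X_train]
```

In a real setting the feature vectors would have 69 entries per gene and the trained model would be applied to unlabelled genes to rank candidates; an established SVM library would normally replace the hand-written optimiser.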
|