351
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page.
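As an illustration of the Entrez Programming Utilities mentioned above, the following minimal sketch builds an ESearch request URL. It only constructs the URL and does not contact NCBI. The helper name `esearch_url` and the example search term are assumptions made for illustration; the base address and the `db`, `term`, `retmax` and `retmode` parameters are those of the public E-utilities service.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL (hypothetical helper; no network access)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example: search PubMed for an illustrative term.
url = esearch_url("pubmed", "genetic testing registry")
print(url)
```

Fetching the URL with any HTTP client would return the matching record identifiers. NCBI's usage guidelines (request rate limits, API keys) apply to real queries.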
Affiliation(s)
- NCBI Resource Coordinators
- *To whom correspondence should be addressed. Eric W. Sayers. Tel: +1 301 496 2475; Fax: +1 301 480 9241;
352
Rubinstein WS, Maglott DR, Lee JM, Kattman BL, Malheiro AJ, Ovetsky M, Hem V, Gorelenkov V, Song G, Wallin C, Husain N, Chitipiralla S, Katz KS, Hoffman D, Jang W, Johnson M, Karmanov F, Ukrainchik A, Denisenko M, Fomous C, Hudson K, Ostell JM. The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency. Nucleic Acids Res 2012. [PMID: 23193275 PMCID: PMC3531155 DOI: 10.1093/nar/gks1173] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Indexed: 11/16/2022]
Abstract
The National Institutes of Health Genetic Testing Registry (GTR; available online at http://www.ncbi.nlm.nih.gov/gtr/) maintains comprehensive information about testing offered worldwide for disorders with a genetic basis. Information is voluntarily submitted by test providers. The database provides details of each test (e.g. its purpose, target populations, methods, what it measures, analytical validity, clinical validity, clinical utility, ordering information) and laboratory (e.g. location, contact information, certifications and licenses). Each test is assigned a stable identifier of the format GTR000000000, which is versioned when the submitter updates information. Data submitted by test providers are integrated with basic information maintained in National Center for Biotechnology Information’s databases and presented on the web and through FTP (ftp.ncbi.nih.gov/pub/GTR/_README.html).
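The identifier scheme described above (the prefix GTR followed by nine digits, with versioning on update) can be checked programmatically. The sketch below is illustrative only: it assumes the version is written as a `.N` suffix, which the abstract does not state, and the names `GtrId` and `parse_gtr_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class GtrId(NamedTuple):
    accession: str           # e.g. "GTR000000001"
    version: Optional[int]   # None when no version suffix is present


# "GTR" + nine digits, optionally followed by ".<version>".
# The ".<version>" suffix is an assumption made for this sketch.
_GTR_PATTERN = re.compile(r"(GTR\d{9})(?:\.(\d+))?")


def parse_gtr_id(text: str) -> GtrId:
    """Split a GTR identifier into accession and optional version."""
    match = _GTR_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a GTR identifier: {text!r}")
    accession, version = match.groups()
    return GtrId(accession, int(version) if version is not None else None)


print(parse_gtr_id("GTR000000001.2"))
```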
Affiliation(s)
- Wendy S Rubinstein
- National Institutes of Health, National Library of Medicine, National Center for Biotechnology Information, Bethesda, MD 20894, USA.
353
Gray KA, Daugherty LC, Gordon SM, Seal RL, Wright MW, Bruford EA. Genenames.org: the HGNC resources in 2013. Nucleic Acids Res 2012; 41:D545-52. [PMID: 23161694 PMCID: PMC3531211 DOI: 10.1093/nar/gks1066] [Citation(s) in RCA: 193] [Impact Index Per Article: 14.8] [Indexed: 02/02/2023]
Abstract
The HUGO Gene Nomenclature Committee situated at the European Bioinformatics Institute assigns unique symbols and names to human genes. Since 2011, the data within our database has expanded largely owing to an increase in naming pseudogenes and non-coding RNA genes, and we now have >33,500 approved symbols. Our gene families and groups have also increased to nearly 500, with ∼45% of our gene entries associated to at least one family or group. We have also redesigned the HUGO Gene Nomenclature Committee website http://www.genenames.org creating a constant look and feel across the site and improving usability and readability for our users. The site provides a public access portal to our database with no restrictions imposed on access or the use of the data. Within this article, we review our online resources and data with particular emphasis on the updates to our website.
Affiliation(s)
- Kristian A Gray
- HUGO Gene Nomenclature Committee, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
354
A New Insight into Structural and Functional Impact of Single-Nucleotide Polymorphisms in PTEN Gene. Cell Biochem Biophys 2012; 66:249-63. [DOI: 10.1007/s12013-012-9472-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
355
Abstract
Molecular network data are increasingly becoming available, necessitating the development of well performing computational tools for their analyses. Such tools enabled conceptually different approaches for exploring human diseases to be undertaken, in particular, those that study the relationship between a multitude of biomolecules within a cell. Hence, a new field of network biology has emerged as part of systems biology, aiming to untangle the complexity of cellular network organization. We survey current network analysis methods that aim to give insight into human disease.
Affiliation(s)
- Vuk Janjić
- Department of Computing, Imperial College London, 180 Queen's Gate, SW7 2AZ London, UK
356
Panda A, Begum T, Ghosh TC. Insights into the evolutionary features of human neurodegenerative diseases. PLoS One 2012; 7:e48336. [PMID: 23118989 PMCID: PMC3484049 DOI: 10.1371/journal.pone.0048336] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/04/2012] [Accepted: 09/24/2012] [Indexed: 02/06/2023]
Abstract
Comparative analyses between human disease and non-disease genes are of great interest in understanding human disease gene evolution. However, the progression of neurodegenerative diseases (NDD) involving amyloid formation in specific brain regions is still unknown. Therefore, in this study, we mainly focused our analysis on the evolutionary features of human NDD genes with respect to non-disease genes. Here, we observed that human NDD genes are evolutionarily conserved relative to non-disease genes. To elucidate the conserved nature of NDD genes, we incorporated evolutionary attributes such as gene expression level, number of regulatory miRNAs, protein connectivity, intrinsic disorder content and relative aggregation propensity in our analysis. Our studies demonstrate that NDD genes have higher gene expression levels, consistent with their lower evolutionary rates. Additionally, we observed that NDD genes have a higher number of distinct regulatory miRNA target sites and more interaction partners than non-disease genes. Moreover, miRNA-targeted genes are known to have higher disorder content. In contrast, our analysis established that NDD genes have lower disorder content. In support of our analysis, we found that NDD gene-encoded proteins are enriched with multi-interface hubs (party hubs) with lower disorder content. Since proteins with higher disorder content need to adopt special structures to reduce their aggregation propensity, NDD proteins were found to have elevated relative aggregation propensity (RAP), in line with their lower disorder content. Finally, our categorical regression analysis confirmed the relative dominance of protein connectivity, 3'UTR length, RAP, nature of hubs (singlish/multi interface) and disorder content in explaining the variation in evolutionary rates between human NDD genes and non-disease genes.
Affiliation(s)
- Arup Panda
- Bioinformatics Centre, Bose Institute, Kolkata, India
- Tina Begum
- Bioinformatics Centre, Bose Institute, Kolkata, India
357
Doelken SC, Köhler S, Mungall CJ, Gkoutos GV, Ruef BJ, Smith C, Smedley D, Bauer S, Klopocki E, Schofield PN, Westerfield M, Robinson PN, Lewis SE. Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish. Dis Model Mech 2012; 6:358-72. [PMID: 23104991 PMCID: PMC3597018 DOI: 10.1242/dmm.010322] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 12/16/2022]
Abstract
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. 
Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
Affiliation(s)
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
358
Goh KI, Choi IG. Exploring the human diseasome: the human disease network. Brief Funct Genomics 2012; 11:533-42. [PMID: 23063808 DOI: 10.1093/bfgp/els032] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.0] [Indexed: 02/05/2023]
Abstract
Advances in genome-scale molecular biology and molecular genetics have greatly advanced our knowledge of the basic components of human biology and diseases. At the same time, the importance of cellular networks between those biological components is increasingly appreciated. Built upon these recent technological and conceptual advances, a new discipline called network medicine, an approach to understanding human diseases from a network point of view, is about to emerge. In this review article, we survey some recent endeavours in this direction, centred on the concept and applications of the human diseasome and the human disease network. We discuss questions, and partial answers to them, such as how the connectivity between molecular parts translates into relationships between the related disorders on a global scale and how central the disease-causing genetic components are in the cellular network. The use of the diseasome in combination with various interactome networks and other disease-related factors is also reviewed.
Affiliation(s)
- Kwang-Il Goh
- Department of Physics, Korea University, Seoul 136-713, Korea.
359
Doss CGP, Rajith B, Garwasis N, Mathew PR, Raju AS, Apoorva K, William D, Sadhana NR, Himani T, Dike IP. Screening of mutations affecting protein stability and dynamics of FGFR1-A simulation analysis. Appl Transl Genom 2012; 1:37-43. [PMID: 27896051 PMCID: PMC5121281 DOI: 10.1016/j.atg.2012.06.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 05/24/2012] [Revised: 06/17/2012] [Accepted: 06/21/2012] [Indexed: 12/11/2022]
Abstract
Single amino acid substitutions in Fibroblast Growth Factor Receptor 1 (FGFR1) destabilize the protein and have been implicated in several genetic disorders, such as various forms of cancer, Kallmann syndrome, Pfeiffer syndrome and Jackson-Weiss syndrome. To gain functional insight into the effect of amino acid substitutions on protein function and expression, special emphasis was laid on molecular dynamics simulation techniques in combination with in silico tools such as SIFT, PolyPhen 2.0, I-Mutant 3.0 and SNAP. I-Mutant predicted 68% of nsSNPs to be deleterious, higher than SIFT (37%), PolyPhen 2.0 (61%) and SNAP (58%). Comparing the results of all in silico tools, the P722S mutation was found to be the most deleterious. Using a molecular dynamics approach, we have shown that the P722S mutation leads to an increase in flexibility and a greater deviation from the native structure, which was supported by a decrease in the number of hydrogen bonds. In addition, biophysical analysis gave clear insight into the stability loss due to the P722S mutation in the FGFR1 protein. The majority of mutations predicted by these in silico tools were in good concordance with the experimental results.
Key Words
- FGFR1
- FGFR1, Fibroblast Growth Factor Receptor 1
- GD, Grantham Deviation
- GV, Grantham Variance
- MSA, Multiple Sequence Alignments
- Molecular dynamics simulation
- NCBI, National Center for Biotechnology Information
- OMIM, Online Mendelian Inheritance in Man
- PolyPhen 2.0, Polymorphism Phenotyping
- RI, Reliability Index
- RMSD, Root Mean Square Deviation
- RMSF, Root Mean Square Fluctuation
- SIFT, Sorting Intolerant From Tolerant
- SNAP, Screening for Non acceptable Polymorphisms
- SNPs
- SNPs, Single Nucleotide Polymorphisms
- SPC, Simple Point Charge
Affiliation(s)
- C George Priya Doss
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- B Rajith
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Nimisha Garwasis
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Pretty Raju Mathew
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Anand Solomon Raju
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- K Apoorva
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Denise William
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- N R Sadhana
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Tanwar Himani
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- I P Dike
- Department of Biological Sciences, Covenant University, Nigeria
360
Kitzman JO, Snyder MW, Ventura M, Lewis AP, Qiu R, Simmons LE, Gammill HS, Rubens CE, Santillan DA, Murray JC, Tabor HK, Bamshad MJ, Eichler EE, Shendure J. Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med 2012; 4:137ra76. [PMID: 22674554 DOI: 10.1126/scitranslmed.3004323] [Citation(s) in RCA: 265] [Impact Index Per Article: 20.4] [Indexed: 12/15/2022]
Abstract
Analysis of cell-free fetal DNA in maternal plasma holds promise for the development of noninvasive prenatal genetic diagnostics. Previous studies have been restricted to detection of fetal trisomies, to specific paternally inherited mutations, or to genotyping common polymorphisms using material obtained invasively, for example, through chorionic villus sampling. Here, we combine genome sequencing of two parents, genome-wide maternal haplotyping, and deep sequencing of maternal plasma DNA to noninvasively determine the genome sequence of a human fetus at 18.5 weeks of gestation. Inheritance was predicted at 2.8 × 10⁶ parental heterozygous sites with 98.1% accuracy. Furthermore, 39 of 44 de novo point mutations in the fetal genome were detected, albeit with limited specificity. Subsampling these data and analyzing a second family trio by the same approach indicate that parental haplotype blocks of ~300 kilo-base pairs combined with shallow sequencing of maternal plasma DNA is sufficient to substantially determine the inherited complement of a fetal genome. However, ultradeep sequencing of maternal plasma DNA is necessary for the practical detection of fetal de novo mutations genome-wide. Although technical and analytical challenges remain, we anticipate that noninvasive analysis of inherited variation and de novo mutations in fetal genomes will facilitate prenatal diagnosis of both recessive and dominant Mendelian disorders.
Affiliation(s)
- Jacob O Kitzman
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
361
Hardison RC. Genome-wide epigenetic data facilitate understanding of disease susceptibility association studies. J Biol Chem 2012; 287:30932-40. [PMID: 22952232 PMCID: PMC3438926 DOI: 10.1074/jbc.r112.352427] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 12/25/2022]
Abstract
Complex traits such as susceptibility to diseases are determined in part by variants at multiple genetic loci. Genome-wide association studies can identify these loci, but most phenotype-associated variants lie distal to protein-coding regions and are likely involved in regulating gene expression. Understanding how these genetic variants affect complex traits depends on the ability to predict and test the function of the genomic elements harboring them. Community efforts such as the ENCODE Project provide a wealth of data about epigenetic features associated with gene regulation. These data enable the prediction of testable functions for many phenotype-associated variants.
Affiliation(s)
- Ross C Hardison
- Department of Biochemistry and Molecular Biology and Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16801, USA.
362
Abstract
The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting.
Affiliation(s)
- Robert M Kuhn
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
363
Mitsuyama S, Shimizu N. CancerProView: A graphical image database of cancer-related genes and proteins. Genomics 2012; 100:81-92. [DOI: 10.1016/j.ygeno.2012.05.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/23/2011] [Revised: 02/25/2012] [Accepted: 05/22/2012] [Indexed: 01/26/2023]
364
Das J, Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst Biol 2012; 6:92. [PMID: 22846459 PMCID: PMC3483187 DOI: 10.1186/1752-0509-6-92] [Citation(s) in RCA: 296] [Impact Index Per Article: 22.8] [Received: 07/22/2011] [Accepted: 06/30/2012] [Indexed: 12/22/2022]
Abstract
Background: A global map of protein-protein interactions in cellular systems provides key insights into the workings of an organism. A repository of well-validated high-quality protein-protein interactions can be used in both large- and small-scale studies to generate and validate a wide range of functional hypotheses. Results: We develop HINT (http://hint.yulab.org) - a database of high-quality protein-protein interactomes for human, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Oryza sativa. These were collected from several databases and filtered both systematically and manually to remove low-quality/erroneous interactions. The resulting datasets are classified by type (binary physical interactions vs. co-complex associations) and data source (high-throughput systematic setups vs. literature-curated small-scale experiments). We find strong sociological sampling biases in literature-curated datasets of small-scale interactions. An interactome without such sampling biases was used to understand network properties of human disease-genes - hubs are unlikely to cause disease, but if they do, they usually cause multiple disorders. Conclusions: HINT is of significant interest to researchers in all fields of biology as it addresses the ubiquitous need of having a repository of high-quality protein-protein interactions. These datasets can be utilized to generate specific hypotheses about specific proteins and/or pathways, as well as analyzing global properties of cellular networks. HINT will be regularly updated and all versions will be tracked.
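The notion of a hub used above can be made concrete with a small degree calculation. This is a minimal sketch, not the HINT authors' analysis: the protein labels `P1` to `P5`, the toy edge list and the degree cutoff are all invented for illustration, and real studies choose hub thresholds in various ways.

```python
from collections import defaultdict


def degrees(interactions):
    """Count distinct interaction partners per protein in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def hubs(interactions, min_degree):
    """Return proteins with at least `min_degree` partners (cutoff is an assumption)."""
    return sorted(p for p, d in degrees(interactions).items() if d >= min_degree)


# Hypothetical toy interactome.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
print(degrees(toy_edges))
print(hubs(toy_edges, min_degree=3))
```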
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
365
George Priya Doss C, Nagasundaram N, Tanwar H. Predicting the impact of deleterious single point mutations in SMAD gene family using structural bioinformatics approach. Interdiscip Sci 2012; 4:103-15. [DOI: 10.1007/s12539-012-0122-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/15/2011] [Revised: 12/21/2011] [Accepted: 12/26/2011] [Indexed: 01/23/2023]
366
Bauer S, Köhler S, Schulz MH, Robinson PN. Bayesian ontology querying for accurate and noise-tolerant semantic searches. Bioinformatics 2012; 28:2502-8. [PMID: 22843981 DOI: 10.1093/bioinformatics/bts471] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Ontologies provide a structured representation of the concepts of a domain of knowledge as well as the relations between them. Attribute ontologies are used to describe the characteristics of the items of a domain, such as the functions of proteins or the signs and symptoms of disease, which opens the possibility of searching a database of items for the best match to a list of observed or desired attributes. However, naive search methods do not perform well on realistic data because of noise in the data, imprecision in typical queries and because individual items may not display all attributes of the category they belong to. RESULTS: We present a method for combining ontological analysis with Bayesian networks to deal with noise, imprecision and attribute frequencies and demonstrate an application of our method as a differential diagnostic support system for human genetics. AVAILABILITY: We provide an implementation for the algorithm and the benchmark at http://compbio.charite.de/boqa/. CONTACT: Sebastian.Bauer@charite.de or Peter.Robinson@charite.de SUPPLEMENTARY INFORMATION: Supplementary Material for this article is available at Bioinformatics online.
Affiliation(s)
- Sebastian Bauer
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.
367
Sun J, Gong X, Purow B, Zhao Z. Uncovering MicroRNA and Transcription Factor Mediated Regulatory Networks in Glioblastoma. PLoS Comput Biol 2012; 8:e1002488. [PMID: 22829753 PMCID: PMC3400583 DOI: 10.1371/journal.pcbi.1002488] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Received: 08/18/2011] [Accepted: 03/05/2012] [Indexed: 12/12/2022]
Abstract
Glioblastoma multiforme (GBM) is the most common and lethal brain tumor in humans. Recent studies revealed that patterns of microRNA (miRNA) expression in GBM tissue samples are different from those in normal brain tissues, suggesting that a number of miRNAs play critical roles in the pathogenesis of GBM. However, little is yet known about which miRNAs play central roles in the pathology of GBM and their regulatory mechanisms of action. To address this issue, in this study, we systematically explored the main regulation format (feed-forward loops, FFLs) consisting of miRNAs, transcription factors (TFs) and their impacting GBM-related genes, and developed a computational approach to construct a miRNA-TF regulatory network. First, we compiled GBM-related miRNAs, GBM-related genes, and known human TFs. We then identified 1,128 3-node FFLs and 805 4-node FFLs with statistical significance. By merging these FFLs together, we constructed a comprehensive GBM-specific miRNA-TF mediated regulatory network. Then, from the network, we extracted a composite GBM-specific regulatory network. To illustrate the GBM-specific regulatory network is promising for identification of critical miRNA components, we specifically examined a Notch signaling pathway subnetwork. Our follow up topological and functional analyses of the subnetwork revealed that six miRNAs (miR-124, miR-137, miR-219-5p, miR-34a, miR-9, and miR-92b) might play important roles in GBM, including some results that are supported by previous studies. In this study, we have developed a computational framework to construct a miRNA-TF regulatory network and generated the first miRNA-TF regulatory network for GBM, providing a valuable resource for further understanding the complex regulatory mechanisms in GBM. 
The observation of critical miRNAs in the Notch signaling pathway, with partial verification from previous studies, demonstrates that our network-based approach is promising for the identification of new and important miRNAs in GBM and, potentially, other cancers. Several recent studies have implicated microRNAs (miRNAs) in the pathogenesis of glioblastoma (GBM), the most common and lethal brain tumor in humans, suggesting that miRNAs may be clinically useful as biomarkers for brain tumors and other cancers. However, to date, the regulatory mechanisms of miRNAs in GBM are unclear. In this study, we have systematically constructed miRNA and transcription factor (TF) mediated regulatory networks specific to GBM. To demonstrate that the GBM-specific regulatory network contains functional modules that may be composed of critical miRNA components, we extracted a subnetwork including GBM-related genes involved in the Notch signaling pathway. Through network topological and functional analyses of the Notch signaling pathway subnetwork, several critical miRNAs have been identified, some of which are supported by previous studies. This study not only provides novel miRNAs for further experimental design but also develops a novel computational framework to construct a miRNA-TF combinatory regulatory network for a specific disease.
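A 3-node feed-forward loop of the kind counted above can be described as three regulators and targets A, B, C with directed edges A→B, B→C and A→C. The sketch below enumerates such triples in a toy directed network. It is not the authors' pipeline, it omits their statistical significance testing, and the node names (`miR-X`, `TF1`, `GeneA`, `GeneB`) are hypothetical.

```python
def find_ffls(edges):
    """Enumerate 3-node feed-forward loops (a -> b, b -> c, a -> c) in a directed graph."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    loops = []
    for a, a_targets in successors.items():
        for b in a_targets:
            if b == a:
                continue  # skip self-regulation
            for c in successors.get(b, ()):
                # c must be a third, distinct node that a also regulates directly
                if c != a and c != b and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


# Hypothetical toy network: a miRNA targets a TF and a gene; the TF regulates two genes.
toy_edges = [
    ("miR-X", "TF1"),
    ("TF1", "GeneA"),
    ("miR-X", "GeneA"),
    ("TF1", "GeneB"),
]
print(find_ffls(toy_edges))
```

Only `GeneA` closes a loop here, because `miR-X` has no direct edge to `GeneB`.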
Affiliation(s)
- Jingchun Sun
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Xue Gong
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Benjamin Purow
- Division of Neuro-Oncology, Neurology Department, University of Virginia Health System, Charlottesville, Virginia, United States of America
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- * E-mail:
|
368
|
Lechner M, Höhn V, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Kastenmüller G, Waegele B, Ruepp A. CIDeR: multifactorial interaction networks in human diseases. Genome Biol 2012; 13:R62. [PMID: 22809392 PMCID: PMC3491383 DOI: 10.1186/gb-2012-13-7-r62] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 02/22/2012] [Accepted: 07/18/2012] [Indexed: 12/12/2022] Open
Abstract
The pathobiology of common diseases is influenced by heterogeneous factors interacting in complex networks. CIDeR (http://mips.helmholtz-muenchen.de/cider/) is a publicly available, manually curated, integrative database of metabolic and neurological disorders. The resource provides structured information on 18,813 experimentally validated interactions between molecules, bioprocesses and environmental factors extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make CIDeR a versatile knowledge base for biologists, for the analysis of large-scale data and for systems biology approaches.
|
369
|
Mathivanan S, Ji H, Tauro BJ, Chen YS, Simpson RJ. Identifying mutated proteins secreted by colon cancer cell lines using mass spectrometry. J Proteomics 2012; 76 Spec No.:141-9. [PMID: 22796352 DOI: 10.1016/j.jprot.2012.06.031] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 03/27/2012] [Revised: 06/05/2012] [Accepted: 06/21/2012] [Indexed: 01/15/2023]
Abstract
Secreted proteins encoded by mutated genes (mutant proteins) are a particularly rich source of biomarkers, being not only components of the cancer secretome but also directly implicated in tumorigenesis. One of the challenges of proteomics-driven biomarker discovery research is that the bulk of secreted mutant proteins cannot be identified directly and quantified by mass spectrometry due to the lack of mutated peptide information in extant proteomics databases. Here we identify, using an integrated genomics and proteomics strategy (referred to as iMASp, identification of Mutated And Secreted proteins), 112 putative mutated tryptic peptides (corresponding to 57 proteins) in the collective secretomes derived from a panel of 18 human colorectal cancer (CRC) cell lines. Central to iMASp was the creation of the Human Protein Mutant Database (HPMD), against which experimentally derived secretome peptide spectra were searched. Eight of the identified mutated tryptic peptides were confirmed by RT-PCR and cDNA sequencing of RNA extracted from those CRC cells in which the mutation was identified by mass spectrometry. The iMASp technology promises to improve the link between proteomics and genomic mutation data, thereby providing an effective tool for targeting tryptic peptides with mutated amino acids as potential cancer biomarker candidates. This article is part of a Special Issue entitled: Integrated omics.
Affiliation(s)
- Suresh Mathivanan
- Department of Biochemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
|
370
|
Burzynski GM, Reed X, Taher L, Stine ZE, Matsui T, Ovcharenko I, McCallion AS. Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control. Genome Res 2012; 22:2278-89. [PMID: 22759862 PMCID: PMC3483557 DOI: 10.1101/gr.139717.112] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]
Abstract
Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.
Affiliation(s)
- Grzegorz M Burzynski
- McKusick-Nathans Institute of Genetic Medicine, Department of Molecular and Comparative Pathobiology, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
371
|
Köhler S, Doelken SC, Rath A, Aymé S, Robinson PN. Ontological phenotype standards for neurogenetics. Hum Mutat 2012; 33:1333-9. [PMID: 22573485 DOI: 10.1002/humu.22112] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 01/16/2012] [Accepted: 04/13/2012] [Indexed: 12/22/2022]
Abstract
Neurological disorders comprise one of the largest groups of human diseases. Due to the myriad symptoms and the extreme degree of clinical variability characteristic of many neurological diseases, the differential diagnosis process is extremely challenging. Even though most neurogenetic diseases are individually rare, collectively, the subgroup of neurogenetic disorders is large, comprising more than 2,400 different disorders. Recently, increasing efforts have been undertaken to unravel the molecular basis of neurogenetic diseases and to correlate pathogenetic mechanisms with clinical signs and symptoms. In order to enable computer-based analyses, the systematic representation of the neurological phenotype is of major importance. We demonstrate how the Human Phenotype Ontology (HPO) can be incorporated into these efforts by providing a systematic semantic representation of phenotypic abnormalities encountered in human genetic diseases. The combination of the HPO together with the Orphanet disease classification represents a promising resource for automated disease classification, performing computational clustering and analysis of the neurogenetic phenome. Furthermore, standardized representations of neurologic phenotypic abnormalities employing the HPO link neurological phenotypic abnormalities to anatomical and functional entities represented in other biomedical ontologies through the semantic references provided by the HPO.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
|
372
|
LSHGD: a database for human leprosy susceptible genes. Genomics 2012; 100:162-6. [PMID: 22750101 DOI: 10.1016/j.ygeno.2012.06.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/21/2012] [Revised: 05/23/2012] [Accepted: 06/23/2012] [Indexed: 01/25/2023]
Abstract
Studies exploring how host genetic factors determine susceptibility to disease and an individual's response to infection with Mycobacterium leprae have increased in recent years. To support this work, we have developed the Leprosy Susceptible Human Gene Database (LSHGD), which integrates 45 human genes associated with leprosy, identified through an extensive literature search. It serves as a user-friendly and interactive platform for understanding the involvement of human polymorphisms (SNPs) in leprosy and the independent genetic control over both susceptibility to leprosy and its association with multi-drug resistance of M. leprae. As the first human genetic database for leprosy, it aims to provide information about the associated genes, the corresponding protein sequences, available three-dimensional structures and polymorphisms related to leprosy. The database is a multifunctional tool and convenient information platform, freely available at http://www.vit.ac.in/leprosy/leprosy.htm, that enables users to retrieve the information of interest to them.
|
373
|
Gulbahce N, Yan H, Dricot A, Padi M, Byrdsong D, Franchi R, Lee DS, Rozenblatt-Rosen O, Mar JC, Calderwood MA, Baldwin A, Zhao B, Santhanam B, Braun P, Simonis N, Huh KW, Hellner K, Grace M, Chen A, Rubio R, Marto JA, Christakis NA, Kieff E, Roth FP, Roecklein-Canfield J, DeCaprio JA, Cusick ME, Quackenbush J, Hill DE, Münger K, Vidal M, Barabási AL. Viral perturbations of host networks reflect disease etiology. PLoS Comput Biol 2012; 8:e1002531. [PMID: 22761553 PMCID: PMC3386155 DOI: 10.1371/journal.pcbi.1002531] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 12/14/2011] [Accepted: 03/28/2012] [Indexed: 12/20/2022] Open
Abstract
Many human diseases arising from mutations of disease susceptibility genes (genetic diseases) are also associated with viral infections (virally implicated diseases), either in a directly causal manner or by indirect associations. Here we examine whether viral perturbations of the host interactome may underlie such virally implicated disease relationships. Using two different human viruses as models, Epstein-Barr virus (EBV) and human papillomavirus (HPV), we find that host targets of viral proteins reside in network proximity to products of disease susceptibility genes. Expression changes in virally implicated disease tissues and comorbidity patterns cluster significantly in the network vicinity of viral targets. The topological proximity found between cellular targets of viral proteins and disease genes was exploited to uncover a novel pathway linking HPV to Fanconi anemia.
Affiliation(s)
- Natali Gulbahce
- Center for Complex Networks Research (CCNR) and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Han Yan
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Amélie Dricot
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Megha Padi
- Center for Cancer Computational Biology (CCCB), Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Danielle Byrdsong
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Rachel Franchi
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Chemistry, Simmons College, Boston, Massachusetts, United States of America
- Deok-Sun Lee
- Center for Complex Networks Research (CCNR) and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America
- Department of Natural Medical Sciences and Department of Physics, Inha University, Incheon, Korea
- Orit Rozenblatt-Rosen
- Department of Medical Oncology, Dana-Farber Cancer Institute, and Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Jessica C. Mar
- Center for Cancer Computational Biology (CCCB), Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Michael A. Calderwood
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Amy Baldwin
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Bo Zhao
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Balaji Santhanam
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Pascal Braun
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Nicolas Simonis
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Kyung-Won Huh
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Karin Hellner
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Miranda Grace
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Alyce Chen
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Renee Rubio
- Center for Cancer Computational Biology (CCCB), Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Jarrod A. Marto
- Blais Proteomics Center and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Nicholas A. Christakis
- Department of Health Care Policy, Harvard Medical School and Department of Sociology, Harvard University, Cambridge, Massachusetts, United States of America
- Elliott Kieff
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Frederick P. Roth
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, United States of America
- Jennifer Roecklein-Canfield
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Chemistry, Simmons College, Boston, Massachusetts, United States of America
- James A. DeCaprio
- Department of Medical Oncology, Dana-Farber Cancer Institute, and Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Michael E. Cusick
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- John Quackenbush
- Center for Cancer Computational Biology (CCCB), Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- David E. Hill
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Karl Münger
- Infectious Diseases Division, The Channing Laboratory, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Marc Vidal
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- * E-mail: (MV); (ALB)
- Albert-László Barabási
- Center for Complex Networks Research (CCNR) and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- * E-mail: (MV); (ALB)
|
374
|
Schaefer C, Bromberg Y, Achten D, Rost B. Disease-related mutations predicted to impact protein function. BMC Genomics 2012; 13 Suppl 4:S11. [PMID: 22759649 PMCID: PMC3394413 DOI: 10.1186/1471-2164-13-s4-s11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022] Open
Abstract
Background: Non-synonymous single nucleotide polymorphisms (nsSNPs) alter the protein sequence and can cause disease. The impact has been described by reliable experiments for relatively few mutations. Here, we study predictions for the functional impact of disease-annotated mutations from OMIM, PMD and Swiss-Prot and of variants not linked to disease. Results: Most disease-causing mutations were predicted to impact protein function. More surprisingly, the raw prediction scores for disease-causing mutations were higher than the scores for the function-altering data set originally used for developing the prediction method (here SNAP). We might expect that diseases are caused by change-of-function mutations. However, it is surprising how well prediction methods developed for different purposes identify this link. Conversely, our predictions suggest that the set of nsSNPs not currently linked to diseases contains very few strong disease associations to be discovered. Conclusions: Firstly, annotations of disease-causing nsSNPs are on average so reliable that they can be used as proxies for functional impact. Secondly, disease-causing nsSNPs can be identified very well by methods that predict the impact of mutations on protein function. This implies that the existing prediction methods provide a very good means of choosing a set of suspect SNPs relevant for disease.
Affiliation(s)
- Christian Schaefer
- Bioinformatics-i12, Informatics, Technical University Munich, Boltzmannstrasse 3, Garching/Munich, Germany.
|
375
|
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. Wiley Interdiscip Rev Syst Biol Med 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
|
376
|
Kamanu FK, Medvedeva YA, Schaefer U, Jankovic BR, Archer JAC, Bajic VB. Mutations and binding sites of human transcription factors. Front Genet 2012; 3:100. [PMID: 22670148 PMCID: PMC3365286 DOI: 10.3389/fgene.2012.00100] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 11/14/2011] [Accepted: 05/16/2012] [Indexed: 11/13/2022] Open
Abstract
Mutations in any genome may lead to phenotype characteristics that determine the ability of an individual to adapt to environmental challenges. In studies of human biology, among the most interesting are phenotype characteristics that determine responses to drug treatments, responses to infections, or predisposition to specific inherited diseases. Most of the research in this field has focused on the effects of mutations on the final gene products, peptides, and their alterations. Considerably less attention has been given to mutations that may affect the regulatory mechanism(s) of gene expression, although these may also affect phenotype characteristics. In this study we make a pilot analysis of mutations observed in the regulatory regions of 24,667 human RefSeq genes. Our study reveals that, of the eight mutation types studied, "insertions" are the only one that alters predicted transcription factor binding sites (TFBSs) in a statistically significant manner. We also find that 25 families of TFBSs have been altered by mutations in a statistically significant manner in the promoter regions we considered. Moreover, we find that the related transcription factors are, for example, prominent in processes related to intracellular signaling; cell fate; morphogenesis of organs and epithelium; development of urogenital system, epithelium, and tube; and neuron fate commitment. Our study highlights the significance of studying mutations within the genes' regulatory regions and opens the way for further detailed investigations on this topic, particularly on the downstream affected pathways.
Affiliation(s)
- Frederick Kinyua Kamanu
- Computational Bioscience Research Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
|
377
|
Shimoyama M, Nigam R, McIntosh LS, Nagarajan R, Rice T, Rao DC, Dwinell MR. Three ontologies to define phenotype measurement data. Front Genet 2012; 3:87. [PMID: 22654893 PMCID: PMC3361058 DOI: 10.3389/fgene.2012.00087] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 02/14/2012] [Accepted: 04/30/2012] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND: There is an increasing need to integrate phenotype measurement data across studies, both for human studies and for those involving model organisms. Current practices allow researchers to access only those data involved in a single experiment or in multiple experiments utilizing the same protocol. RESULTS: Three ontologies were created: Clinical Measurement Ontology, Measurement Method Ontology and Experimental Condition Ontology. These ontologies provided the framework for integrating rat phenotype data from multiple studies into a single resource and facilitated data integration from multiple human epidemiological studies into a centralized repository. CONCLUSION: An ontology-based framework for phenotype measurement data affords the ability to successfully integrate vital phenotype data into critical resources, regardless of underlying technological structures, allowing the user to easily query and retrieve data from multiple studies.
Affiliation(s)
- Mary Shimoyama
- Human and Molecular Genetics Center, Medical College of Wisconsin Milwaukee, WI, USA.
|
378
|
Phan JH, Quo CF, Wang MD. Cardiovascular genomics: a biomarker identification pipeline. IEEE Trans Inf Technol Biomed 2012; 16:809-22. [PMID: 22614726 DOI: 10.1109/titb.2012.2199570] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
Abstract
Genomic biomarkers are essential for understanding the underlying molecular basis of human diseases such as cardiovascular disease. In this review, we describe a biomarker identification pipeline for cardiovascular disease, which includes 1) high-throughput genomic data acquisition, 2) preprocessing and normalization of data, 3) exploratory analysis, 4) feature selection, 5) classification, and 6) interpretation and validation of candidate biomarkers. We review each step in the pipeline, presenting current and widely used bioinformatics methods. Furthermore, we analyze several publicly available cardiovascular genomics datasets to illustrate the pipeline. Finally, we summarize the current challenges and opportunities for further research.
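To make the pipeline stages more concrete, here is a minimal sketch of three of them (normalization, feature selection and classification) using only the Python standard library. The function names, the difference-of-means ranking and the nearest-centroid classifier are assumptions chosen for brevity; the review surveys many alternatives and does not prescribe these particular methods.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Normalize each feature (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c) or 1.0) for c in cols]  # guard constant columns
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def rank_features(rows, labels):
    """Rank feature indices by absolute difference of the two class means."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = [abs(mean(p) - mean(n)) for p, n in zip(zip(*pos), zip(*neg))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def nearest_centroid(train_rows, labels, features, sample):
    """Assign the sample to the class whose centroid is closest."""
    centroids = {}
    for y in set(labels):
        members = [r for r, lab in zip(train_rows, labels) if lab == y]
        centroids[y] = [mean(r[j] for r in members) for j in features]

    def sq_dist(y):
        return sum((sample[j] - c) ** 2 for j, c in zip(features, centroids[y]))

    return min(centroids, key=sq_dist)


# Toy expression matrix: 4 samples x 2 genes, labels 0 = control, 1 = case.
rows = [[1.0, 5.0], [1.2, 5.1], [3.0, 5.0], [3.1, 4.9]]
labels = [0, 0, 1, 1]
z = zscore_columns(rows)
top = rank_features(z, labels)[:1]  # keep the single best-ranked gene
print(top, nearest_centroid(z, labels, top, z[3]))
```

A realistic analysis would hold out samples for validation rather than classify a training sample, and would add the exploratory and interpretation steps that the sketch leaves out.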
Affiliation(s)
- John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.
|
379
|
Reimand J, Hui S, Jain S, Law B, Bader GD. Domain-mediated protein interaction prediction: From genome to network. FEBS Lett 2012; 586:2751-63. [PMID: 22561014 DOI: 10.1016/j.febslet.2012.04.027] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 03/25/2012] [Accepted: 04/17/2012] [Indexed: 11/19/2022]
Abstract
Protein-protein interactions (PPIs), involved in many biological processes such as cellular signaling, are ultimately encoded in the genome. Solving the problem of predicting protein interactions from the genome sequence will lead to increased understanding of complex networks, evolution and human disease. We can learn the relationship between genomes and networks by focusing on an easily approachable subset of high-resolution protein interactions that are mediated by peptide recognition modules (PRMs) such as PDZ, WW and SH3 domains. This review focuses on computational prediction and analysis of PRM-mediated networks and discusses sequence- and structure-based interaction predictors, techniques and datasets for identifying physiologically relevant PPIs, and interpreting high-resolution interaction networks in the context of evolution and human disease.
Affiliation(s)
- Jüri Reimand
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario, Canada.
|
380
|
Magesh R, George Priya Doss C. Computational methods to work as first-pass filter in deleterious SNP analysis of alkaptonuria. ScientificWorldJournal 2012; 2012:738423. [PMID: 22606059 PMCID: PMC3349151 DOI: 10.1100/2012/738423] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/13/2011] [Accepted: 10/31/2011] [Indexed: 01/14/2023] Open
Abstract
A major challenge in the analysis of human genetic variation is to distinguish functional from nonfunctional SNPs. Discovering these functional SNPs is one of the main goals of modern genetics and genomics studies. There is a need to effectively and efficiently identify functionally important nsSNPs that may be deleterious or disease causing and to identify their molecular effects. Predicting the phenotype of nsSNPs by computational analysis may provide a good way to explore the function of nsSNPs and their relationship with susceptibility to disease. In this context, we surveyed and compared variation databases along with in silico prediction programs to assess the effects of deleterious functional variants on protein functions. In addition, we applied these methods as a first-pass filter to identify the deleterious substitutions worth pursuing in further experimental research. In this analysis, we used existing computational methods to explore the mutation-structure-function relationship in HGD, the gene responsible for alkaptonuria.
Affiliation(s)
- R Magesh
- Department of Biotechnology, Faculty of Biomedical Sciences, Technology & Research, Sri Ramachandra University, Chennai, India
|
381
|
Baxevanis AD. Searching Online Mendelian Inheritance in Man (OMIM) for information on genetic loci involved in human disease. Current Protocols in Human Genetics 2012; Chapter 9:9.13.1-9.13.10. [PMID: 22470145 DOI: 10.1002/0471142905.hg0913s73] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 05/31/2023]
Abstract
Online Mendelian Inheritance in Man (OMIM) is a comprehensive compendium of information on human genes and genetic disorders, with a particular emphasis on the interplay between observed phenotypes and underlying genotypes. This unit focuses on the basic methodology for formulating OMIM searches and illustrates the types of information that can be retrieved from OMIM, including descriptions of clinical manifestations resulting from genetic abnormalities. This unit also provides information on additional relevant medical and molecular biology databases. A basic knowledge of OMIM should be part of the armamentarium of physicians and scientists with an interest in research on the clinical aspects of genetic disorders.
|
382
|
Yeh CY, Yeh HY, Arias CR, Soo VW. Pathway detection from protein interaction networks and gene expression data using color-coding methods and A* search algorithms. ScientificWorldJournal 2012; 2012:315797. [PMID: 22577352 PMCID: PMC3346698 DOI: 10.1100/2012/315797] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/09/2011] [Accepted: 10/17/2011] [Indexed: 12/02/2022] Open
Abstract
With the large availability of protein interaction networks and microarray data supported, to identify the linear paths that have biological significance in search of a potential pathway is a challenge issue. We proposed a color-coding method based on the characteristics of biological network topology and applied heuristic search to speed up color-coding method. In the experiments, we tested our methods by applying to two datasets: yeast and human prostate cancer networks and gene expression data set. The comparisons of our method with other existing methods on known yeast MAPK pathways in terms of precision and recall show that we can find maximum number of the proteins and perform comparably well. On the other hand, our method is more efficient than previous ones and detects the paths of length 10 within 40 seconds using CPU Intel 1.73 GHz and 1 GB main memory running under windows operating system.
Affiliation(s)
- Cheng-Yu Yeh
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
- Hsiang-Yuan Yeh
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
- Carlos Roberto Arias
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
- Von-Wun Soo
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
|
383
|
Jin W, Qin P, Lou H, Jin L, Xu S. A systematic characterization of genes underlying both complex and Mendelian diseases. Hum Mol Genet 2012; 21:1611-1624. [PMID: 22186022 DOI: 10.1093/hmg/ddr599] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 01/03/2025] Open
Abstract
Traditionally, genetic disorders have been classified as either Mendelian diseases or complex diseases. This nosology has greatly benefited genetic counseling and the development of gene mapping strategies. However, based on two well-established databases, we identified that 54% (524 of 968) of the Mendelian disease genes were also involved in complex diseases, and this kind of genes has not been systematically analyzed. Here, we classified human genes into five categories: Mendelian and complex disease (MC) genes, Mendelian but not complex disease (MNC) genes, complex but not Mendelian disease (CNM) genes, essential genes and OTHER genes. First, we found that MC genes were associated with more diseases and phenotypes, and were involved in more complex protein-protein interaction network than MNC or CNM genes on average. Secondly, MC genes encoded the longest proteins and had the highest transcript count among all gene categories. Especially, tissue specificity of MC genes was much higher than that of any other gene categories (P < 7.5 × 10^-5), although their expression level was similar to that of essential genes. Thirdly, evidences from different aspects supported that MC genes have been subjected to both purifying and positive selection. Interestingly, functions of some human disease genes might be different from those of their orthologous genes in non-primate mammalians since they were even less conserved than OTHER genes. The significant over-representation of copy number variations (CNVs) in CNM genes suggested the important roles of CNVs in complex diseases. In brief, our study not only revealed the characteristics of MC genes, but also provided new insights into the other four gene categories.
Affiliation(s)
- Wenfei Jin
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 200031 Shanghai, China
|
384
|
Doss C GP. In silico profiling of deleterious amino acid substitutions of potential pathological importance in haemophlia A and haemophlia B. J Biomed Sci 2012; 19:30. [PMID: 22423892 PMCID: PMC3361463 DOI: 10.1186/1423-0127-19-30] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 04/29/2011] [Accepted: 03/16/2012] [Indexed: 01/08/2023] Open
Abstract
Background In this study, instead of current biochemical methods, the effects of deleterious amino acid substitutions in F8 and F9 gene upon protein structure and function were assayed by means of computational methods and information from the databases. Deleterious substitutions of F8 and F9 are responsible for Haemophilia A and Haemophilia B which is the most common genetic disease of coagulation disorders in blood. Yet, distinguishing deleterious variants of F8 and F9 from the massive amount of nonfunctional variants that occur within a single genome is a significant challenge. Methods We performed an in silico analysis of deleterious mutations and their protein structure changes in order to analyze the correlation between mutation and disease. Deleterious nsSNPs were categorized based on empirical based and support vector machine based methods to predict the impact on protein functions. Furthermore, we modeled mutant proteins and compared them with the native protein for analysis of protein structure stability. Results Out of 510 nsSNPs in F8, 378 nsSNPs (74%) were predicted to be 'intolerant' by SIFT, 371 nsSNPs (73%) were predicted to be 'damaging' by PolyPhen and 445 nsSNPs (87%) as 'less stable' by I-Mutant2.0. In F9, 129 nsSNPs (78%) were predicted to be intolerant by SIFT, 131 nsSNPs (79%) were predicted to be damaging by PolyPhen and 150 nsSNPs (90%) as less stable by I-Mutant2.0. Overall, we found that I-Mutant which emphasizes support vector machine based method outperformed SIFT and PolyPhen in prediction of deleterious nsSNPs in both F8 and F9. Conclusions The models built in this work would be appropriate for predicting the deleterious amino acid substitutions and their functions in gene regulation which would be useful for further genotype-phenotype researches as well as the pharmacogenetics studies. 
These in silico tools, despite being helpful in providing information about the nature of mutations, may also function as a first-pass filter to determine the substitutions worth pursuing for further experimental research in other coagulation disorder causing genes.
Affiliation(s)
- George Priya Doss C
- School of Bio Sciences and Technology, VIT University, Vellore, Tamil Nadu, India.
|
385
|
Kawrykow A, Roumanis G, Kam A, Kwak D, Leung C, Wu C, Zarour E, Sarmenta L, Blanchette M, Waldispühl J. Phylo: a citizen science approach for improving multiple sequence alignment. PLoS One 2012; 7:e31362. [PMID: 22412834 PMCID: PMC3296692 DOI: 10.1371/journal.pone.0031362] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 11/18/2011] [Accepted: 01/09/2012] [Indexed: 01/07/2023] Open
Abstract
Background Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. Methodology/Principal Findings We introduce Phylo, a human-based computing framework applying “crowd sourcing” techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with a minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered. Conclusions/Significance We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improving the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of “human-brain peta-flops” of computation that are spent every day playing games. 
Phylo is available at: http://phylo.cs.mcgill.ca.
Affiliation(s)
- Alexander Kawrykow
- School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada
|
386
|
Le DH, Kwon YK. GPEC: a Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection. Comput Biol Chem 2012; 37:17-23. [PMID: 22430954 DOI: 10.1016/j.compbiolchem.2012.02.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 11/25/2011] [Revised: 01/10/2012] [Accepted: 02/20/2012] [Indexed: 11/18/2022]
Abstract
Finding genes associated with a disease is an important issue in the biomedical area and many gene prioritization methods have been proposed for this goal. Among these, network-based approaches are recently proposed and outperformed functional annotation-based ones. Here, we introduce a novel Cytoscape plug-in, GPEC, to help identify putative genes likely to be associated with specific diseases or pathways. In the plug-in, gene prioritization is performed through a random walk with restart algorithm, a state-of-the art network-based method, along with a gene/protein relationship network. The plug-in also allows users efficiently collect biomedical evidence for highly ranked candidate genes. A set of known genes, candidate genes and a gene/protein relationship network can be provided in a flexible way.
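Editorial illustration (not part of the abstract): the random walk with restart idea named above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not GPEC's implementation; the toy network, the function name `random_walk_with_restart` and the restart probability of 0.5 are all hypothetical choices made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj     : dict mapping node -> list of neighbours (undirected, unweighted)
    seeds   : set of known disease genes where the walker restarts
    restart : probability of jumping back to a seed at each step (assumed value)
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Every node first receives its share of the restart mass.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adj[u]
            if not neighbours:
                # Isolated node: the walker has nowhere to go, so mass stays put.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a simple chain A - B - C - D.
TOY_NETWORK = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

if __name__ == "__main__":
    scores = random_walk_with_restart(TOY_NETWORK, seeds={"A"})
    # Candidate genes are ranked by decreasing score.
    for gene in sorted(scores, key=scores.get, reverse=True):
        print(gene, round(scores[gene], 4))
```

In this toy chain the scores fall off with distance from the seed gene, which is the intuition behind ranking candidates by their proximity to known disease genes in the network.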
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Water Resources University, 175 Tay Son, Dong Da, Hanoi, Vietnam.
|
387
|
Doss CGP, NagaSundaram N. Investigating the structural impacts of I64T and P311S mutations in APE1-DNA complex: a molecular dynamics approach. PLoS One 2012; 7:e31677. [PMID: 22384055 PMCID: PMC3288039 DOI: 10.1371/journal.pone.0031677] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 11/20/2011] [Accepted: 01/11/2012] [Indexed: 11/25/2022] Open
Abstract
Background Elucidating the molecular dynamic behavior of Protein-DNA complex upon mutation is crucial in current genomics. Molecular dynamics approach reveals the changes on incorporation of variants that dictate the structure and function of Protein-DNA complexes. Deleterious mutations in APE1 protein modify the physicochemical property of amino acids that affect the protein stability and dynamic behavior. Further, these mutations disrupt the binding sites and prohibit the protein to form complexes with its interacting DNA. Principal Findings In this study, we developed a rapid and cost-effective method to analyze variants in APE1 gene that are associated with disease susceptibility and evaluated their impacts on APE1-DNA complex dynamic behavior. Initially, two different in silico approaches were used to identify deleterious variants in APE1 gene. Deleterious scores that overlap in these approaches were taken in concern and based on it, two nsSNPs with IDs rs61730854 (I64T) and rs1803120 (P311S) were taken further for structural analysis. Significance Different parameters such as RMSD, RMSF, salt bridge, H-bonds and SASA applied in Molecular dynamic study reveals that predicted deleterious variants I64T and P311S alters the structure as well as affect the stability of APE1-DNA interacting functions. This study addresses such new methods for validating functional polymorphisms of human APE1 which is critically involved in causing deficit in repair capacity, which in turn leads to genetic instability and carcinogenesis.
Affiliation(s)
- C. George Priya Doss
- Centre for Nanobiotechnology, Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu, India
|
388
|
Bordbar A, Palsson BO. Using the reconstructed genome-scale human metabolic network to study physiology and pathology. J Intern Med 2012; 271:131-41. [PMID: 22142339 PMCID: PMC3243107 DOI: 10.1111/j.1365-2796.2011.02494.x] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Indexed: 02/04/2023]
Abstract
Metabolism plays a key role in many major human diseases. Generation of high-throughput omics data has ushered in a new era of systems biology. Genome-scale metabolic network reconstructions provide a platform to interpret omics data in a biochemically meaningful manner. The release of the global human metabolic network, Recon 1, in 2007 has enabled new systems biology approaches to study human physiology, pathology and pharmacology. There are currently more than 20 publications that utilize Recon 1, including studies of cancer, diabetes, host-pathogen interactions, heritable metabolic disorders and off-target drug binding effects. In this mini-review, we focus on the reconstruction of the global human metabolic network and four classes of its application. We show that computational simulations for numerous pathologies have yielded clinically relevant results, many corroborated by existing or newly generated experimental data.
Affiliation(s)
- A Bordbar
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
|
389
|
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
|
390
|
Zhang S, Chang Z, Li Z, DuanMu H, Li Z, Li K, Liu Y, Qiu F, Xu Y. Calculating phenotypic similarity between genes using hierarchical structure data based on semantic similarity. Gene 2012; 497:58-65. [PMID: 22305981 DOI: 10.1016/j.gene.2012.01.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/19/2011] [Revised: 01/16/2012] [Accepted: 01/18/2012] [Indexed: 01/25/2023]
Abstract
Phenotypic similarity is correlated with a number of measures of gene function, such as relatedness at the level of direct protein-protein interaction. The phenotypic effect of a deleted or mutated gene, which is one part of gene annotation, has caught broad attention. However, there have been few measures to study phenotypic similarity with the data from Human Phenotype Ontology (HPO) database, therefore more analogous measures should be developed and investigated. We used five semantic similarity-based measures (Jiang and Conrath, Lin, Schlicker, Yu and Wu) to calculate the human phenotypic similarity between genes (PSG) with data from HPO database, and evaluated their accuracy with information of protein-protein interaction, protein complex, protein family, gene function or DNA sequence. Compared with the gene pairs that were random selected, the results of these methods were statistically significant (all P<0.001). Furthermore, we assessed the performance of these five measures by receiver operating characteristic (ROC) curve analysis, and found that most of them performed better than the previous methods. This work had proved that these measures based on semantic similarity for calculation of PSG were effective for hierarchical structure data. Our study contributes to the development and optimization of novel algorithms of PSG calculation and provides more alternative methods to researchers as well as tools and directions for PSG study.
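Editorial illustration (not part of the abstract): of the five measures listed above, Lin's is compact enough to sketch. It scores two ontology terms as twice the information content (IC) of their most informative common ancestor, divided by the sum of their own ICs. The term names, hierarchy and annotation frequencies below are hypothetical placeholders, not HPO data, and the study's step from term-level to gene-level similarity is not reproduced here.

```python
import math

# Hypothetical toy hierarchy: child term -> list of parent terms.
PARENTS = {
    "root": [],
    "abnormal_blood": ["root"],
    "bleeding": ["abnormal_blood"],
    "anemia": ["abnormal_blood"],
    "abnormal_eye": ["root"],
}

# Hypothetical fraction of annotated genes at or below each term.
FREQ = {
    "root": 1.0,
    "abnormal_blood": 0.4,
    "bleeding": 0.1,
    "anemia": 0.2,
    "abnormal_eye": 0.3,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = -log(p): rarer terms are more informative."""
    return -math.log(FREQ[term])


def lin_similarity(t1, t2):
    """Lin similarity between two terms, in the range [0, 1]."""
    common = ancestors(t1) & ancestors(t2)
    mica_ic = max(information_content(t) for t in common)
    denom = information_content(t1) + information_content(t2)
    # Two copies of the root term have zero IC; treat them as identical.
    return 2.0 * mica_ic / denom if denom > 0 else 1.0


if __name__ == "__main__":
    print(lin_similarity("bleeding", "anemia"))        # share "abnormal_blood"
    print(lin_similarity("bleeding", "abnormal_eye"))  # share only the root
```

Terms that meet only at the root score zero, because the root carries no information; identical terms score one.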
Affiliation(s)
- Shanzhen Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China
|
391
|
Wang X, Wei X, Thijssen B, Das J, Lipkin SM, Yu H. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol 2012; 30:159-64. [PMID: 22252508 DOI: 10.1038/nbt.2106] [Citation(s) in RCA: 290] [Impact Index Per Article: 22.3] [Received: 10/11/2011] [Accepted: 12/19/2011] [Indexed: 01/13/2023]
Abstract
To better understand the molecular mechanisms and genetic basis of human disease, we systematically examine relationships between 3,949 genes, 62,663 mutations and 3,453 associated disorders by generating a three-dimensional, structurally resolved human interactome. This network consists of 4,222 high-quality binary protein-protein interactions with their atomic-resolution interfaces. We find that in-frame mutations (missense point mutations and in-frame insertions and deletions) are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and that the disease specificity for different mutations of the same gene can be explained by their location within an interface. We also predict 292 candidate genes for 694 unknown disease-to-gene associations with proposed molecular mechanism hypotheses. This work indicates that knowledge of how in-frame disease mutations alter specific interactions is critical to understanding pathogenesis. Structurally resolved interaction networks should be valuable tools for interpreting the wealth of data being generated by large-scale structural genomics and disease association studies.
Affiliation(s)
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
392
|
Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012; 13:495-512. [PMID: 22247263 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Indexed: 01/02/2023] Open
Abstract
An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field--the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.
Affiliation(s)
- Emidio Capriotti
- Department of Mathematics and Computer Science, University of Balearic Islands, ctra. de Valldemossa Km 7.5, Palma de Mallorca, 07122 Spain.
|
393
|
Cassa CA, Savage SK, Taylor PL, Green RC, McGuire AL, Mandl KD. Disclosing pathogenic genetic variants to research participants: quantifying an emerging ethical responsibility. Genome Res 2012; 22:421-8. [PMID: 22147367 DOI: 10.1101/gr.127845.111] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Indexed: 11/24/2022]
Abstract
There is an emerging consensus that when investigators obtain genomic data from research participants, they may incur an ethical responsibility to inform at-risk individuals about clinically significant variants discovered during the course of their research. With whole-exome sequencing becoming commonplace and the falling costs of full-genome sequencing, there will be an increasingly large number of variants identified in research participants that may be of sufficient clinical relevance to share. An explicit approach to triaging and communicating these results has yet to be developed, and even the magnitude of the task is uncertain. To develop an estimate of the number of variants that might qualify for disclosure, we apply recently published recommendations for the return of results to a defined and representative set of variants and then extrapolate these estimates to genome scale. We find that the total number of variants meeting the threshold for recommended disclosure ranges from 3955-12,579 (3.79%-12.06%, 95% CI) in the most conservative estimate to 6998-17,189 (6.69%-16.48%, 95% CI) in an estimate including variants with variable disease expressivity. Additionally, if the growth rate from the previous 4 yr continues, we estimate that the total number of disease-associated variants will grow 37% over the next 4 yr.
Affiliation(s)
- Christopher A Cassa
- Children's Hospital Informatics Program, Children's Hospital Boston, Boston, Massachusetts 02115, USA.
|
394
|
Xiang Y, Payne PR, Huang K. Transactional database transformation and its application in prioritizing human disease genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:294-304. [PMID: 21422495 PMCID: PMC4047992 DOI: 10.1109/tcbb.2011.58] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/30/2023]
Abstract
Binary (0,1) matrices, commonly known as transactional databases, can represent many application data, including gene-phenotype data where “1” represents a confirmed gene-phenotype relation and “0” represents an unknown relation. It is natural to ask what information is hidden behind these “0”s and “1”s. Unfortunately, recent matrix completion methods, though very effective in many cases, are less likely to infer something interesting from these (0,1)-matrices. To answer this challenge, we propose INDEVI, a very succinct and effective algorithm to perform independent-evidence-based transactional database transformation. Each entry of a (0,1)-matrix is evaluated by “independent evidence” (maximal supporting patterns) extracted from the whole matrix for this entry. The value of an entry, regardless of its value as 0 or 1, has completely no effect for its independent evidence. The experiment on a gene-phenotype database shows that our method is highly promising in ranking candidate genes and predicting unknown disease genes.
Affiliation(s)
- Yang Xiang
- Department of Biomedical Informatics, The Ohio State University, 3190 Graves Hall, 333 W. Tenth Ave., Columbus, OH 43210.
- Philip R.O. Payne
- Department of Biomedical Informatics and OSUCCC Biomedical Informatics Shared Resource, The Ohio State University, 3190 Graves Hall, 333 W. Tenth Ave., Columbus, OH 43210.
- Kun Huang
- Department of Biomedical Informatics and OSUCCC Biomedical Informatics Shared Resource, The Ohio State University, 3190 Graves Hall, 333 W. Tenth Ave., Columbus, OH 43210.
395
|
Mpamhanga CP, Sharman JL, Harmar AJ. How to use the IUPHAR receptor database to navigate pharmacological data. Methods Mol Biol 2012; 897:15-29. [PMID: 22674159 DOI: 10.1007/978-1-61779-909-9_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/01/2023]
Abstract
Today's data-intensive, interdisciplinary research challenges scientists to keep up to date with key experimental techniques and tools reported in the literature. The International Union of Basic and Clinical Pharmacology Database (IUPHAR-DB) goes some way to addressing this need by providing expert-curated information sourced from primary literature and displayed in a user-friendly manner online. The database provides a channel for the IUPHAR Nomenclature Committee (NC-IUPHAR) to provide recommendations on the nomenclature of receptors and ion channels, to document their properties and the ligands that are useful for receptor characterization. Here we describe IUPHAR-DB's main features and provide examples of techniques for navigating and exploring the information. The database is freely available online at http://www.iuphar-db.org/.
|
396
|
Abstract
The association of dysregulated microRNAs (miRNAs) and diseases has been shown in a variety of studies. Here, we review a resource denoted as PhenomiR, providing systematic and comprehensive access to such studies. It allows machine-readable access to miRNA and target relations from these studies to study the impact of miRNAs on multifactorial diseases across many samples and biological replicates. We summarize the PhenomiR data structure and its content and show how to access the database and use it in everyday miRNA profile analysis using the R language.
Affiliation(s)
- Andreas Ruepp
- Institute for Bioinformatics and Systems Biology (MIPS), Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
|
397
|
Integration of Biomolecular Interaction Data in a Genomic and Proteomic Data Warehouse to Support Biomedical Knowledge Discovery. Computational Intelligence Methods for Bioinformatics and Biostatistics 2012. [DOI: 10.1007/978-3-642-35686-5_10] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 02/06/2023]
|
398
|
Dreszer TR, Karolchik D, Zweig AS, Hinrichs AS, Raney BJ, Kuhn RM, Meyer LR, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Pohl A, Malladi VS, Li CH, Learned K, Kirkup V, Hsu F, Harte RA, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, James Kent W. The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res 2012; 40:D918-23. [PMID: 22086951 PMCID: PMC3245018 DOI: 10.1093/nar/gkr1055] [Citation(s) in RCA: 273] [Impact Index Per Article: 21.0] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/25/2011] [Indexed: 01/05/2023] Open
Abstract
The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced 'track data hubs', which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.
Affiliation(s)
- Timothy R. Dreszer
- Donna Karolchik
- Ann S. Zweig
- Angie S. Hinrichs
- Brian J. Raney
- Robert M. Kuhn
- Laurence R. Meyer
- Mathew Wong
- Cricket A. Sloan
- Kate R. Rosenbloom
- Greg Roe
- Brooke Rhead
- Andy Pohl
- Venkat S. Malladi
- Chin H. Li
- Katrina Learned
- Vanessa Kirkup
- Fan Hsu
- Rachel A. Harte
- Luvina Guruvadoo
- Mary Goldman
- Belinda M. Giardine
- Pauline A. Fujita
- Mark Diekhans
- Melissa S. Cline
- Hiram Clawson
- Galt P. Barber
- David Haussler
- W. James Kent
- Affiliations (listed identically for every author): Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA, Centre for Genomic Regulation (CRG), Barcelona, Spain, Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802 and Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
|
399
|
Abstract
Our knowledge about human genes and the consequences of mutations leading to human genetic diseases has drastically improved over the last few years. It has been recognized that many mutations are indeed pathogenic because they impact the mRNA rather than the protein itself. With our better understanding of the very complex mechanism of splicing, various bioinformatics tools have been developed. They are now frequently used not only to search for sequence motifs corresponding to splicing signals (splice sites, branch points, ESE, and ESS) but also to predict the impact of mutations on these signals. We now need to address the impact of mutations that affect the splicing process, as their consequences could vary from the activation of cryptic signals to the skipping of one or multiple exons. Despite the major developments of the bioinformatics field coupled to experimental data generated on splicing, it is today still not possible to efficiently predict the consequences of mutations impacting splicing signals, especially to predict if they will lead to exon skipping or to cryptic splice site activation.
|
400
|
Guney E, Sanz-Pamplona R, Sierra A, Oliva B. Understanding Cancer Progression Using Protein Interaction Networks. Systems Biology in Cancer Research and Drug Discovery 2012:167-195. [DOI: 10.1007/978-94-007-4819-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|