1
Beck T, Rowlands T, Shorter T, Brookes AJ. GWAS Central: an expanding resource for finding and visualising genotype and phenotype data from genome-wide association studies. Nucleic Acids Res 2023; 51:D986-D993. [PMID: 36350644 PMCID: PMC9825503 DOI: 10.1093/nar/gkac1017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 10/18/2022] [Accepted: 10/20/2022] [Indexed: 11/10/2022] Open
Abstract
The GWAS Central resource gathers and curates extensive summary-level genome-wide association study (GWAS) data and puts a range of user-friendly but powerful website tools for the comparison and visualisation of GWAS data at the fingertips of researchers. Through our continued efforts to harmonise and import data received from GWAS authors and consortia, and data sets actively collected from public sources, the database now contains over 72.5 million P-values for over 5000 studies testing over 7.4 million unique genetic markers investigating over 1700 unique phenotypes. Here, we describe an update to integrate this extensive data collection with mouse disease model data to support insights into the functional impact of human genetic variation. GWAS Central has expanded to include mouse gene-phenotype associations observed during mouse gene knockout screens. To allow similar cross-species phenotypes to be compared, terms from mammalian and human phenotype ontologies have been mapped. New interactive interfaces to find, correlate and view human and mouse genotype-phenotype associations are included in the website toolkit. Additionally, the integrated browser for interrogating multiple association data sets has been updated and a GA4GH Beacon API endpoint has been added for discovering variants tested in GWAS. The GWAS Central resource is accessible at https://www.gwascentral.org/.
Affiliation(s)
- Tim Beck
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, LE1 7RH, UK
  - Health Data Research UK (HDR UK), London, UK
- Thomas Rowlands
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, LE1 7RH, UK
- Tom Shorter
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, LE1 7RH, UK
- Anthony J Brookes
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, LE1 7RH, UK
  - Health Data Research UK (HDR UK), London, UK
2
Dhombres F, Bodenreider O. Interoperability between phenotypes in research and healthcare terminologies--Investigating partial mappings between HPO and SNOMED CT. J Biomed Semantics 2016; 7:3. [PMID: 26865946 PMCID: PMC4748471 DOI: 10.1186/s13326-016-0047-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 11/04/2015] [Accepted: 02/02/2016] [Indexed: 12/30/2022] Open
Abstract
Background: Identifying partial mappings between two terminologies is of special importance when one terminology is finer-grained than the other, as is the case for the Human Phenotype Ontology (HPO), mainly used for research purposes, and SNOMED CT, mainly used in healthcare. Objectives: To investigate and contrast lexical and logical approaches to deriving partial mappings between HPO and SNOMED CT. Methods: 1) Lexical approach: we identify modifiers in HPO terms and attempt to map demodified terms to SNOMED CT through UMLS; 2) Logical approach: we leverage subsumption relations in HPO to infer partial mappings to SNOMED CT; 3) Comparison: we analyze the specific contribution of each approach and evaluate the quality of the partial mappings through manual review. Results: There are 7358 HPO concepts with no complete mapping to SNOMED CT. We identified partial mappings lexically for 33% of them and logically for 82%. We identified partial mappings both lexically and logically for 27%. The clinical relevance of the partial mappings (for a cohort selection use case) is 49% for lexical mappings and 67% for logical mappings. Conclusions: Through complete and partial mappings, 92% of the 10,454 HPO concepts can be mapped to SNOMED CT (30% complete and 62% partial). Equivalence mappings between HPO and SNOMED CT allow for interoperability between data described using these two systems. However, due to differences in focus and granularity, equivalence is only possible for 30% of HPO classes. In the remaining cases, partial mappings provide a next-best approach for traversing between the two systems. Both lexical and logical mapping techniques produce mappings that cannot be generated by the other technique, suggesting that the two techniques are complementary to each other. Finally, this work demonstrates interesting properties (both lexical and logical) of HPO and SNOMED CT and illustrates some limitations of mapping through UMLS.
Affiliation(s)
- Ferdinand Dhombres
  - National Library of Medicine, National Institutes of Health, Bethesda, MD USA
- Olivier Bodenreider
  - National Library of Medicine, National Institutes of Health, Bethesda, MD USA
3
Mina E, Thompson M, Kaliyaperumal R, Zhao J, van der Horst E, Tatum Z, Hettne KM, Schultes EA, Mons B, Roos M. Nanopublications for exposing experimental data in the life-sciences: a Huntington's Disease case study. J Biomed Semantics 2015; 6:5. [PMID: 26464783 PMCID: PMC4603842 DOI: 10.1186/2041-1480-6-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/31/2014] [Accepted: 10/31/2014] [Indexed: 12/20/2022] Open
Abstract
High-throughput experiments often produce far more results than can ever appear in the main text or tables of a single research article. In these cases, the majority of new associations are often archived either as supplemental information in an arbitrary format or in publisher-independent databases that can be difficult to find. These data are not only lost from scientific discourse, but are also elusive to automated search, retrieval and processing. Here, we use the nanopublication model to make scientific assertions that were concluded from a workflow analysis of Huntington’s Disease data machine-readable, interoperable, and citable. We followed the nanopublication guidelines to semantically model our assertions as well as their provenance metadata and authorship. We demonstrate interoperability by linking nanopublication provenance to the Research Object model. These results indicate that nanopublications can provide an incentive for researchers to expose data that are interoperable and machine-readable for future use and preservation, and for which they can receive credit. Nanopublications can play a leading role in hypothesis generation, offering opportunities for large-scale data integration.
Affiliation(s)
- Eleni Mina
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Mark Thompson
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Rajaram Kaliyaperumal
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Jun Zhao
  - Department of Zoology, University of Oxford, Oxford, UK
- Eelke van der Horst
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Zuotian Tatum
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Kristina M Hettne
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Erik A Schultes
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Barend Mons
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
- Marco Roos
  - Department of Human Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
4
Dhombres F, Winnenburg R, Case JT, Bodenreider O. Extending the coverage of phenotypes in SNOMED CT through post-coordination. Stud Health Technol Inform 2015; 216:795-9. [PMID: 26262161 PMCID: PMC5875691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
OBJECTIVES: To extend the coverage of phenotypes in SNOMED CT through post-coordination. METHODS: We identify frequent modifiers in terms from the Human Phenotype Ontology (HPO), which we associate with templates for post-coordinated expressions in SNOMED CT. RESULTS: We identified 176 modifiers, created 12 templates, and generated 1,617 post-coordinated expressions. CONCLUSIONS: Through this novel approach, we can increase the current number of mappings by 50%.
Affiliation(s)
- Ferdinand Dhombres
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
- Rainer Winnenburg
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- James T. Case
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
- Olivier Bodenreider
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
5
West C, Azria D, Chang-Claude J, Davidson S, Lambin P, Rosenstein B, De Ruysscher D, Talbot C, Thierens H, Valdagni R, Vega A, Yuille M. The REQUITE project: validating predictive models and biomarkers of radiotherapy toxicity to reduce side-effects and improve quality of life in cancer survivors. Clin Oncol (R Coll Radiol) 2014; 26:739-42. [PMID: 25267305 DOI: 10.1016/j.clon.2014.09.008] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Received: 09/03/2014] [Accepted: 09/04/2014] [Indexed: 12/25/2022]
Affiliation(s)
- C West
  - University of Manchester, Manchester, UK
- D Azria
  - University of Montpellier, Montpellier, France
- J Chang-Claude
  - German Cancer Research Centre (DKFZ), Heidelberg, Germany
- S Davidson
  - The Christie NHS Foundation Trust, Manchester, UK
- P Lambin
  - University of Maastricht (Maastro-GROW), Maastricht, The Netherlands
- B Rosenstein
  - Mount Sinai School of Medicine, New York, NY, USA
- C Talbot
  - University of Leicester, Leicester, UK
- R Valdagni
  - Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- A Vega
  - Fundación Pública Galega Medicina Xenómica, Santiago de Compostela, Spain
- M Yuille
  - University of Manchester, Manchester, UK
6
Masseroli M, Mons B, Bongcam-Rudloff E, Ceri S, Kel A, Rechenmann F, Lisacek F, Romano P. Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information. BMC Bioinformatics 2014; 15 Suppl 1:S2. [PMID: 24564249 PMCID: PMC4015876 DOI: 10.1186/1471-2105-15-s1-s2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023] Open
Abstract
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists.
We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.
Affiliation(s)
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, 20133, Italy
- Barend Mons
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
  - Netherlands Bioinformatics Center, Nijmegen, 6500 HB, The Netherlands
- Erik Bongcam-Rudloff
  - Department of Animal Breeding and Genetics, SLU-Global Bioinformatics Centre, Swedish University of Agricultural Sciences, Uppsala, 75124, Sweden
  - Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, 75108, Sweden
- Stefano Ceri
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, 20133, Italy
- Alexander Kel
  - GeneXplain GmbH, Wolfenbüttel, 38302, Germany
  - Institute of Chemical Biology and Fundamental Medicine SBRAS, Novosibirsk, 630090, Russia
- Frederique Lisacek
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211 Geneva 4, Switzerland
  - Section of Biology, University of Geneva, 1211 Geneva 4, Switzerland
- Paolo Romano
  - Biopolymers and Proteomics, IRCCS AOU San Martino IST, Genoa, 16132, Italy
7
GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies. Eur J Hum Genet 2013; 22:949-52. [PMID: 24301061 PMCID: PMC4060122 DOI: 10.1038/ejhg.2013.274] [Citation(s) in RCA: 113] [Impact Index Per Article: 9.4] [Received: 06/26/2013] [Revised: 10/04/2013] [Accepted: 10/25/2013] [Indexed: 01/29/2023] Open
Abstract
To facilitate broad and convenient integrative visualization of and access to GWAS data, we have created the GWAS Central resource (http://www.gwascentral.org). This database seeks to provide a comprehensive collection of summary-level genetic association data, structured both for maximal utility and for safe open access (i.e., non-directional signals to fully preclude research subject identification). The resource emphasizes advanced tools that allow comparison and discovery of relevant data sets from the perspective of genes, genome regions, phenotypes or traits. Tested markers and relevant genomic features can be visually interrogated across up to 16 association data sets in a single view, starting at a chromosome-wide view and increasing in resolution down to individual bases. In addition, users can privately upload and view their own data as temporary files. Search and display utility is further enhanced by exploiting phenotype ontology annotations to allow genetic variants associated with phenotypes and traits of interest to be precisely identified across all studies. Data submissions are accepted from individual researchers, groups and consortia, and we also actively gather data sets from various public sources. As a result, the resource now provides over 67 million P-values for over 1600 studies, making it the world's largest openly accessible online collection of summary-level GWAS association information.