1
Paramasivan S, Ashick M, Dudley KJ, Satake N, Mills PC, Sadowski P, Nagaraj SH. VPBrowse: Genome-based representation of MS/MS spectra to quantify 10,000 bovine proteins. Proteomics 2024:e2300431. [PMID: 38468111 DOI: 10.1002/pmic.202300431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 02/11/2024] [Accepted: 02/26/2024] [Indexed: 03/13/2024]
Abstract
SWATH is a data acquisition strategy acclaimed for generating quantitatively accurate and consistent measurements of proteins across multiple samples. Its utility for proteomics studies in nonlaboratory animals, however, is currently compromised by the lack of sufficiently comprehensive and reliable public libraries, either experimental or predicted, and of platforms that support their sharing and use in an intuitive manner. Here we describe the development of the Veterinary Proteome Browser, VPBrowse (http://browser.proteo.cloud/), an online platform for genome-based representation of the Bos taurus proteome, which is equipped with an interactive database and tools for searching, visualization, and building quantitative mass spectrometry assays. In its current version (VPBrowse 1.0), it contains high-quality fragmentation spectra acquired on a QToF instrument for over 36,000 proteotypic peptides, providing experimental evidence for over 10,000 proteins. Data can be downloaded in different formats to enable analysis using popular software packages for SWATH data processing, while normalization to the iRT scale ensures compatibility with diverse chromatography systems. When applied to a published blood plasma dataset from a biomarker discovery study, the resource supported label-free quantification of additional proteins not previously reported by the authors, including PSMA4, a tissue leakage protein and a promising candidate biomarker of an animal's response to dehorning-related injury.
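The iRT normalization mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common approach of fitting a straight line between the retention times observed for a few calibration peptides and their reference iRT values; the peptide values and function names below are hypothetical and are not taken from VPBrowse.

```python
def fit_linear(observed_rt, reference_irt):
    """Least-squares line mapping observed retention times to iRT values."""
    n = len(observed_rt)
    mean_x = sum(observed_rt) / n
    mean_y = sum(reference_irt) / n
    sxx = sum((x - mean_x) ** 2 for x in observed_rt)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(observed_rt, reference_irt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_irt(rt, slope, intercept):
    """Convert one observed retention time to the iRT scale."""
    return slope * rt + intercept


# Hypothetical calibration peptides: observed minutes vs. reference iRT units.
slope, intercept = fit_linear([10.0, 20.0, 30.0], [0.0, 50.0, 100.0])
print(to_irt(25.0, slope, intercept))
```

Once the line is fitted on one chromatography system, library retention times stored on the iRT scale can be mapped back to that system by inverting the same line.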
Affiliation(s)
- Selvam Paramasivan
  - School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
  - Central Analytical Research Facility, Queensland University of Technology, Brisbane, Queensland, Australia
- Mohamed Ashick
  - LifeBytes India Private Limited, Bengaluru, Karnataka, India
- Kevin J Dudley
  - Central Analytical Research Facility, Queensland University of Technology, Brisbane, Queensland, Australia
- Nana Satake
  - School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
- Paul C Mills
  - School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
- Pawel Sadowski
  - Central Analytical Research Facility, Queensland University of Technology, Brisbane, Queensland, Australia
- Shivashankar H Nagaraj
  - Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, Australia
  - Translational Research Institute, Brisbane, Queensland, Australia
2
Matsuzaki K, Kitayama M, Yamamoto K, Aida R, Imai T, Ishida M, Katafuchi R, Kawamura T, Yokoo T, Narita I, Suzuki Y. A Pragmatic Method to Integrate Data From Preexisting Cohort Studies Using the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model: Case Study. JMIR Med Inform 2023; 11:e46725. [PMID: 38153801 PMCID: PMC10766166 DOI: 10.2196/46725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023] Open
Abstract
Background In recent years, many researchers have focused on the use of legacy data, such as pooled analyses that collect and reanalyze data from multiple studies. However, the methodology for the integration of preexisting databases whose data were collected for different purposes has not been established. Previously, we developed a tool to efficiently generate Study Data Tabulation Model (SDTM) data from hypothetical clinical trial data using the Clinical Data Interchange Standards Consortium (CDISC) SDTM. Objective This study aimed to design a practical model for integrating preexisting databases using the CDISC SDTM. Methods Data integration was performed in three phases: (1) the confirmation of the variables, (2) SDTM mapping, and (3) the generation of the SDTM data. In phase 1, the definitions of the variables in detail were confirmed, and the data sets were converted to a vertical structure. In phase 2, the items derived from the SDTM format were set as mapping items. Three types of metadata (domain name, variable name, and test code), based on the CDISC SDTM, were embedded in the Research Electronic Data Capture (REDCap) field annotation. In phase 3, the data dictionary, including the SDTM metadata, was outputted in the Operational Data Model (ODM) format. Finally, the mapped SDTM data were generated using REDCap2SDTM version 2. Results SDTM data were generated as a comma-separated values file for each of the 7 domains defined in the metadata. A total of 17 items were commonly mapped to 3 databases. Because the SDTM data were set in each database correctly, we were able to integrate 3 independently preexisting databases into 1 database in the CDISC SDTM format. Conclusions Our project suggests that the CDISC SDTM is useful for integrating multiple preexisting databases.
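The first two phases described in the abstract (converting data sets to a vertical structure and attaching SDTM metadata to each field) can be sketched as follows. This is an illustration under stated assumptions, not the behavior of REDCap2SDTM: the field names, the `FIELD_ANNOTATIONS` table, and the `to_vertical` helper are hypothetical, and the domain, variable, and test codes are chosen only as plausible examples.

```python
# Hypothetical field annotations: one (domain, variable, test code) triple
# per source field, mirroring the three metadata types named in the abstract.
FIELD_ANNOTATIONS = {
    "sbp": {"domain": "VS", "variable": "VSORRES", "testcd": "SYSBP"},
    "creatinine": {"domain": "LB", "variable": "LBORRES", "testcd": "CREAT"},
}


def to_vertical(record, subject_key="subject_id"):
    """Turn one wide record into one row per annotated measurement."""
    rows = []
    for field, value in record.items():
        if field == subject_key or field not in FIELD_ANNOTATIONS:
            continue  # skip the identifier and any unmapped field
        meta = FIELD_ANNOTATIONS[field]
        rows.append({
            "USUBJID": record[subject_key],
            "DOMAIN": meta["domain"],
            meta["domain"] + "TESTCD": meta["testcd"],
            meta["variable"]: value,
        })
    return rows


for row in to_vertical({"subject_id": "S01", "sbp": 128, "creatinine": 0.9}):
    print(row)
```

Because every source database is annotated against the same metadata vocabulary, rows produced from different databases share column names and can be concatenated per domain.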
Affiliation(s)
- Keiichi Matsuzaki
  - Department of Public Health, School of Medicine, Kitasato University, Sagamihara, Japan
- Megumi Kitayama
  - Clinical Study Support Center, Wakayama Medical University Hospital, Wakayama, Japan
- Keiichi Yamamoto
  - Translational Research Institute for Medical Innovation, Osaka Dental University, Osaka, Japan
- Rei Aida
  - Department of Medical Statistics, Osaka Metropolitan University, Osaka, Japan
- Takumi Imai
  - Clinical & Translational Research Center, Kobe University Hospital, Kobe, Japan
- Mami Ishida
  - Department of Medical Informatics and Clinical Epidemiology, Kyoto Prefectural University of Medicine, Kyoto, Japan
- Ritsuko Katafuchi
  - Kidney Unit, National Hospital Organization Fukuokahigashi Medical Center, Fukuoka, Japan
  - Kidney Unit, Medical Corporation Houshikai Kano Hospital, Fukuoka, Japan
- Tetsuya Kawamura
  - Division of Kidney and Hypertension, Department of Internal Medicine, Jikei University School of Medicine, Tokyo, Japan
- Takashi Yokoo
  - Division of Kidney and Hypertension, Department of Internal Medicine, Jikei University School of Medicine, Tokyo, Japan
- Ichiei Narita
  - Division of Clinical Nephrology and Rheumatology, Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Yusuke Suzuki
  - Department of Nephrology, Faculty of Medicine, Juntendo University, Tokyo, Japan
3
Oborotov GA, Koshechkin KA, Orlov YL. Application of Artificial Intelligence or machine learning in risk sharing agreements for pharmacotherapy risk management. J Integr Bioinform 2023; 20:jib-2023-0014. [PMID: 38073025 PMCID: PMC10757074 DOI: 10.1515/jib-2023-0014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 11/17/2023] [Indexed: 12/31/2023] Open
Abstract
Applying Artificial Intelligence to risk sharing in medical informatics solutions has social value. At a time of ever-increasing costs for providing medicines to citizens, there is a need to restrain the growth of health care costs. The search for computer technologies that stop or slow this growth therefore takes on new significance. We discussed two information technologies in pharmacotherapy and the possibility of combining them, namely risk-sharing agreements and Machine Learning, a combination made possible by the development of Artificial Intelligence (AI). Neural networks could be used to predict treatment outcomes and thereby reduce the risk factors for treatment. AI-based data processing automation technologies could also be used to automate risk-sharing agreements.
Affiliation(s)
- Grigory A. Oborotov
  - Chair of Information and Internet Technologies, Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Moscow, Russia
- Konstantin A. Koshechkin
  - Chair of Information and Internet Technologies, Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Moscow, Russia
- Yuriy L. Orlov
  - Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples’ Friendship University of Russia, Moscow, Russia
4
Jung J, Joe H, Ha K, Lim JM, Kim HG. Biomedical Entity Explorer: A Web Server for Biomedical Entity Exploration. J Comput Biol 2021; 28:619-628. [PMID: 34081565 DOI: 10.1089/cmb.2020.0364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Biomedical Entity Explorer (BEE) is a web server that can search for biomedical entities from a database of six biomedical entity types (gene, miRNA, drug, disease, single nucleotide polymorphism [SNP], pathway) and their gene associations. The search results can be explored using intersections, unions, and negations. BEE has integrated biomedical entities from 16 databases (Ensembl, PharmGKB, Genetics Home Reference, TarBase, miRBase, NCI Thesaurus, DisGeNET, Linked Life Data, UMLS, GSEA MSigDB, Reactome, KEGG, Gene Ontology, HGVD, SNPedia, and dbSNP) based on their gene associations and built a database with their synonyms, descriptions, and links containing individual details. Users can enter keywords for one or more entities, select the type of entity whose relationships they want to explore, and then use set operations such as union, negation, and intersection to navigate the search results more clearly. We believe that BEE will not only be useful for biologists querying for complex associations between entities, but can also be a good starting point for general users searching for biomedical entities. BEE is accessible at http://bike-bee.snu.ac.kr.
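The set-based exploration described in the abstract can be sketched with plain Python sets. The entity names, gene identifiers, and the `GENE_ASSOCIATIONS` table below are hypothetical placeholders, and the sketch makes no claim about how BEE stores or queries its data.

```python
# Hypothetical entity -> associated-gene table; BEE links every entity
# type to genes, so results for different entities are comparable as sets.
GENE_ASSOCIATIONS = {
    "disease:A": {"G1", "G2", "G3"},
    "drug:B": {"G2", "G3", "G4"},
    "pathway:C": {"G3"},
}


def genes_for(entity):
    """Return a copy of the gene set associated with an entity."""
    return set(GENE_ASSOCIATIONS.get(entity, set()))


shared = genes_for("disease:A") & genes_for("drug:B")     # intersection
either = genes_for("disease:A") | genes_for("drug:B")     # union
excluded = genes_for("disease:A") - genes_for("pathway:C")  # negation

print(sorted(shared), sorted(either), sorted(excluded))
```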
Affiliation(s)
- Jinuk Jung
  - Biomedical Knowledge Engineering Laboratory, Seoul National University School of Dentistry, Seoul, Korea
  - Alopax-Algo, Co. Ltd, Seoul, Korea
- Hyunwhan Joe
  - Biomedical Knowledge Engineering Laboratory, Seoul National University School of Dentistry, Seoul, Korea
- Kyungsik Ha
  - Dental Research Institute, Seoul National University, Seoul, Korea
- Jin-Muk Lim
  - Biomedical Knowledge Engineering Laboratory, Seoul National University School of Dentistry, Seoul, Korea
  - Alopax-Algo, Co. Ltd, Seoul, Korea
- Hong-Gee Kim
  - Biomedical Knowledge Engineering Laboratory, Seoul National University School of Dentistry, Seoul, Korea
  - Dental Research Institute, Seoul National University, Seoul, Korea
5
Wang B, Yang H, Sun J, Dou C, Huang J, Guo FB. BioMaster: An Integrated Database and Analytic Platform to Provide Comprehensive Information About BioBrick Parts. Front Microbiol 2021; 12:593979. [PMID: 33552037 PMCID: PMC7858672 DOI: 10.3389/fmicb.2021.593979] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/31/2020] [Accepted: 01/04/2021] [Indexed: 01/25/2023] Open
Abstract
Synthetic biology seeks to create new biological parts, devices, and systems, and to reconfigure existing natural biological systems for custom-designed purposes. The standardized BioBrick parts are the foundation of synthetic biology. The incomplete and flawed metadata of BioBrick parts, however, are a major obstacle to designing genetic circuits easily, quickly, and accurately. Here, a database termed BioMaster (http://www.biomaster-uestc.cn) was developed to extensively complement information about BioBrick parts. It includes 47,934 BioBrick parts from the international Genetically Engineered Machine (iGEM) Registry, with more comprehensive information integrated from 10 databases, providing corresponding information about functions, activities, interactions, and related literature. Moreover, BioMaster is a user-friendly platform for retrieval and analysis of relevant information on BioBrick parts.
Affiliation(s)
- Beibei Wang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Huayi Yang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jianan Sun
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Chuhao Dou
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jian Huang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
6
Wang J, Liu Z, Bellen HJ, Yamamoto S. Navigating MARRVEL, a Web-Based Tool that Integrates Human Genomics and Model Organism Genetics Information. J Vis Exp 2019:10.3791/59542. [PMID: 31475990 PMCID: PMC7401700 DOI: 10.3791/59542] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/04/2023] Open
Abstract
Through whole-exome/genome sequencing, human geneticists identify rare variants that segregate with disease phenotypes. To assess if a specific variant is pathogenic, one must query many databases to determine whether the gene of interest is linked to a genetic disease, whether the specific variant has been reported before, and what functional data are available in model organism databases that may provide clues about the gene's function in humans. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) is a one-stop data collection tool for human genes and variants and their orthologous genes in seven model organisms: mouse, rat, zebrafish, fruit fly, nematode worm, fission yeast, and budding yeast. In this Protocol, we provide an overview of what MARRVEL can be used for and discuss how different datasets can be used to assess whether a variant of unknown significance (VUS) in a known disease-causing gene or a variant in a gene of uncertain significance (GUS) may be pathogenic. This protocol will guide a user through searching multiple human databases simultaneously, starting with a human gene with or without a variant of interest. We also discuss how to utilize data from OMIM, ExAC/gnomAD, ClinVar, Geno2MP, DGV and DECIPHER. Moreover, we illustrate how to interpret a list of ortholog candidate genes, expression patterns, and GO terms in model organisms associated with each human gene. Furthermore, we discuss the value of the protein structural domain annotations provided and explain how to use the multiple species protein alignment feature to assess whether a variant of interest affects an evolutionarily conserved domain or amino acid. Finally, we will discuss three different use-cases of this website. MARRVEL is an easily accessible open access website designed for both clinical and basic researchers and serves as a starting point to design experiments for functional studies.
Affiliation(s)
- Julia Wang
  - Program in Developmental Biology, Baylor College of Medicine
  - Medical Scientist Training Program, Baylor College of Medicine
- Zhandong Liu
  - Department of Pediatrics, Baylor College of Medicine
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital
- Hugo J Bellen
  - Program in Developmental Biology, Baylor College of Medicine
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital
  - Department of Molecular and Human Genetics, Baylor College of Medicine
  - Department of Neuroscience, Baylor College of Medicine
  - Howard Hughes Medical Institute, Baylor College of Medicine
- Shinya Yamamoto
  - Program in Developmental Biology, Baylor College of Medicine
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital
  - Department of Molecular and Human Genetics, Baylor College of Medicine
  - Department of Neuroscience, Baylor College of Medicine
7
Vieira V, Ferreira J, Rodrigues R, Liu F, Rocha M. A Model Integration Pipeline for the Improvement of Human Genome-Scale Metabolic Reconstructions. J Integr Bioinform 2018; 16:/j/jib.2019.16.issue-1/jib-2018-0068/jib-2018-0068.xml. [PMID: 30808160 PMCID: PMC6798860 DOI: 10.1515/jib-2018-0068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/29/2018] [Revised: 10/25/2018] [Accepted: 11/14/2018] [Indexed: 01/17/2023] Open
Abstract
Metabolism has been a major field of study in recent years, mainly owing to its importance in understanding cell physiology and the disease phenotypes that arise from its deregulation. Genome-scale metabolic models (GSMMs) have been established as important tools to help achieve a better understanding of human metabolism. Towards this aim, advances in systems biology and bioinformatics have allowed the reconstruction of several human GSMMs, although some limitations and challenges remain, such as the lack of external identifiers for both metabolites and reactions. A pipeline was developed to integrate multiple GSMMs, starting by retrieving information from the main human GSMMs and evaluating the presence of external database identifiers and annotations for both metabolites and reactions. Information from metabolites was included in a graph database with omics data repositories, allowing clustering of metabolites through their similarity regarding database cross-referencing. Metabolite annotation of several older GSMMs was enriched, allowing the identification and integration of common entities. Using this information, as well as other metrics, we successfully integrated reactions from these models. These methods can be leveraged towards the creation of a unified consensus model of human metabolism.
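The idea of grouping metabolites through shared database cross-references can be sketched with a small union-find structure. This is a simplification under stated assumptions: the paper describes a graph database and a similarity measure, whereas the sketch below merges two metabolites as soon as they share a single identifier. All model prefixes and identifiers are hypothetical.

```python
def cluster_by_xrefs(metabolites):
    """Group metabolite ids that share at least one external identifier."""
    parent = {mid: mid for mid in metabolites}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # external identifier -> first metabolite that used it
    for mid, xrefs in metabolites.items():
        for xref in xrefs:
            if xref in first_owner:
                parent[find(mid)] = find(first_owner[xref])  # merge clusters
            else:
                first_owner[xref] = mid

    clusters = {}
    for mid in metabolites:
        clusters.setdefault(find(mid), []).append(mid)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical metabolites from two models with their cross-references.
example = {
    "modelA:glc": {"chebi:1", "kegg:A"},
    "modelB:glucose": {"kegg:A"},
    "modelB:atp": {"hmdb:9"},
}
print(cluster_by_xrefs(example))
```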
Affiliation(s)
- Vítor Vieira
  - Center of Biological Engineering, University of Minho – Campus de Gualtar, Braga, Portugal
- Jorge Ferreira
  - Center of Biological Engineering, University of Minho – Campus de Gualtar, Braga, Portugal
- Rúben Rodrigues
  - Center of Biological Engineering, University of Minho – Campus de Gualtar, Braga, Portugal
- Filipe Liu
  - Argonne National Laboratory, Lemont, IL, USA
- Miguel Rocha
  - Center of Biological Engineering, University of Minho – Campus de Gualtar, Braga, Portugal
8
Pozdeyev N, Yoo M, Mackie R, Schweppe RE, Tan AC, Haugen BR. Integrating heterogeneous drug sensitivity data from cancer pharmacogenomic studies. Oncotarget 2016; 7:51619-51625. [PMID: 27322211 PMCID: PMC5239501 DOI: 10.18632/oncotarget.10010] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 04/08/2016] [Accepted: 05/29/2016] [Indexed: 01/22/2023] Open
Abstract
The consistency of in vitro drug sensitivity data is of key importance for cancer pharmacogenomics. Previous attempts to correlate drug sensitivities from the large pharmacogenomics databases, such as the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC), have produced discordant results. We developed a new drug sensitivity metric, the area under the dose response curve adjusted for the range of tested drug concentrations, which allows integration of heterogeneous drug sensitivity data from the CCLE, the GDSC, and the Cancer Therapeutics Response Portal (CTRP). We show that there is moderate to good agreement of drug sensitivity data for many targeted therapies, particularly kinase inhibitors. The results of this largest cancer cell line drug sensitivity data analysis to date are accessible through the online portal, which serves as a platform for high power pharmacogenomics analysis.
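A range-adjusted area under the dose response curve can be sketched as below. The abstract does not give the formula, so this is one plausible reading and an assumption rather than the authors' definition: the trapezoidal area under the viability curve over log10 concentration, divided by the width of the tested log-concentration range, so that studies testing different ranges yield values on a common 0 to 1 scale.

```python
import math


def range_adjusted_auc(concentrations, viabilities):
    """Trapezoidal AUC over log10(concentration), divided by the tested range.

    Assumes concentrations are positive and sorted in increasing order,
    and viabilities are fractions of the untreated control.
    """
    log_c = [math.log10(c) for c in concentrations]
    area = sum(
        (log_c[i + 1] - log_c[i]) * (viabilities[i] + viabilities[i + 1]) / 2
        for i in range(len(log_c) - 1)
    )
    return area / (log_c[-1] - log_c[0])


# Hypothetical dose series (micromolar) with fractional viability readings.
print(range_adjusted_auc([0.01, 0.1, 1.0, 10.0], [1.0, 1.0, 0.0, 0.0]))
```

Under this reading, a value near 1 means the drug had no effect across the tested range, and a value near 0 means viability was lost at every tested concentration.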
Affiliation(s)
- Nikita Pozdeyev
  - Department of Medicine, University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, CO, USA
- Minjae Yoo
  - Department of Medicine, University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, CO, USA
- Ryan Mackie
  - Department of Medicine, University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, CO, USA
- Rebecca E. Schweppe
  - Department of Medicine, University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, CO, USA
- Aik Choon Tan
  - Department of Medicine, University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, CO, USA
- Bryan R. Haugen
  - Department of Medicine, University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, CO, USA
9
Anguita A, García-Remesal M, de la Iglesia D, Graf N, Maojo V. Toward a view-oriented approach for aligning RDF-based biomedical repositories. Methods Inf Med 2014; 54:50-5. [PMID: 24777240 DOI: 10.3414/me13-02-0020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/13/2013] [Accepted: 03/17/2014] [Indexed: 11/09/2022]
Abstract
INTRODUCTION This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". BACKGROUND The need for complementary access to multiple RDF databases has fostered new lines of research, but also entailed new challenges due to data representation disparities. While several approaches for RDF-based database integration have been proposed, those focused on schema alignment have become the most widely adopted. All state-of-the-art solutions for aligning RDF-based sources resort to a simple technique inherited from legacy relational database integration methods. This technique - known as element-to-element (e2e) mappings - is based on establishing 1:1 mappings between single primitive elements - e.g. concepts, attributes, relationships, etc. - belonging to the source and target schemas. However, due to the intrinsic nature of RDF - a representation language based on defining tuples < subject, predicate, object > -, one may find RDF elements whose semantics vary dramatically when combined into a view involving other RDF elements - i.e. they depend on their context. The latter cannot be adequately represented in the target schema by resorting to the traditional e2e approach. These approaches fail to properly address this issue without explicitly modifying the target ontology, thus lacking the required expressiveness for properly reflecting the intended semantics in the alignment information. OBJECTIVES To enhance existing RDF schema alignment techniques by providing a mechanism to properly represent elements with context-dependent semantics, thus enabling users to perform more expressive alignments, including scenarios that cannot be adequately addressed by the existing approaches. METHODS Instead of establishing 1:1 correspondences between single primitive elements of the schemas, we propose adopting a view-based approach. 
The latter is targeted at establishing mapping relationships between RDF subgraphs - that can be regarded as the equivalent of views in traditional databases -, rather than between single schema elements. This approach enables users to represent scenarios defined by context-dependent RDF elements that cannot be properly represented when adopting the currently existing approaches. RESULTS We developed a software tool implementing our view-based strategy. Our tool is currently being used in the context of the European Commission funded p-medicine project, targeted at creating a technological framework to integrate clinical and genomic data to facilitate the development of personalized drugs and therapies for cancer, based on the genetic profile of the patient. We used our tool to integrate different RDF-based databases - including different repositories of clinical trials and DICOM images - using the Health Data Ontology Trunk (HDOT) ontology as the target schema. CONCLUSIONS The importance of database integration methods and tools in the context of biomedical research has been widely recognized. Modern research in this area - e.g. identification of disease biomarkers, or design of personalized therapies - heavily relies on the availability of a technical framework to enable researchers to uniformly access disparate repositories. We present a method and a tool that implement a novel alignment method specifically designed to support and enhance the integration of RDF-based data sources at schema (metadata) level. This approach provides an increased level of expressiveness compared to other existing solutions, and allows solving heterogeneity scenarios that cannot be properly represented using other state-of-the-art techniques.
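The contrast between element-to-element mappings and view-based mappings can be illustrated with a toy matcher. It is a sketch under stated assumptions and not the authors' tool: triples are plain tuples, variables start with `?`, and the predicate names and the `tumorSize` target property are hypothetical. A view is a small pattern of triples that must all match before a target triple is emitted, so the generic `value` predicate is mapped only when its context says the measurement is about a tumor.

```python
def match_view(triples, pattern):
    """Return variable bindings for which every pattern triple is present."""
    bindings = [{}]
    for pattern_triple in pattern:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                matched = True
                for term, value in zip(pattern_triple, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            matched = False
                            break
                    elif term != value:
                        matched = False
                        break
                if matched:
                    extended.append(candidate)
        bindings = extended
    return bindings


def apply_view(triples, pattern, template):
    """Emit one target triple per match of the source view."""
    return [tuple(b.get(term, term) for term in template)
            for b in match_view(triples, pattern)]


source = [
    ("obs1", "type", "Measurement"), ("obs1", "about", "Tumor"),
    ("obs1", "value", "12"),
    ("obs2", "type", "Measurement"), ("obs2", "about", "Kidney"),
    ("obs2", "value", "7"),
]
view = [("?m", "type", "Measurement"), ("?m", "about", "Tumor"),
        ("?m", "value", "?v")]
print(apply_view(source, view, ("?m", "tumorSize", "?v")))
```

A 1:1 mapping from `value` to `tumorSize` would also translate the kidney measurement; the view restricts the mapping to the subgraph that carries the intended meaning.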
Affiliation(s)
- A Anguita
  - Alberto Anguita, PhD, Group of Biomedical Informatics, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Spain, E-mail:
10
Boyle B, Hopkins N, Lu Z, Raygoza Garay JA, Mozzherin D, Rees T, Matasci N, Narro ML, Piel WH, Mckay SJ, Lowry S, Freeland C, Peet RK, Enquist BJ. The taxonomic name resolution service: an online tool for automated standardization of plant names. BMC Bioinformatics 2013; 14:16. [PMID: 23324024 PMCID: PMC3554605 DOI: 10.1186/1471-2105-14-16] [Citation(s) in RCA: 214] [Impact Index Per Article: 19.5] [Received: 09/25/2012] [Accepted: 01/02/2013] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science. RESULTS The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets. CONCLUSIONS We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. 
Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
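The combination of spelling correction and synonym conversion described in the abstract can be sketched with the Python standard library. This is a stand-in, not the TNRS implementation: `difflib` replaces the dedicated name parsing and fuzzy matching applications the TNRS builds on, and the tiny `REFERENCE` table of names and accepted names is a hypothetical example rather than data from Tropicos.

```python
import difflib

# Hypothetical reference taxonomy: every known name -> its accepted name.
REFERENCE = {
    "Quercus alba": "Quercus alba",
    "Quercus rubra": "Quercus rubra",
    "Acer saccharum": "Acer saccharum",
    "Acer saccharophorum": "Acer saccharum",  # treated here as a synonym
}


def resolve(name, cutoff=0.8):
    """Return the accepted name for a possibly misspelled name, or None."""
    # Normalize whitespace and case: "quercus  ALBA" -> "Quercus alba".
    cleaned = " ".join(name.split()).capitalize()
    matches = difflib.get_close_matches(cleaned, list(REFERENCE), n=1, cutoff=cutoff)
    if not matches:
        return None
    return REFERENCE[matches[0]]


for raw in ["Quercus albba", "acer  saccharophorum", "Zzz yyy"]:
    print(raw, "->", resolve(raw))
```

The `cutoff` threshold trades recall for precision; a production service would also report the match score so that a user can supervise borderline cases.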
Affiliation(s)
- Brad Boyle
  - Department of Ecology and Evolutionary Biology, University of Arizona Tucson, P.O. Box 210088, Tucson, AZ, 85721, USA
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
- Nicole Hopkins
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
- Zhenyuan Lu
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724-2202, USA
- Juan Antonio Raygoza Garay
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
- Dmitry Mozzherin
  - Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, 02543, USA
- Tony Rees
  - Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania, 7001, Australia
- Naim Matasci
  - Department of Ecology and Evolutionary Biology, University of Arizona Tucson, P.O. Box 210088, Tucson, AZ, 85721, USA
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
- Martha L Narro
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
- William H Piel
  - Yale-NUS College, 6 College Avenue East, Singapore, 138614, Singapore
- Sheldon J Mckay
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724-2202, USA
- Sonya Lowry
  - The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
  - BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
- Chris Freeland
  - Missouri Botanical Garden, 4344 Shaw Blvd., St. Louis, MO, 63110, USA
- Robert K Peet
  - Department of Biology, CB 3280, University of North Carolina, Chapel Hill, NC, 27599-3280, USA
- Brian J Enquist
  - Department of Ecology and Evolutionary Biology, University of Arizona Tucson, P.O. Box 210088, Tucson, AZ, 85721, USA
  - The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA
11
Lawson CL, Dutta S, Westbrook JD, Henrick K, Berman HM. Representation of viruses in the remediated PDB archive. Acta Crystallogr D Biol Crystallogr 2008; D64:874-82. [PMID: 18645236 PMCID: PMC2677383 DOI: 10.1107/s0907444908017393] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 04/18/2008] [Accepted: 06/09/2008] [Indexed: 11/24/2022]
Abstract
A new scheme has been devised to represent viruses and other biological assemblies with regular noncrystallographic symmetry in the Protein Data Bank (PDB). The scheme describes existing and anticipated PDB entries of this type using generalized descriptions of deposited and experimental coordinate frames, symmetry and frame transformations. A simplified notation has been adopted to express the symmetry generation of assemblies from deposited coordinates and matrix operations describing the required point, helical or crystallographic symmetry. Complete and correct information for building full assemblies, subassemblies and crystal asymmetric units of all virus entries is now available in the remediated PDB archive.
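Generating an assembly from deposited coordinates and a list of matrix operations can be sketched as below. The example is a toy under stated assumptions: it uses a four-fold rotation about the z axis with zero translations rather than the 60 operators of an icosahedral capsid, and the function names are illustrative and do not correspond to PDB file records.

```python
import math


def rotation_z(angle_deg):
    """3x3 rotation matrix about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_operation(matrix, translation, point):
    """Apply x' = R x + t to a single coordinate triple."""
    return tuple(
        sum(matrix[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def build_assembly(coords, operations):
    """Return one transformed copy of the deposited coordinates per operation."""
    return [[apply_operation(m, t, p) for p in coords] for m, t in operations]


# Toy point symmetry: identity plus rotations by 90, 180 and 270 degrees.
operations = [(rotation_z(90 * k), (0.0, 0.0, 0.0)) for k in range(4)]
assembly = build_assembly([(1.0, 0.0, 0.0)], operations)
for copy in assembly:
    print([round(x, 6) for x in copy[0]])
```

Helical or crystallographic symmetry fits the same pattern; only the list of (matrix, translation) pairs changes, with nonzero translations for the helical rise or the lattice shifts.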
Affiliation(s)
- Catherine L Lawson
  - RCSB Protein Data Bank, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854-8087, USA.