1. Remsen D. The use and limits of scientific names in biological informatics. Zookeys 2016; 550:207-23. [PMID: 26877660 PMCID: PMC4741222 DOI: 10.3897/zookeys.550.9546] [Received: 03/06/2015] [Accepted: 03/09/2015]
Abstract
Scientific names serve to label biodiversity information: information related to species. Names, and their underlying taxonomic definitions, however, are unstable and ambiguous. This negatively impacts the utility of names as identifiers and as effective indexing tools in biological informatics where names are commonly utilized for searching, retrieving and integrating information about species. Semiotics provides a general model for describing the relationship between taxon names and taxon concepts. It distinguishes syntactics, which governs relationships among names, from semantics, which represents the relations between those labels and the taxa to which they refer. In the semiotic context, changes in semantics (i.e., taxonomic circumscription) do not consistently result in a corresponding and reflective change in syntax. Further, when syntactic changes do occur, they may be in response to semantic changes or in response to syntactic rules. This lack of consistency in the cardinal relationship between names and taxa places limits on how scientific names may be used in biological informatics in initially anchoring, and in the subsequent retrieval and integration, of relevant biodiversity information. Precision and recall are two measures of relevance. In biological taxonomy, recall is negatively impacted by changes or ambiguity in syntax while precision is negatively impacted when there are changes or ambiguity in semantics. Because changes in syntax are not correlated with changes in semantics, scientific names may be used, singly or conflated into synonymous sets, to improve recall in pattern recognition or search and retrieval. Names cannot be used, however, to improve precision. This is because changes in syntax do not uniquely identify changes in circumscription. These observations place limits on the utility of scientific names within biological informatics applications that rely on names as identifiers for taxa. 
Taxonomic systems and services used to organize and integrate information about taxa must accommodate the inherent semantic ambiguity of scientific names. The capture and articulation of circumscription differences (i.e., multiple taxon concepts) within such systems must be accompanied by distinct concept identifiers that can be used alongside, or in place of, traditional scientific names.
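The recall and precision argument above can be made concrete with a toy retrieval model. The following is a minimal Python sketch under invented assumptions: each record carries the name string it was filed under plus a hypothetical concept_id standing for its circumscription. Expanding a query to a synonym set recovers a record filed under an older name, so recall rises, but no set of name strings can drop a record that shares a name with the target while belonging to a different concept; only the concept identifier does that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: int
    name: str        # the scientific name string the record was filed under
    concept_id: str  # hypothetical identifier for the taxon concept (circumscription)


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Set-based precision and recall of a retrieval result."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def search_by_names(records: list[Record], names: set[str]) -> set[int]:
    """Name-based retrieval: every record labelled with any name in `names`."""
    return {r.record_id for r in records if r.name in names}


# Invented example: "Aus bus" labels two different circumscriptions (C1 and C2),
# and "Xus bus" is an older synonym that was used for concept C1.
records = [
    Record(1, "Aus bus", "C1"),
    Record(2, "Aus bus", "C2"),
    Record(3, "Xus bus", "C1"),
    Record(4, "Aus cus", "C3"),
]
relevant = {r.record_id for r in records if r.concept_id == "C1"}

single = search_by_names(records, {"Aus bus"})
synonym_set = search_by_names(records, {"Aus bus", "Xus bus"})
by_concept = {r.record_id for r in records if r.concept_id == "C1"}

for label, hits in [("single name", single), ("synonym set", synonym_set), ("concept id", by_concept)]:
    prec, rec = precision_recall(hits, relevant)
    print(f"{label}: precision={prec:.2f} recall={rec:.2f} false positives={sorted(hits - relevant)}")
# single name: precision=0.50 recall=0.50 false positives=[2]
# synonym set: precision=0.67 recall=1.00 false positives=[2]
# concept id: precision=1.00 recall=1.00 false positives=[]
```

Precision rises in the second line only because a true positive was added; the false positive (record 2) survives every name-based query that includes "Aus bus".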
Affiliation(s)
- David Remsen
- Department of Marine Resources, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543

2. Akella LM, Norton CN, Miller H. NetiNeti: discovery of scientific names from text using machine learning methods. BMC Bioinformatics 2012; 13:211. [PMID: 22913485 PMCID: PMC3542245 DOI: 10.1186/1471-2105-13-211] [Received: 10/15/2010] [Accepted: 08/06/2012]
Abstract
Background: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. Results: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognizing scientific names in text, including the discovery of new species names, that also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). On a 600-page biodiversity book that was manually marked by an annotator, NetiNeti performs better (precision = 98.9% and recall = 70.5%) than a popular dictionary-based approach (precision = 97.5% and recall = 54.3%). On a small set of PubMed Central's full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2%, respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Conclusions: We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.

3. Wheeler Q, Bourgoin T, Coddington J, Gostony T, Hamilton A, Larimer R, Polaszek A, Schauff M, Solis MA. Nomenclatural benchmarking: the roles of digital typification and telemicroscopy. Zookeys 2012; 209:193-202. [PMID: 22859888 PMCID: PMC3406476 DOI: 10.3897/zookeys.209.3486] [Received: 06/08/2012] [Accepted: 07/13/2012]
Abstract
Nomenclatural benchmarking is the periodic realignment of species names with species theories and is necessary for the accurate and uniform use of Linnaean binominals in the face of changing species limits. Gaining access to types, often for little more than a cursory examination by an expert, is a major bottleneck in the advance and availability of biodiversity informatics. For the nearly two million described species, it has been estimated that five to six million name-bearing type specimens exist, including those for synonymized binominals. Recognizing that examination of types in person will remain necessary in special cases, we propose a four-part strategy for opening access to types that relies heavily on digitization and that would eliminate much of the bottleneck: (1) modify codes of nomenclature to create registries of nomenclatural acts, such as the proposed ZooBank, that include a requirement for digital representations (e-types) of all newly described species, to avoid adding to the backlog; (2) an "r" strategy that would engineer and deploy a network of automated instruments capable of rapidly creating 3-D images of type specimens without requiring the participation of taxon experts; (3) a "K" strategy using remotely operable microscopes to engage taxon experts in targeting and annotating informative characters of types, to supplement and extend the information content of rapidly acquired e-types, a process that can be done on an as-needed basis in the normal course of revisionary taxonomy; and (4) creation of a global e-type archive, associated with the commissions on nomenclature and species registries, providing one-stop shopping for e-types. We describe a first-generation implementation of the "K" strategy that adapts current technology to create a network of Remotely Operable Benchmarkers Of Types (ROBOT) specifically engineered to handle the largest backlog of types, pinned insect specimens.
The three initial instruments will be in the Smithsonian Institution (Washington, DC), the Natural History Museum (London), and the Museum National d'Histoire Naturelle (Paris), networking the three largest insect collections in the world with entomologists worldwide. These three instruments make possible remote examination, manipulation, and photography of types for more than 600,000 species. This is a cybertaxonomy demonstration project that we anticipate will lead to similar instruments for a wide range of museum specimens and objects, as well as revolutionary changes in collaborative taxonomy and in formal and public taxonomic education.
Affiliation(s)
- Quentin Wheeler
- International Institute for Species Exploration, Arizona State University, Tempe, AZ 85287 USA
- Thierry Bourgoin
- Laboratoire d’Entomologie, Museum National d’Histoire Naturelle, Rue Buffon, Paris, France
- Jonathan Coddington
- National Museum of Natural History, Smithsonian Institution, Washington, DC 20530 USA
- Timothy Gostony
- International Institute for Species Exploration, Arizona State University, Tempe, AZ 85287 USA
- Andrew Hamilton
- International Institute for Species Exploration, Arizona State University, Tempe, AZ 85287 USA
- Roy Larimer
- Visionary Digital, Palmyra, VA 22963 USA; National Museum of Natural History, Smithsonian Institution, Washington, DC 20530 USA
- Andrew Polaszek
- Department of Life Sciences, The Natural History Museum, London SW7 5BD, UK
- Michael Schauff
- United States Department of Agriculture, Systematic Entomology Laboratory, Beltsville, MD 20705 USA
- M. Alma Solis
- United States Department of Agriculture, Systematic Entomology Laboratory, Beltsville, MD 20705 USA

4. Ningthoujam SS, Talukdar AD, Potsangbam KS, Choudhury MD. Challenges in developing medicinal plant databases for sharing ethnopharmacological knowledge. J Ethnopharmacol 2012; 141:9-32. [PMID: 22401841 DOI: 10.1016/j.jep.2012.02.042] [Received: 11/04/2011] [Revised: 02/19/2012] [Accepted: 02/25/2012]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Major research contributions in ethnopharmacology have generated a vast amount of data associated with medicinal plants. Computerized databases facilitate data management and analysis, making coherent information available to researchers, planners and other users. Web-based databases also facilitate knowledge transmission and feed the circle of information exchange between ethnopharmacological studies and the public. However, despite the development of many medicinal plant databases, a lack of uniformity is still discernible. A common standard therefore needs to be defined to achieve the shared objectives of ethnopharmacology. AIM OF THE STUDY The aim of the study is to review the diversity of approaches to storing ethnopharmacological information in databases and to propose some minimal standards for these databases. MATERIALS AND METHODS Articles on medicinal plant databases were surveyed on the Internet using selected keywords. Grey literature and printed materials were also searched for information. The listed resources were critically analyzed for their approaches to content type, focus area and software technology. RESULTS There is a clear need for the rapid incorporation of traditional knowledge by compiling primary data. While citation collection is a common approach to compiling information, it cannot fully assimilate the local literature that reflects traditional knowledge. Standards are also needed for systematic evaluation and for checking the quality and authenticity of the data. Databases focussing on thematic areas, viz., traditional medicine systems, regional aspects, diseases and phytochemical information, are analyzed. Issues pertaining to data standards, data linking and unique identification need to be addressed, in addition to general issues such as lack of updates and sustainability.
Based on the present study, suggestions are made for some minimum standards for the development of medicinal plant databases. CONCLUSION In spite of variations in approaches, the existence of many overlapping features indicates redundancy of resources and efforts. As compiling global data in a single database may not be possible in view of culture-specific differences, efforts can be directed to specific regions. The existing scenario calls for a collaborative approach to defining a common standard for medicinal plant databases, for knowledge sharing and scientific advancement.

5. Wheeler QD, Knapp S, Stevenson DW, Stevenson J, Blum SD, Boom BM, Borisy GG, Buizer JL, De Carvalho MR, Cibrian A, Donoghue MJ, Doyle V, Gerson EM, Graham CH, Graves P, Graves SJ, Guralnick RP, Hamilton AL, Hanken J, Law W, Lipscomb DL, Lovejoy TE, Miller H, Miller JS, Naeem S, Novacek MJ, Page LM, Platnick NI, Porter-Morgan H, Raven PH, Solis MA, Valdecasas AG, Van Der Leeuw S, Vasco A, Vermeulen N, Vogel J, Walls RL, Wilson EO, Woolley JB. Mapping the biosphere: exploring species to understand the origin, organization and sustainability of biodiversity. Syst Biodivers 2012. [DOI: 10.1080/14772000.2012.665095]

6. Huang X, Qiao G. Biodiversity data sharing is not just about species names: response to Santos and Branco. Trends Ecol Evol 2012. [DOI: 10.1016/j.tree.2011.10.007]

7. Jones AC, White RJ, Orme ER. Identifying and relating biological concepts in the Catalogue of Life. J Biomed Semantics 2011; 2:7. [PMID: 22004596 PMCID: PMC3245425 DOI: 10.1186/2041-1480-2-7] [Received: 01/31/2011] [Accepted: 10/17/2011]
Abstract
BACKGROUND In this paper we describe our experience of adding globally unique identifiers to the Species 2000 and ITIS Catalogue of Life, an on-line index of organisms which is intended, ultimately, to cover all the world's known species. The scientific species names held in the Catalogue are names that already play an extensive role as terms in the organisation of information about living organisms in bioinformatics and other domains, but the effectiveness of their use is hindered by variation in individuals' opinions and understanding of these terms; indeed, in some cases more than one name will have been used to refer to the same organism. This means that it is desirable to be able to give unique labels to each of these differing concepts within the catalogue and to be able to determine which concepts are being used in other systems, in order that they can be associated with the concepts in the catalogue. Not only is this needed, but it is also necessary to know the relationships between alternative concepts that scientists might have employed, as these determine what can be inferred when data associated with related concepts is being processed. A further complication is that the catalogue itself is evolving as scientific opinion changes due to an increasing understanding of life. RESULTS We describe how we are using Life Science Identifiers (LSIDs) as globally unique identifiers in the Catalogue of Life, explaining how the mapping to species concepts is performed, how concepts are associated with specific editions of the catalogue, and how the Taxon Concept Schema has been adopted in order to express information about concepts and their relationships. 
We explore the implications of using globally unique identifiers in order to refer to abstract concepts such as species, which incorporate at least a measure of subjectivity in their definition, in contrast with the more traditional use of such identifiers to refer to more tangible entities, events, documents, observations, etc. CONCLUSIONS A major reason for adopting identifiers such as LSIDs is to facilitate data integration. We have demonstrated the incorporation of LSIDs into the Catalogue of Life, in a manner consistent with the biodiversity informatics community's conventions for LSID use. The Catalogue of Life is therefore available as a taxonomy of organisms for use within various disciplines, including biomedical research, by software written with an awareness of these conventions.
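LSIDs are URNs of the form urn:lsid:authority:namespace:object, optionally followed by a version component. The minimal Python parser below checks that syntax only and does not resolve anything. The example identifiers use a hypothetical example.org authority rather than real Catalogue of Life LSIDs, and the same_object helper, which ignores the version component, is an illustrative assumption, not a description of how the Catalogue of Life ties concepts to specific editions.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]  # optional trailing component


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its components; raise ValueError if malformed."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, version)


def same_object(a: str, b: str) -> bool:
    """True if two LSIDs name the same object, ignoring any version component."""
    return parse_lsid(a)[:3] == parse_lsid(b)[:3]


print(parse_lsid("urn:lsid:example.org:taxon:12345:2011"))
# LSID(authority='example.org', namespace='taxon', object_id='12345', version='2011')
print(same_object("urn:lsid:example.org:taxon:12345:2010",
                  "URN:LSID:example.org:taxon:12345:2011"))
# True
```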
Affiliation(s)
- Andrew C Jones
- Cardiff School of Computer Science & Informatics, Cardiff University, Queen's Buildings, 5 The Parade, Cardiff CF24 3AA, UK.

8. Gerner M, Nenadic G, Bergman CM. LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics 2010; 11:85. [PMID: 20149233 PMCID: PMC2836304 DOI: 10.1186/1471-2105-11-85] [Received: 08/28/2009] [Accepted: 02/11/2010]
Abstract
BACKGROUND The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles. RESULTS In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers. CONCLUSIONS LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at http://linnaeus.sourceforge.net/.
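LINNAEUS matches text against a species dictionary compiled into a deterministic finite-state automaton. A token-level trie walked with left-to-right longest-match search, as in the Python sketch below, is a simplified stand-in for that idea. It normalizes mentions to NCBI taxonomy identifiers (9606 for Homo sapiens, 10090 for Mus musculus), but the four-entry dictionary is an assumption for illustration, and LINNAEUS's disambiguation heuristics and handling of abbreviated names are not reproduced.

```python
import re

END = ""  # no real token is empty, so "" safely marks the end of a dictionary name


def build_trie(dictionary: dict[str, int]) -> dict:
    """Nest dictionary names token by token; store the taxon id under the END key."""
    root: dict = {}
    for name, taxid in dictionary.items():
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[END] = taxid
    return root


def find_mentions(text: str, trie: dict) -> list[tuple[str, int]]:
    """Left-to-right longest-match search; returns (mention, taxon id) pairs."""
    tokens = re.findall(r"[A-Za-z]+", text)
    mentions = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if END in node:
                best = (j, node[END])  # longest complete dictionary name so far
        if best:
            end, taxid = best
            mentions.append((" ".join(tokens[i:end]), taxid))
            i = end
        else:
            i += 1
    return mentions


SPECIES = {"human": 9606, "Homo sapiens": 9606, "mouse": 10090, "Mus musculus": 10090}
trie = build_trie(SPECIES)
mentions = find_mentions("Human and Mus musculus samples; mouse data were excluded.", trie)
print(mentions)  # [('Human', 9606), ('Mus musculus', 10090), ('mouse', 10090)]
print(sorted({taxid for _, taxid in mentions}))  # document-level: [9606, 10090]
```

The last line corresponds to the document-level view in the abstract: a document is tagged with the set of identifiers mentioned anywhere in it.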
Affiliation(s)
- Martin Gerner
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Goran Nenadic
- School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- Casey M Bergman
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK

9. Lughadha EN, Miller C. Accelerating global access to plant diversity information. Trends Plant Sci 2009; 14:622-628. [PMID: 19836991 DOI: 10.1016/j.tplants.2009.08.014] [Received: 06/12/2009] [Revised: 08/26/2009] [Accepted: 08/27/2009]
Abstract
Botanic gardens play key roles in the development and dissemination of plant information resources. Drivers for change have included progress in information technology, growing public expectations of electronic access and international conservation policy. Great advances have been made in the quantity, quality and accessibility of plant information in digital form and the extent to which information from multiple providers can be accessed through a single portal. However, significant challenges remain to be addressed in making botanic gardens resources maximally accessible and impactful, not least the overwhelming volume of material which still awaits digitisation. The year 2010 represents an opportunity for botanic gardens to showcase their collaborative achievements in delivery of electronic plant information and reinforce their relevance to pressing environmental issues.

10. Paton A. Biodiversity informatics and the plant conservation baseline. Trends Plant Sci 2009; 14:629-637. [PMID: 19783196 DOI: 10.1016/j.tplants.2009.08.007] [Received: 05/26/2009] [Revised: 07/24/2009] [Accepted: 08/12/2009]
Abstract
Primary baseline data on taxonomy and species distribution, and their integration with environmental variables, have a valuable role to play in achieving internationally recognised targets for plant diversity conservation, such as the Global Strategy for Plant Conservation. The importance of primary baseline data and the role of biodiversity informatics in linking these data to other environmental variables are discussed. Maintaining digital resources and making them widely accessible is an additional requirement for institutions that already collect and maintain these baseline data. The lack of resources in many species-rich areas to gather these data and make them widely accessible needs to be addressed if the full benefit of biodiversity informatics for plant conservation is to be realised.
Affiliation(s)
- Alan Paton
- Herbarium, Library, Art and Archives, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AB, UK.

11. Clark BR, Godfray HCJ, Kitching IJ, Mayo SJ, Scoble MJ. Taxonomy as an eScience. Philos Trans A Math Phys Eng Sci 2009; 367:953-966. [PMID: 19087937 DOI: 10.1098/rsta.2008.0190]
Abstract
The Internet has the potential to provide wider access to biological taxonomy, the knowledge base of which is currently fragmented across a large number of ink-on-paper publications dating from the middle of the eighteenth century. A system (the CATE project) is proposed in which consensus or consolidated taxonomies are presented in the form of Web-based revisions. The workflow is designed to allow the community to offer, online, additions and taxonomic changes ('proposals') to the consolidated taxonomies (e.g. new species and synonymies). A means of quality control in the form of online peer review as part of the editorial process is also included in the workflow. The CATE system rests on taxonomic expertise and judgement, rather than using aggregation technology to accumulate taxonomic information from across the Web. The CATE application and its system and architecture are described in the context of the wider aims and purpose of the project.

12. The future role of bio-ontologies for developing a general data standard in biology: chance and challenge for zoo-morphology. Zoomorphology 2008. [DOI: 10.1007/s00435-008-0081-5]

13. Sarkar IN, Schenk R, Norton CN. Exploring historical trends using taxonomic name metadata. BMC Evol Biol 2008; 8:144. [PMID: 18477399 PMCID: PMC2408592 DOI: 10.1186/1471-2148-8-144] [Received: 08/30/2007] [Accepted: 05/13/2008]
Abstract
Background: Authority and year information have been attached to taxonomic names since Linnaean times. The systematic structure of taxonomic nomenclature facilitates the development of tools that can be used to explore historical trends that may be associated with taxonomy. Results: Of the more than 10.7 million taxonomic names that are part of the uBio system [4], approximately 3 million names were identified as having taxonomic authority information from the years 1750 to 2004. A pipe-delimited file was then generated, organized according to a Linnaean hierarchy and by year from 1750 to 2004, and imported into an Excel workbook. A series of macros was developed to create an Excel-based tool and a complementary Web site to explore the taxonomic data. A cursory and speculative analysis of the data reveals observable trends that may be attributable to significant events of both taxonomic (e.g., publishing of key monographs) and societal importance (e.g., world wars). The findings also help quantify the number of taxonomic descriptions that may be made available through digitization initiatives. Conclusion: Temporal organization of taxonomic data can be used to identify interesting biological epochs relative to historically significant events and ongoing efforts. We have developed an Excel workbook and complementary Web site that enable one to explore taxonomic trends for Linnaean taxonomic groupings, from Kingdoms to Families.
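The aggregation described above, a pipe-delimited file of names with authority years tallied over time, can be sketched with Python's standard csv module. The column layout (family|name|year), the decade bucketing, and the sample rows are all invented for illustration; the abstract does not specify the file's fields, and the original analysis used Excel macros rather than code like this.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Optional

# Invented sample rows; the real file's columns are not given in the abstract.
SAMPLE = """family|name|year
FamilyA|Aus bus|1758
FamilyA|Aus cus|1766
FamilyA|Aus dus|1915
FamilyB|Bus eus|1917
FamilyB|Bus fus|1921
"""


def counts_by_decade(lines: Iterable[str], family: Optional[str] = None,
                     first_year: int = 1750, last_year: int = 2004) -> Counter:
    """Count names per decade of their authority year, optionally for one family."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="|"):
        year = int(row["year"])
        if not first_year <= year <= last_year:
            continue  # keep only the 1750-2004 window used in the study
        if family is not None and row["family"] != family:
            continue
        counts[year - year % 10] += 1  # bucket 1758 -> 1750, 1915 -> 1910, ...
    return counts


print(sorted(counts_by_decade(io.StringIO(SAMPLE)).items()))
# [(1750, 1), (1760, 1), (1910, 2), (1920, 1)]
print(sorted(counts_by_decade(io.StringIO(SAMPLE), family="FamilyB").items()))
# [(1910, 1), (1920, 1)]
```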

14. Page RDM. Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinform 2008; 9:345-54. [PMID: 18445641 DOI: 10.1093/bib/bbn022]
Abstract
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.
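The review suggests ranking search results by exploiting the link structure between identifiers with PageRank. Below is a plain power-iteration PageRank in Python (damping 0.85, rank from nodes without out-links spread uniformly) over an invented four-node graph: a paper links to a sequence and a specimen, the sequence links to the specimen and a taxon, and the specimen links to the taxon. The node names are placeholders, and nothing here reflects how the review itself built or weighted its graph.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank; nodes without out-links spread their rank uniformly."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Invented link graph between identifiers of different kinds.
links = {
    "paper:P1": ["sequence:SEQ1", "specimen:S1"],
    "sequence:SEQ1": ["specimen:S1", "taxon:T1"],
    "specimen:S1": ["taxon:T1"],
}
ranks = pagerank(links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{node}: {ranks[node]:.3f}")
# taxon:T1: 0.428
# specimen:S1: 0.261
# sequence:SEQ1: 0.183
# paper:P1: 0.128
```

Identifiers that many other records point to accumulate rank, which is the property a search interface could use to order results.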
Affiliation(s)
- Roderic D M Page
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK.

15. Godfray HCJ, Clark BR, Kitching IJ, Mayo SJ, Scoble MJ. The Web and the Structure of Taxonomy. Syst Biol 2007; 56:943-55. [DOI: 10.1080/10635150701777521]
Affiliation(s)
- H. C. J. Godfray
- Department of Zoology, University of Oxford, South Park Road, Oxford, OX1 3PS, UK
- B. R. Clark
- Department of Zoology, University of Oxford, South Park Road, Oxford, OX1 3PS, UK
- I. J. Kitching
- Department of Entomology, Natural History Museum, Cromwell Road, London, SW7 5BD, UK
- S. J. Mayo
- Royal Botanic Gardens Kew, Richmond, TW9 3AE, UK
- M. J. Scoble
- Department of Entomology, Natural History Museum, Cromwell Road, London, SW7 5BD, UK

16. Page RDM. TBMap: a taxonomic perspective on the phylogenetic database TreeBASE. BMC Bioinformatics 2007; 8:158. [PMID: 17511869 PMCID: PMC1885449 DOI: 10.1186/1471-2105-8-158] [Received: 02/28/2007] [Accepted: 05/18/2007]
Abstract
Background: TreeBASE is currently the only available large-scale database of published organismal phylogenies. Its utility is hampered by a lack of taxonomic consistency, both within the database and with names of organisms in external genomic, specimen, and taxonomic databases. The extent to which the phylogenetic knowledge in TreeBASE becomes integrated with these other sources is limited by this lack of consistency. Description: Taxonomic names in TreeBASE were mapped onto names in the external taxonomic databases IPNI, ITIS, NCBI, and uBio, and a graph G of these mappings was constructed. Additional edges representing taxonomic synonymies were added to G, and then all components of G were extracted. These components correspond to "name clusters" and group together names in TreeBASE that are inferred to refer to the same taxon. The mapping to NCBI enables hierarchical queries to be performed, which can improve TreeBASE information retrieval by an order of magnitude. Conclusion: The TBMap database provides a mapping of the bulk of the names in TreeBASE to names in external taxonomic databases, and a clustering of those mappings into sets of names that can be regarded as equivalent. This mapping enables queries and visualisations that cannot otherwise be constructed. A simple query interface to the mapping and names clusters is available at .
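The clustering step described above (build a graph of name mappings, add synonymy edges, extract connected components) can be sketched with a union-find (disjoint-set) structure in Python. The source-prefixed name strings and edges are invented placeholders, and TBMap's actual data model and implementation may differ.

```python
from typing import Iterable


class DisjointSet:
    """Union-find over name strings, with path halving."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def name_clusters(mappings: Iterable[tuple[str, str]],
                  synonymies: Iterable[tuple[str, str]]) -> list[set[str]]:
    """Connected components of the graph formed by mapping and synonymy edges."""
    ds = DisjointSet()
    for a, b in [*mappings, *synonymies]:
        ds.union(a, b)
    groups: dict[str, set[str]] = {}
    for name in list(ds.parent):
        groups.setdefault(ds.find(name), set()).add(name)
    return sorted(groups.values(), key=min)


# Hypothetical edges: TreeBASE names mapped to external databases, plus one synonymy.
mappings = [("TB:Aus bus", "ITIS:Aus bus"), ("TB:Aus bus", "NCBI:Aus bus"),
            ("TB:Xus bus", "uBio:Xus bus"), ("TB:Cus dus", "NCBI:Cus dus")]
synonymies = [("ITIS:Aus bus", "uBio:Xus bus")]
clusters = name_clusters(mappings, synonymies)
print([sorted(c) for c in clusters])
# [['ITIS:Aus bus', 'NCBI:Aus bus', 'TB:Aus bus', 'TB:Xus bus', 'uBio:Xus bus'], ['NCBI:Cus dus', 'TB:Cus dus']]
```

The synonymy edge is what places "TB:Aus bus" and "TB:Xus bus" in one cluster, i.e. two TreeBASE names inferred to refer to the same taxon.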
Affiliation(s)
- Roderic D M Page
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, Graham Kerr Building, University of Glasgow, Glasgow, UK.

17. Leary PR, Remsen DP, Norton CN, Patterson DJ, Sarkar IN. uBioRSS: Tracking taxonomic literature using RSS. Bioinformatics 2007; 23:1434-6. [PMID: 17392332 DOI: 10.1093/bioinformatics/btm109]
Abstract
Web content syndication through standard formats such as RSS and Atom has become an increasingly popular mechanism for publishers, news sources and blogs to disseminate regularly updated content. These standardized syndication formats deliver content directly to the subscriber, allowing them to aggregate content locally from a variety of sources instead of having to find the information on multiple websites. The uBioRSS application is a 'taxonomically intelligent' service customized for the biological sciences. It aggregates syndicated content from academic publishers and science news feeds, and then uses a taxonomic Named Entity Recognition algorithm to identify and index taxonomic names within those data streams. The resulting name index is cross-referenced to current global taxonomic datasets to provide context for browsing the publications by taxonomic group. This process, called taxonomic indexing, draws upon services developed specifically for the biological sciences, collectively referred to as 'taxonomic intelligence'. Such value-added enhancements can provide biologists with accelerated and improved access to current biological content. Availability: http://names.ubio.org/rss/
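Taxonomic indexing of a feed, as described above, can be sketched with Python's standard xml.etree.ElementTree. The feed, the two-entry name list, and the name-to-order lookup are invented stand-ins: plain substring matching replaces uBio's named-entity recognition, and the lookup table replaces cross-referencing against global taxonomic datasets.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented RSS 2.0 feed with two items.
FEED = """<rss version="2.0"><channel>
<title>Example journal feed</title>
<item><title>Migration timing in Danaus plexippus</title><link>http://example.org/a1</link></item>
<item><title>Pathogen loads in Apis mellifera and Danaus plexippus</title><link>http://example.org/a2</link></item>
</channel></rss>"""

# Hypothetical lookup standing in for a global taxonomic dataset.
NAME_TO_ORDER = {"Danaus plexippus": "Lepidoptera", "Apis mellifera": "Hymenoptera"}


def taxonomic_index(feed_xml: str, name_to_group: dict[str, str]) -> dict[str, set[str]]:
    """Map each higher taxon to the links of feed items that mention one of its names."""
    index = defaultdict(set)
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        for name, group in name_to_group.items():
            if name in title:  # naive substring match in place of real name recognition
                index[group].add(link)
    return dict(index)


index = taxonomic_index(FEED, NAME_TO_ORDER)
for group in sorted(index):
    print(group, sorted(index[group]))
# Hymenoptera ['http://example.org/a2']
# Lepidoptera ['http://example.org/a1', 'http://example.org/a2']
```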
Affiliation(s)
- Patrick R Leary
- MBL Informatics, Marine Biological Laboratory, Woods Hole, MA 02543, USA

18. Padial JM, de la Riva I. Taxonomic Inflation and the Stability of Species Lists: The Perils of Ostrich's Behavior. Syst Biol 2006; 55:859-67. [PMID: 17060206 DOI: 10.1080/1063515060081588]
Affiliation(s)
- José M Padial
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales-CSIC, C/José Gutiérrez Abascal 2, Madrid, Spain