Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cui H, Xu D, Chong SS, Ramirez M, Rodenhausen T, Macklin JA, Ludäscher B, Morris RA, Soto EM, Koch NM. Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building. BMC Bioinformatics 2016;17:471. [PMID: 27855645 PMCID: PMC5114841 DOI: 10.1186/s12859-016-1352-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 11/11/2016] [Indexed: 11/10/2022] Open

For:	Cui H, Xu D, Chong SS, Ramirez M, Rodenhausen T, Macklin JA, Ludäscher B, Morris RA, Soto EM, Koch NM. Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building. BMC Bioinformatics 2016;17:471. [PMID: 27855645 PMCID: PMC5114841 DOI: 10.1186/s12859-016-1352-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 11/11/2016] [Indexed: 11/10/2022] Open

Number

Cited by Other Article(s)

Folk RA, Guralnick RP, LaFrance RT. FloraTraiter: Automated parsing of traits from descriptive biodiversity literature. APPLICATIONS IN PLANT SCIENCES 2024;12:e11563. [PMID: 38369975 PMCID: PMC10873814 DOI: 10.1002/aps3.11563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Revised: 09/13/2023] [Accepted: 10/01/2023] [Indexed: 02/20/2024]

Campbell DL, Thessen AE, Ries L. A novel curation system to facilitate data integration across regional citizen science survey programs. PeerJ 2020;8:e9219. [PMID: 32821528 PMCID: PMC7395600 DOI: 10.7717/peerj.9219] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Accepted: 04/28/2020] [Indexed: 11/20/2022] Open

Walton S, Livermore L, Bánki O, Cubey R, Drinkwater R, Englund M, Goble C, Groom Q, Kermorvant C, Rey I, Santos C, Scott B, Williams A, Wu Z. Landscape Analysis for the Specimen Data Refinery. RESEARCH IDEAS AND OUTCOMES 2020. [DOI: 10.3897/rio.6.e57602] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open

The Spider Anatomy Ontology (SPD)—A Versatile Tool to Link Anatomy with Cross-Disciplinary Data. DIVERSITY 2019. [DOI: 10.3390/d11100202] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

Endara L, Thessen AE, Cole HA, Walls R, Gkoutos G, Cao Y, Chong SS, Cui H. Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier. Biodivers Data J 2018;6:e29232. [PMID: 30532623 PMCID: PMC6281706 DOI: 10.3897/bdj.6.e29232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 11/20/2018] [Indexed: 11/21/2022] Open

Abstract

Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression, these terms are called "modifiers". With effort underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifiers, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly. Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies. Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using "broader synonym" or "not recommended" annotation properties. These annotations explicitly allow for the user to be aware of the semantic ambiguity associated with the terms and whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreements, suggesting differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies. Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concepts alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.

Collapse

Affiliation(s)

Lorena Endara University of Florida, Gainesville, United States of AmericaUniversity of FloridaGainesvilleUnited States of America
Anne E Thessen The Ronin Institute for Independent Scholarship, Monclair, NJ, United States of AmericaThe Ronin Institute for Independent ScholarshipMonclair, NJUnited States of America
Heather A Cole Science and Technology Branch, Agriculture and Agri-Food Canada, Government of Canada, Ottawa, CanadaScience and Technology Branch, Agriculture and Agri-Food Canada, Government of CanadaOttawaCanada
Ramona Walls CyVerse, Tucson, United States of AmericaCyVerseTucsonUnited States of America
Georgios Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United KingdomCollege of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of BirminghamBirminghamUnited Kingdom Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, United KingdomInstitute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TTBirminghamUnited Kingdom
Yujie Cao Center for Studies of Information Resources, Wuhan Universtity, Wuhan, ChinaCenter for Studies of Information Resources, Wuhan UniverstityWuhanChina
Steven S. Chong National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, Santa Barbara, United States of AmericaNational Center for Ecological Analysis and Synthesis, University of California, Santa BarbaraSanta BarbaraUnited States of America University of Arizona, Tucson, United States of AmericaUniversity of ArizonaTucsonUnited States of America
Hong Cui University of Arizona, Tucson, United States of AmericaUniversity of ArizonaTucsonUnited States of America

Collapse

Cui H, Macklin JA, Sachs J, Reznicek A, Starr J, Ford B, Penev L, Chen HL. Incentivising use of structured language in biological descriptions: Author-driven phenotype data and ontology production. Biodivers Data J 2018;6:e29616. [PMID: 30473620 PMCID: PMC6235995 DOI: 10.3897/bdj.6.e29616] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Accepted: 10/23/2018] [Indexed: 01/17/2023] Open

Abstract

Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future and, most importantly, 3) It does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof of concept project on this idea was funded by NSF ABI in July 2017. We seek readers input or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production.

Collapse

Xu D, Chong SS, Rodenhausen T, Cui H. Resolving "orphaned" non-specific structures using machine learning and natural language processing methods. Biodivers Data J 2018:e26659. [PMID: 30393454 PMCID: PMC6207837 DOI: 10.3897/bdj.6.e26659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2018] [Accepted: 07/27/2018] [Indexed: 11/12/2022] Open

Vaidya G, Lepage D, Guralnick R. The tempo and mode of the taxonomic correction process: How taxonomists have corrected and recorrected North American bird species over the last 127 years. PLoS One 2018;13:e0195736. [PMID: 29672539 PMCID: PMC5909608 DOI: 10.1371/journal.pone.0195736] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2017] [Accepted: 03/28/2018] [Indexed: 11/19/2022] Open

Abstract

While studies of taxonomy usually focus on species description, there is also a taxonomic correction process that retests and updates existing species circumscriptions on the basis of new evidence. These corrections may themselves be subsequently retested and recorrected. We studied this correction process by using the Check-List of North and Middle American Birds, a well-known taxonomic checklist that spans 130 years. We identified 142 lumps and 95 splits across sixty-three versions of the Check-List and found that while lumping rates have markedly decreased since the 1970s, splitting rates are accelerating. We found that 74% of North American bird species recognized today have never been corrected (i.e., lumped or split) over the period of the checklist, while 16% have been corrected exactly once and 10% have been corrected twice or more. Since North American bird species are known to have been extensively lumped in the first half of the 20^th century with the advent of the biological species concept, we determined whether most splits seen today were the result of those lumps being recorrected. We found that 5% of lumps and 23% of splits fully reverted previous corrections, while a further 3% of lumps and 13% of splits are partial reversions. These results show a taxonomic correction process with moderate levels of recorrection, particularly of previous lumps. However, 81% of corrections do not revert any previous corrections, suggesting that the majority result in novel circumscriptions not previously recognized by the Check-List. We could find no order or family with a significantly higher rate of correction than any other, but twenty-two genera as currently recognized by the AOU do have significantly higher rates than others. Given the currently accelerating rate of splitting, prediction of the end-point of the taxonomic recorrection process is difficult, and many entirely new taxonomic concepts are still being, and likely will continue to be, proposed and further tested.

Collapse

Folk RA, Sun M, Soltis PS, Smith SA, Soltis DE, Guralnick RP. Challenges of comprehensive taxon sampling in comparative biology: Wrestling with rosids. AMERICAN JOURNAL OF BOTANY 2018;105:433-445. [PMID: 29665035 DOI: 10.1002/ajb2.1059] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2017] [Accepted: 12/19/2017] [Indexed: 06/08/2023]

Endara L, Cui H, Burleigh JG. Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing. APPLICATIONS IN PLANT SCIENCES 2018;6:e1035. [PMID: 29732265 PMCID: PMC5895189 DOI: 10.1002/aps3.1035] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/2017] [Accepted: 01/31/2018] [Indexed: 05/09/2023]