1
Acheson E, Purves RS. Extracting and modeling geographic information from scientific articles. PLoS One 2021; 16:e0244918. [PMID: 33406109 PMCID: PMC7787447 DOI: 10.1371/journal.pone.0244918] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/02/2019] [Accepted: 12/20/2020] [Indexed: 11/29/2022] Open
Abstract
Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.
Affiliation(s)
- Elise Acheson
- Department of Geography, University of Zurich, Zurich, Switzerland
- Ross S. Purves
- Department of Geography, University of Zurich, Zurich, Switzerland
2
Known unknowns: Filling the gaps in scientific knowledge production in the Caatinga. PLoS One 2019; 14:e0219359. [PMID: 31269071 PMCID: PMC6608954 DOI: 10.1371/journal.pone.0219359] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/20/2018] [Accepted: 06/23/2019] [Indexed: 11/19/2022] Open
Abstract
The Caatinga is an ecologically unique semi-arid region of northeast Brazil characterized by high levels of endemism and severe anthropogenic threats from agricultural development and climate change. It is also one of the least known biomes in Brazil due to a combination of inadequate investment, low regional research capacity and difficult working conditions. However, while the lack of scientific knowledge of the Caatinga is well known, the spatial and temporal distribution of knowledge production has not been investigated. This is important because such biases undermine the development of effective conservation policy and practice and increase the uncertainty associated with conservation actions. Here, we map the geography of conservation knowledge production in the Caatinga and use an innovative hurdle model to identify the presumptive factors driving these patterns. Our analysis revealed strong geographic patterns, with research sites concentrated in the east of the region and in areas close to roads and research centres. There was also a positive association between conservation knowledge production and risk of desertification, indicating that conservation scientists are responding to conservation challenges faced by Caatinga’s fauna and flora arising from climate change. Our results also highlight the pivotal role of pioneer scientists (those who develop research sites in previously unstudied/understudied areas) in determining the future geographic patterns of knowledge production. We conclude our article with a brief discussion of potential policies for increasing the spatial representativeness of conservation research in this remarkable ecosystem.
4
Karl JW. Mining location information from life- and earth-sciences studies to facilitate knowledge discovery. Journal of Librarianship and Information Science 2018. [DOI: 10.1177/0961000618759413] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Location information in published studies represents an untapped resource for literature discovery, applicable to a range of domains. The ability to easily discover scientific articles from specific places, nearby locales, or similar (but geographically separate) areas worldwide is important for advancing science and addressing global sustainability challenges. However, the thematic rather than geographic nature of current search tools makes location-based searches challenging and inefficient. Manually geolocating studies is labor intensive, and place-name recognition algorithms have performed poorly due to the prevalence of irrelevant place names in scientific articles. These challenges have hindered past efforts to create map-based literature search tools. Thus, automated approaches are needed to sustain article georeferencing efforts. Common pattern-matching algorithms (parsers) can be used to identify and extract geographic coordinates from the text of published articles. Two pattern-matching algorithms (geoparsers) were developed, one using regular expressions and one using lexical parsing, and their performance was tested against sets of full-text articles from multiple journals that had been manually scanned for coordinates. Both geoparsers performed well at recognizing and extracting coordinates from articles, with accuracy ranging from 85.1% to 100% and the lexical geoparser performing marginally better. Omission errors (i.e. missed coordinates) were 0% to 14.9% for the regular expression geoparser and 0% to 10.3% for the lexical geoparser. Only a single commission error (i.e. erroneous coordinate) was encountered with the lexical geoparser. The ability to automatically identify and extract location information from published studies opens new possibilities for transforming scientific literature discovery and supporting novel research.
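As a rough illustration of the regular-expression approach described in this abstract, the following Python sketch pulls degree/minute/second or decimal-degree coordinates out of free text and converts them to signed decimal degrees. It is not the geoparser from the article: the pattern `COORD_RE` and the helpers `to_decimal` and `extract_coordinates` are hypothetical names invented for this example, and the pattern only assumes coordinates written with a degree sign and a trailing hemisphere letter.

```python
import re

# Hypothetical pattern (an assumption, not the article's geoparser):
# degrees, optional minutes, optional seconds, then a hemisphere letter,
# e.g. 43°30'N, 12°30'36"S or 43.37° N.
COORD_RE = re.compile(
    r"""
    (?P<deg>\d{1,3}(?:\.\d+)?)\s*°\s*          # degrees, possibly decimal
    (?:(?P<min>\d{1,2}(?:\.\d+)?)\s*['′]\s*)?  # optional minutes
    (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*["″]\s*)?  # optional seconds
    (?P<hem>[NSEW])\b                          # hemisphere letter
    """,
    re.VERBOSE,
)


def to_decimal(match):
    """Convert one regex match to signed decimal degrees (S and W negative)."""
    degrees = float(match.group("deg"))
    minutes = float(match.group("min") or 0)
    seconds = float(match.group("sec") or 0)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if match.group("hem") in "SW" else value


def extract_coordinates(text):
    """Return (matched text, decimal degrees) pairs in order of appearance."""
    return [(m.group(0), round(to_decimal(m), 4)) for m in COORD_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Plots were located at 43°30'N, 116°15'W near the river."
    for raw, value in extract_coordinates(sample):
        print(raw, value)
```

A pattern this simple would miss coordinates written without a degree sign or with the hemisphere letter first, which is the kind of omission error the abstract reports for its regular-expression geoparser.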
5
Maggio A, Kuffer J, Lazzari M. Advances and trends in bibliographic research: Examples of new technological applications for the cataloguing of the georeferenced library heritage. Journal of Librarianship and Information Science 2016. [DOI: 10.1177/0961000616652134] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
In the age of digital archives and online data consultation, bibliographic research is considered a key tool for supporting scientific research and study. The online catalogue allows the achievement of more ambitious aims and global interest thanks to its ability to associate data relating to the geographic contextualization of the catalogued editorial products (deduced from the title and content) with the search for more traditional bibliographic data through the inclusion of a specific and standardized ‘field’. Subsequently, the locations identified by the cataloguer are georeferenced using GIS applications, which allow a simultaneous view of the distribution of global and local geographical contexts specific to each item owned by a library, archive or museum. The usefulness of such an application lies in the possibility for the library to have a greater awareness of its collection, thus permitting the acquisition of an additional element of evaluation in the management and planning of purchases and donations. In this way, the ability to filter the information from OPAC search will be combined with the basic research carried out by the user by selecting only the libraries in possession of works related to a specific geographical context, involved in different specific studies (literature, landscape, environment). Although this ability is still limited to a few specific studies, the use of tools that allow an overview of the geographical distribution of places could represent an operating standard through the definition of a special protocol. These tools are now used mostly in experimental studies in which the use of open source software has enabled the creation of maps. This paper shows the state of the art of the applications worldwide, presenting experimental case studies (i.e. Coos Bay, Oregon; Basilicata, Italy), and also suggests different applications in the field of national and international protocols of library cataloguing.
7
McInerny GJ, Chen M, Freeman R, Gavaghan D, Meyer M, Rowland F, Spiegelhalter DJ, Stefaner M, Tessarolo G, Hortal J. Information visualisation for science and policy: engaging users and avoiding bias. Trends Ecol Evol 2014; 29:148-57. [PMID: 24565371 DOI: 10.1016/j.tree.2014.01.003] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 08/08/2013] [Revised: 12/06/2013] [Accepted: 01/11/2014] [Indexed: 01/27/2023]
Abstract
Visualisations and graphics are fundamental to studying complex subject matter. However, beyond acknowledging this value, scientists and science-policy programmes rarely consider how visualisations can enable discovery, create engaging and robust reporting, or support online resources. Producing accessible and unbiased visualisations from complicated, uncertain data requires expertise and knowledge from science, policy, computing, and design. However, visualisation is rarely found in our scientific training, organisations, or collaborations. As new policy programmes develop [e.g., the Intergovernmental Platform on Biodiversity and Ecosystem Services (IPBES)], we need information visualisation to permeate increasingly both the work of scientists and science policy. The alternative is increased potential for missed discoveries, miscommunications, and, at worst, creating a bias towards the research that is easiest to display.
Affiliation(s)
- Greg J McInerny
- Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK; Computational Science Laboratory, Microsoft Research Ltd, 21 Station Road, Cambridge, CB1 2FB, UK.
- Min Chen
- Oxford E-science Research Centre, 7 Keble Road, University of Oxford, Oxford, OX1 3QG, UK
- Robin Freeman
- Institute of Zoology, Zoological Society of London, Regent's Park, London, NW1 4RY, UK
- David Gavaghan
- Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
- Miriah Meyer
- Scientific Computing and Imaging Institute, School of Computing, University of Utah, Salt Lake City, UT 84112, USA
- Francis Rowland
- EMBL, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- David J Spiegelhalter
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK
- Geizi Tessarolo
- Departamento de Ecologia, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, GO, Brazil; Departamento de Biogeografía y Cambio Global, Museo Nacional de Ciencias Naturales (CSIC), C/José Gutiérrez Abascal 2, 28006 Madrid, Spain
- Joaquin Hortal
- Departamento de Biogeografía y Cambio Global, Museo Nacional de Ciencias Naturales (CSIC), C/José Gutiérrez Abascal 2, 28006 Madrid, Spain