1
Bauermeister S, Phatak M, Sparks K, Sargent L, Griswold M, McHugh C, Nalls M, Young S, Bauermeister J, Elliott P, Steptoe A, Porteous D, Dufouil C, Gallacher J. Evaluating the harmonisation potential of diverse cohort datasets. Eur J Epidemiol 2023; 38:605-615. PMID: 37099244; PMCID: PMC10232583; DOI: 10.1007/s10654-023-00997-3.
Abstract
Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation.

A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. The harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts.

Of the 120 variables found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity.

Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work: extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools.
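The standardisation-to-the-Z-distribution strategy the abstract names can be sketched in a few lines of Python; the cohort names and test scores below are invented for illustration, not data from the study:

```python
from statistics import mean, stdev

def z_standardise(values):
    """Map raw scores onto the Z-distribution: zero mean, unit standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical cognitive-test scores from two cohorts measured on different scales.
cohort_a = [22, 25, 28, 30, 26]   # e.g. a 0-30 instrument
cohort_b = [61, 70, 84, 90, 75]   # e.g. a 0-100 instrument

# After standardisation the two variables share a common scale and can be
# compared or pooled across cohorts.
harmonised = {
    "cohort_a": z_standardise(cohort_a),
    "cohort_b": z_standardise(cohort_b),
}
```

Within-cohort standardisation like this preserves each participant's rank but discards the original units, which is the "marginal loss of granularity" trade-off the abstract describes.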
Affiliation(s)
- Mukta Phatak
- Alzheimer Disease Data Initiative, Kirkland, Washington, USA
- Caitlin McHugh
- Alzheimer Disease Data Initiative, Kirkland, Washington, USA
- Mike Nalls
- Data Tecnica International LLC, Washington, USA
2
Khalsa SJS, Borsa A, Nandigam V, Phan M, Lin K, Crosby C, Fricker H, Baru C, Lopez L. OpenAltimetry - rapid analysis and visualization of Spaceborne altimeter data. Earth Sci Inform 2020; 15:1471-1480. PMID: 36003899; PMCID: PMC9392693; DOI: 10.1007/s12145-020-00520-2.
Abstract
NASA's Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) carries a laser altimeter that fires 10,000 pulses per second towards Earth and records the travel time of individual photons to measure the elevation of the surface below. The volume of data produced by ICESat-2, nearly a TB per day, presents significant challenges for users wishing to efficiently explore the dataset. NASA's National Snow and Ice Data Center (NSIDC) Distributed Active Archive Center (DAAC), which is responsible for archiving and distributing ICESat-2 data, provides search and subsetting services on mission data products, but providing interactive data discovery and visualization tools needed to assess data coverage and quality in a given area of interest is outside of NSIDC's mandate. The OpenAltimetry project, a NASA-funded collaboration between NSIDC, UNAVCO and the University of California San Diego, has developed a web-based cyberinfrastructure platform that allows users to locate, visualize, and download ICESat-2 surface elevation data and photon clouds for any location on Earth, on demand. OpenAltimetry also provides access to elevations and waveforms for ICESat (the predecessor mission to ICESat-2). In addition, OpenAltimetry enables data access via APIs, opening opportunities for rapid access, experimentation, and computation via third party applications like Jupyter notebooks. OpenAltimetry emphasizes ease-of-use for new users and rapid access to entire altimetry datasets for experts and has been successful in meeting the needs of different user groups. In this paper we describe the principles that guided the design and development of the OpenAltimetry platform and provide a high-level overview of the cyberinfrastructure components of the system.
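As a sketch of the API-driven, on-demand access the abstract describes, the snippet below composes a bounding-box elevation query as a plain GET URL, the pattern a Jupyter notebook would use. The base URL and parameter names are hypothetical placeholders, not the real OpenAltimetry endpoint; the actual endpoint and parameters should be taken from the project's API documentation:

```python
from urllib.parse import urlencode

# Hypothetical base URL -- consult the OpenAltimetry API documentation
# for the real endpoint and its query parameters.
BASE = "https://openaltimetry.example.org/api/icesat2/elevation"

def build_request_url(min_lon, min_lat, max_lon, max_lat, date):
    """Compose a bounding-box elevation query as a GET URL with encoded parameters."""
    params = {
        "minLon": min_lon, "minLat": min_lat,
        "maxLon": max_lon, "maxLat": max_lat,
        "date": date,
    }
    return f"{BASE}?{urlencode(params)}"

# A small area-of-interest request over an invented Greenland bounding box.
url = build_request_url(-50.0, 68.5, -49.0, 69.0, "2020-01-15")
```

Subsetting by bounding box and date on the server side is what keeps such a workflow practical against a mission producing nearly a TB per day.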
Affiliation(s)
- Adrian Borsa
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Minh Phan
- University of California San Diego, La Jolla, CA, USA
- Kai Lin
- University of California San Diego, La Jolla, CA, USA
- Helen Fricker
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Chaitan Baru
- University of California San Diego, La Jolla, CA, USA
- Luis Lopez
- University of Colorado Boulder, Boulder, CO, USA
3
Corti P, Kralidis AT, Lewis B. Enhancing discovery in spatial data infrastructures using a search engine. PeerJ Comput Sci 2018; 4:e152. PMID: 33816806; PMCID: PMC7924664; DOI: 10.7717/peerj-cs.152.
Abstract
A spatial data infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide an efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service which is needed to discover, query and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard which defines common interfaces for accessing the metadata information. A search engine is a software system capable of supporting fast and reliable search, which may use 'any means necessary' to get users to the resources they need quickly and efficiently. These techniques may include full text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting, recommendations and many others. In this paper we present an example of a search engine being added to an SDI to improve search against large collections of geospatial datasets. The Centre for Geographic Analysis (CGA) at Harvard University re-engineered the search component of its public domain SDI (Harvard WorldMap) which is based on the GeoNode platform. A search engine was added to the SDI stack to enhance the CSW catalogue discovery abilities. It is now possible to discover spatial datasets from metadata by using the standard search operations of the catalogue and to take advantage of the new abilities of the search engine, to return relevant and reliable content to SDI users.
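Two of the search-engine techniques the abstract lists, full-text search and faceting, can be illustrated with a toy inverted index over invented catalogue records; a production SDI would delegate this to a dedicated search engine rather than hand-rolled code like this:

```python
from collections import defaultdict

# Toy metadata records standing in for CSW catalogue entries (invented).
records = [
    {"id": 1, "title": "Boston land parcels", "category": "boundaries"},
    {"id": 2, "title": "Boston street trees", "category": "environment"},
    {"id": 3, "title": "Cambridge land use",  "category": "boundaries"},
]

# Inverted index: lower-cased title token -> set of matching record ids.
index = defaultdict(set)
for rec in records:
    for token in rec["title"].lower().split():
        index[token].add(rec["id"])

def search(query):
    """Full-text AND search, plus a facet count over 'category' for the hits."""
    tokens = query.lower().split()
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    facets = defaultdict(int)
    for rec in records:
        if rec["id"] in hits:
            facets[rec["category"]] += 1
    return sorted(hits), dict(facets)
```

The facet counts let a user narrow a broad query by category without issuing a new search, which is the discovery improvement the abstract attributes to adding a search engine alongside the CSW catalogue.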
Affiliation(s)
- Paolo Corti
- Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Benjamin Lewis
- Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
4
Michael E, Singh BK, Mayala BK, Smith ME, Hampton S, Nabrzyski J. Continental-scale, data-driven predictive assessment of eliminating the vector-borne disease, lymphatic filariasis, in sub-Saharan Africa by 2020. BMC Med 2017; 15:176. PMID: 28950862; PMCID: PMC5615442; DOI: 10.1186/s12916-017-0933-2.
Abstract
BACKGROUND: There are growing demands for predicting the prospects of achieving the global elimination of neglected tropical diseases as a result of the institution of large-scale nation-wide intervention programs by the WHO-set target year of 2020. Such predictions will be uncertain due to the impacts that spatial heterogeneity and scaling effects will have on parasite transmission processes, which will introduce significant aggregation errors into any attempt to predict the outcomes of interventions at the broader spatial levels relevant to policy making. We describe a modeling platform that addresses this problem of upscaling from local settings to facilitate predictions at regional levels by the discovery and use of locality-specific transmission models, and we illustrate the utility of this approach by evaluating the prospects for eliminating the vector-borne disease, lymphatic filariasis (LF), in sub-Saharan Africa by the WHO target year of 2020 using currently applied or newly proposed intervention strategies.

METHODS AND RESULTS: We show how a computational platform that couples site-specific data discovery with model fitting and calibration can allow both learning of local LF transmission models and simulations of the impact of interventions that take a fuller account of the fine-scale heterogeneous transmission of this parasitic disease within endemic countries. We highlight how such a spatially hierarchical modeling tool, which incorporates actual data on the roll-out of national drug treatment programs and spatial variability in infection patterns, can produce more realistic predictions of timelines to LF elimination at coarse spatial scales, ranging from district to country to continental levels. Our results show that when locally applicable extinction thresholds are used, only three countries are likely to meet the goal of LF elimination by 2020 using currently applied mass drug treatments, and that switching to more intensive drug regimens, increasing the frequency of treatments, or switching to new triple drug regimens will be required if LF elimination is to be accelerated in Africa. The proportion of countries that would meet the goal of eliminating LF by 2020 may, however, reach up to 24/36 if the WHO 1% microfilaremia prevalence threshold is used and sequential mass drug deliveries are applied in countries.

CONCLUSIONS: We have developed and applied a data-driven, spatially hierarchical computational platform that uses the discovery of locally applicable transmission models to predict the prospects for eliminating the macroparasitic disease, LF, at the coarser country level in sub-Saharan Africa. We show that fine-scale spatial heterogeneity in local parasite transmission and extinction dynamics, as well as the exact nature of intervention roll-outs in countries, will impact the timelines to achieving national LF elimination on this continent.
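The abstract's contrast between locally derived extinction thresholds and the WHO 1% microfilaremia criterion can be sketched with invented site-level numbers; all prevalences and thresholds below are hypothetical, chosen only to show how the choice of threshold changes which sites count as having eliminated transmission:

```python
# Hypothetical predicted microfilaremia prevalences (%) per site, and
# illustrative locally derived extinction thresholds (%) for the same sites.
predicted = {"site_a": 0.4, "site_b": 0.9, "site_c": 1.6}
local_thresholds = {"site_a": 0.3, "site_b": 1.2, "site_c": 0.8}
WHO_THRESHOLD = 1.0  # the 1% microfilaremia prevalence criterion

def eliminated(prevalences, thresholds):
    """Sites judged to have met elimination under a given threshold rule."""
    return sorted(s for s, p in prevalences.items() if p < thresholds[s])

local_pass = eliminated(predicted, local_thresholds)
who_pass = eliminated(predicted, {s: WHO_THRESHOLD for s in predicted})
```

With these invented numbers the uniform 1% rule passes more sites than the site-specific thresholds do, mirroring the abstract's finding that the count of countries meeting the goal rises when the WHO criterion replaces locally applicable extinction thresholds.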
Affiliation(s)
- Edwin Michael
- Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Brajendra K Singh
- Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Benjamin K Mayala
- Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Morgan E Smith
- Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Scott Hampton
- Center for Research Computing, University of Notre Dame, Notre Dame, IN, 46556, USA
- Jaroslaw Nabrzyski
- Center for Research Computing, University of Notre Dame, Notre Dame, IN, 46556, USA