1
Butters OW, Wilson RC, Burton PR. Recognizing, reporting and reducing the data curation debt of cohort studies. Int J Epidemiol 2021; 49:1067-1074. [PMID: 32617581 PMCID: PMC7660145 DOI: 10.1093/ije/dyaa087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/21/2020] [Accepted: 04/27/2020] [Indexed: 11/12/2022]
Abstract
Good data curation is integral to cohort studies, but it is not always done to a level necessary to ensure the longevity of the data a study holds. In this opinion paper, we introduce the concept of data curation debt—the data curation equivalent to the software engineering principle of technical debt. Using the context of UK cohort studies, we define data curation debt—describing examples and their potential impact. We highlight that accruing this debt can make it more difficult to use the data in the future. Additionally, the long-running nature of cohort studies means that interest is accrued on this debt and compounded over time—increasing the impact a debt could have on a study and its stakeholders. Primary causes of data curation debt are discussed across three categories: longevity of hardware, software and data formats; funding; and skills shortages. Based on cross-domain best practice, strategies to reduce the debt and preventive measures are proposed—with importance given to the recognition and transparent reporting of data curation debt. Describing the debt in this way, we encapsulate a multi-faceted issue in simple terms understandable by all cohort study stakeholders. Data curation debt is not only confined to the UK, but is an issue the international community must be aware of and address. This paper aims to stimulate a discussion between cohort studies and their stakeholders on how to address the issue of data curation debt. If data curation debt is left unchecked it could become impossible to use highly valued cohort study data, and ultimately represents an existential risk to studies themselves.
Affiliation(s)
- Oliver W Butters
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
  - Department of Public Health, Policy and Systems, University of Liverpool, UK
- Rebecca C Wilson
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
  - Department of Public Health, Policy and Systems, University of Liverpool, UK
- Paul R Burton
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
2
Abstract
Cohort studies collect, generate and distribute data over long periods of time, often over the lifecourse of their participants. It is common for these studies to host a list of publications (which can number many thousands) on their website to demonstrate the impact of the study and facilitate the search of existing research to which the study data has contributed. The ability to search and explore these publication lists varies greatly between studies. We believe a lack of rich search and exploration functionality is a barrier to entry for new or prospective users of a study's data, since it may be difficult to find and evaluate previous work in a given area. These lists of publications are also typically manually curated, resulting in a lack of rich metadata to analyse, making bibliometric analysis difficult. We present here a software pipeline that aggregates metadata from a variety of third-party providers to power a web-based search and exploration tool for lists of publications. Alongside core publication metadata (e.g. author lists and keywords), we include geocoding of first authors and citations in our pipeline. This allows a characterisation of a study as a whole based on common locations of authors, frequency of keywords, citation profile etc. This enriched publications metadata can be used to generate project impact metrics and web-based graphics for public dissemination. In addition, the pipeline produces a research data set for bibliometric analysis or social studies of science.
Affiliation(s)
- Oliver W. Butters
  - Department of Public Health, Policy and Systems, University of Liverpool, Liverpool, UK
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Social and Community Medicine, University of Bristol, Bristol, UK
- Rebecca C. Wilson
  - Department of Public Health, Policy and Systems, University of Liverpool, Liverpool, UK
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Social and Community Medicine, University of Bristol, Bristol, UK
- Hugh Garner
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Social and Community Medicine, University of Bristol, Bristol, UK
- Thomas W. Y. Burton
  - Social and Community Medicine, University of Bristol, Bristol, UK
  - Department of Computer Science, University of Oxford, Oxford, UK
3
Butters OW, Wilson RC, Garner H, Burton TWY. PUblications Metadata Augmentation (PUMA) pipeline. F1000Res 2020; 9:1095. [PMID: 34026049 PMCID: PMC8108552 DOI: 10.12688/f1000research.25484.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/01/2021] [Indexed: 11/20/2022]
Abstract
Cohort studies collect, generate and distribute data over long periods of time, often over the lifecourse of their participants. It is common for these studies to host a list of publications (which can number many thousands) on their website to demonstrate the impact of the study and facilitate the search of existing research to which the study data has contributed. The ability to search and explore these publication lists varies greatly between studies. We believe a lack of rich search and exploration functionality of study publications is a barrier to entry for new or prospective users of a study's data, since it may be difficult to find and evaluate previous work in a given area. These lists of publications are also typically manually curated, resulting in a lack of rich metadata to analyse, making bibliometric analysis difficult. We present here a software pipeline that aggregates metadata from a variety of third-party providers to power a web-based search and exploration tool for lists of publications. Alongside core publication metadata (e.g. author lists and keywords), we include geocoding of first authors and citation counts in our pipeline. This allows a characterisation of a study as a whole based on common locations of authors, frequency of keywords, citation profile etc. This enriched publications metadata can be useful for generating study impact metrics and web-based graphics for public dissemination. In addition, the pipeline produces a research data set for bibliometric analysis or social studies of science. We use a previously published list of publications from a cohort study as an exemplar input data set to show the output and utility of the pipeline here.
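As a rough illustration of the study-level characterisation the abstract describes (author locations, keyword frequency, citation profile), the following minimal Python sketch summarises a small list of already-enriched records. It is not the PUMA implementation: the `Publication` record, its field names, the `characterise` function and the sample data are all hypothetical, and the metadata is assumed to have been fetched and geocoded beforehand.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Publication:
    """Hypothetical enriched record; field names are assumptions, not PUMA's schema."""
    title: str
    keywords: list = field(default_factory=list)
    first_author_country: Optional[str] = None  # assumed output of a geocoding step
    citation_count: int = 0


def characterise(publications):
    """Summarise a publication list by keywords, first-author location and citations."""
    # Normalise keywords so that "Asthma" and "asthma" are counted together.
    keyword_counts = Counter(
        kw.strip().lower() for pub in publications for kw in pub.keywords
    )
    # Skip records where geocoding produced no location.
    country_counts = Counter(
        pub.first_author_country
        for pub in publications
        if pub.first_author_country is not None
    )
    return {
        "n_publications": len(publications),
        "top_keywords": keyword_counts.most_common(3),
        "first_author_countries": country_counts.most_common(),
        "total_citations": sum(pub.citation_count for pub in publications),
    }


# Made-up sample records, for illustration only.
sample = [
    Publication("Paper A", ["Asthma", "cohort"], "UK", 10),
    Publication("Paper B", ["asthma"], "UK", 5),
    Publication("Paper C", ["genetics"], None, 0),
]
summary = characterise(sample)
print(summary)
```

Keeping the summary step separate from metadata retrieval means the same function could feed either a web front end or an exported data set for bibliometric analysis.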
Affiliation(s)
- Oliver W. Butters
  - Department of Public Health, Policy and Systems, University of Liverpool, Liverpool, UK
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Social and Community Medicine, University of Bristol, Bristol, UK
- Rebecca C. Wilson
  - Department of Public Health, Policy and Systems, University of Liverpool, Liverpool, UK
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Social and Community Medicine, University of Bristol, Bristol, UK
- Hugh Garner
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Social and Community Medicine, University of Bristol, Bristol, UK
- Thomas W. Y. Burton
  - Social and Community Medicine, University of Bristol, Bristol, UK
  - Department of Computer Science, University of Oxford, Oxford, UK
4
Murtagh MJ, Minion JT, Turner A, Wilson RC, Blell M, Ochieng C, Murtagh B, Roberts S, Butters OW, Burton PR. The ECOUTER methodology for stakeholder engagement in translational research. BMC Med Ethics 2017; 18:24. [PMID: 28376776 PMCID: PMC5379503 DOI: 10.1186/s12910-017-0167-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/19/2016] [Accepted: 01/08/2017] [Indexed: 11/25/2022]
Abstract
BACKGROUND Because no single person or group holds knowledge about all aspects of research, mechanisms are needed to support knowledge exchange and engagement. Expertise in the research setting necessarily includes scientific and methodological expertise, but also expertise gained through the experience of participating in research and/or being a recipient of research outcomes (as a patient or member of the public). Engagement is, by its nature, reciprocal and relational: the process of engaging research participants, patients, citizens and others (the many 'publics' of engagement) brings them closer to the research but also brings the research closer to them. When translating research into practice, engaging the public and other stakeholders is explicitly intended to make the outcomes of translation relevant to its constituency of users. METHODS In practice, engagement faces numerous challenges and is often time-consuming, expensive and 'thorny' work. We explore the epistemic and ontological considerations and implications of four common critiques of engagement methodologies that contest: representativeness, communication and articulation, impacts and outcome, and democracy. The ECOUTER (Employing COnceptUal schema for policy and Translation Engagement in Research) methodology addresses problems of representation and epistemic foundationalism using a methodology that asks, "How could it be otherwise?" ECOUTER affords the possibility of engagement where spatial and temporal constraints are present, relying on saturation as a method of 'keeping open' the possible considerations that might emerge and including reflexive use of qualitative analytic methods. RESULTS This paper describes the ECOUTER process, focusing on one worked example and detailing lessons learned from four other pilots. ECOUTER uses mind-mapping techniques to 'open up' engagement, iteratively and organically. 
ECOUTER aims to balance the breadth, accessibility and user-determination of the scope of engagement. An ECOUTER exercise comprises four stages: (1) engagement and knowledge exchange; (2) analysis of mindmap contributions; (3) development of a conceptual schema (i.e. a map of concepts and their relationship); and (4) feedback, refinement and development of recommendations. CONCLUSION ECOUTER refuses fixed truths but also refuses a fixed nature. Its promise lies in its flexibility, adaptability and openness. ECOUTER will be formed and re-formed by the needs and creativity of those who use it.
Affiliation(s)
- Madeleine J. Murtagh
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
  - Centre for Policy, Ethics and Life Sciences (PEALS), Newcastle University, Newcastle, UK
- Joel T. Minion
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Andrew Turner
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Rebecca C. Wilson
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Mwenza Blell
  - Department of Sociology, University of Cambridge, Cambridge, UK
- Cynthia Ochieng
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Barnaby Murtagh
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
  - Urban Cow Productions, London, UK
- Stephanie Roberts
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Oliver W. Butters
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Paul R Burton
  - Data2Knowledge (D2K) Research Group, School of Social and Community Medicine, University of Bristol, Bristol, UK
5
Abstract
ECOUTER (Employing COnceptUal schema for policy and Translation Engagement in Research), French for ‘to listen’, is a new stakeholder engagement method incorporating existing evidence to help participants draw upon their own knowledge of cognate issues and interact on a topic of shared concern. The results of an ECOUTER can form the basis of recommendations for research, governance, practice and/or policy. This paper describes the development of a digital methodology for the ECOUTER engagement process based on currently available freeware mind-mapping software. The implementation of an ECOUTER process tailored to applications within health studies is outlined for both online and face-to-face scenarios. Limitations of the present digital methodology are discussed, highlighting the need for purpose-built software for ECOUTER research purposes.
Affiliation(s)
- Rebecca C Wilson
  - Data 2 Knowledge Research Group, School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK
- Oliver W Butters
  - Data 2 Knowledge Research Group, School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK
- Tom Clark
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, BS8 2BN, UK
- Joel Minion
  - Data 2 Knowledge Research Group, School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK
- Andrew Turner
  - Data 2 Knowledge Research Group, School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK
- Madeleine J Murtagh
  - Data 2 Knowledge Research Group, School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK