1
Schwedhelm C, Nimptsch K, Ahrens W, Hasselhorn HM, Jöckel KH, Katzke V, Kluttig A, Linkohr B, Mikolajczyk R, Nöthlings U, Perrar I, Peters A, Schmidt CO, Schmidt B, Schulze MB, Stang A, Zeeb H, Pischon T. Chronic disease outcome metadata from German observational studies - public availability and FAIR principles. Sci Data 2023; 10:868. [PMID: 38052810 PMCID: PMC10698176 DOI: 10.1038/s41597-023-02726-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 11/07/2023] [Indexed: 12/07/2023] Open
Abstract
Metadata from epidemiological studies, including chronic disease outcome metadata (CDOM), need to be findable to support interpretability and reusability. We proposed a comprehensive metadata schema and used it to assess the public availability and findability of CDOM from German population-based observational studies participating in the consortium National Research Data Infrastructure for Personal Health Data (NFDI4Health). Additionally, principal investigators from the included studies completed a checklist evaluating consistency with FAIR principles (Findability, Accessibility, Interoperability, Reusability) within their studies. Overall, six of sixteen studies had complete publicly available CDOM. The most frequent CDOM source was scientific publications, and the most frequently missing metadata item was the availability of codes of the International Classification of Diseases, Tenth Revision (ICD-10). The main barriers to consistency with FAIR principles perceived by principal investigators were limited human and financial resources. Our results reveal that CDOM from German population-based studies have incomplete availability and limited findability. There is a need to make CDOM publicly available in searchable platforms or metadata catalogues to improve their FAIRness, which requires human and financial resources.
Affiliation(s)
- Carolina Schwedhelm
  - Molecular Epidemiology Research Group, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
- Katharina Nimptsch
  - Molecular Epidemiology Research Group, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
- Wolfgang Ahrens
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, 28359, Germany
  - Institute of Statistics, Faculty of Mathematics and Computer Science, University of Bremen, Bremen, 28334, Germany
- Hans Martin Hasselhorn
  - Department of Occupational Health Science, University of Wuppertal, Wuppertal, 42119, Germany
- Karl-Heinz Jöckel
  - Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen, 45122, Germany
- Verena Katzke
  - Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Alexander Kluttig
  - Institute of Medical Epidemiology, Biometrics, and Informatics, Interdisciplinary Center for Health Sciences, Medical Faculty of the Martin-Luther-University Halle-Wittenberg, Halle (Saale), 06112, Germany
- Birgit Linkohr
  - Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany
- Rafael Mikolajczyk
  - Institute of Medical Epidemiology, Biometrics, and Informatics, Interdisciplinary Center for Health Sciences, Medical Faculty of the Martin-Luther-University Halle-Wittenberg, Halle (Saale), 06112, Germany
  - DZPG (German Center for Mental Health), partner site Halle-Jena-Magdeburg, 07743, Jena, Germany
- Ute Nöthlings
  - Institute of Nutrition and Food Sciences, Nutritional Epidemiology, University of Bonn, Bonn, 53115, Germany
- Ines Perrar
  - Institute of Nutrition and Food Sciences, Nutritional Epidemiology, University of Bonn, Bonn, 53115, Germany
- Annette Peters
  - Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany
  - Institute for Medical Information Processing, Biometry and Epidemiology, Department of Epidemiology, Medical Faculty of the Ludwig-Maximilians-Universität München, Munich, 81377, Germany
- Carsten O Schmidt
  - Institute for Community Medicine, University Medicine Greifswald, Greifswald, 17489, Germany
- Börge Schmidt
  - Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen, 45122, Germany
- Matthias B Schulze
  - Department of Molecular Epidemiology, German Institute of Human Nutrition Potsdam Rehbruecke, Nuthetal, 14558, Germany
  - Institute of Nutritional Science, University of Potsdam, Nuthetal, 14558, Germany
- Andreas Stang
  - Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen, 45122, Germany
  - Department of Epidemiology, School of Public Health, Boston University, Boston, MA, 02118, USA
- Hajo Zeeb
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, 28359, Germany
  - Faculty 11 - Human and Health Sciences, University of Bremen, Bremen, 28359, Germany
- Tobias Pischon
  - Molecular Epidemiology Research Group, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
  - Biobank Technology Platform, Max-Delbrueck-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
  - Core Facility Biobank, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, 13125, Germany
  - Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, 10117, Germany
2
Framework and baseline examination of the German National Cohort (NAKO). Eur J Epidemiol 2022; 37:1107-1124. [PMID: 36260190 PMCID: PMC9581448 DOI: 10.1007/s10654-022-00890-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/29/2022] [Accepted: 06/14/2022] [Indexed: 11/25/2022]
Abstract
The German National Cohort (NAKO) is a multidisciplinary, population-based prospective cohort study that aims to investigate the causes of widespread diseases, identify risk factors and improve early detection and prevention of disease. Specifically, NAKO is designed to identify novel and better characterize established risk and protection factors for the development of cardiovascular diseases, cancer, diabetes, neurodegenerative and psychiatric diseases, musculoskeletal diseases, respiratory and infectious diseases in a random sample of the general population. Between 2014 and 2019, a total of 205,415 men and women aged 19–74 years were recruited and examined in 18 study centres in Germany. The baseline assessment included a face-to-face interview, self-administered questionnaires and a wide range of biomedical examinations. Biomaterials were collected from all participants including serum, EDTA plasma, buffy coats, RNA and erythrocytes, urine, saliva, nasal swabs and stool. In 56,971 participants, an intensified examination programme was implemented. Whole-body 3T magnetic resonance imaging was performed in 30,861 participants on dedicated scanners. NAKO collects follow-up information on incident diseases through a combination of active follow-up using self-report via written questionnaires at 2–3 year intervals and passive follow-up via record linkages. All study participants are invited for re-examinations at the study centres in 4–5 year intervals. Thereby, longitudinal information on changes in risk factor profiles and in vascular, cardiac, metabolic, neurocognitive, pulmonary and sensory function is collected. NAKO is a major resource for population-based epidemiology to identify new and tailored strategies for early detection, prediction, prevention and treatment of major diseases for the next 30 years.
3
Wulms N, Redmann L, Herpertz C, Bonberg N, Berger K, Sundermann B, Minnerup H. The Effect of Training Sample Size on the Prediction of White Matter Hyperintensity Volume in a Healthy Population Using BIANCA. Front Aging Neurosci 2022; 13:720636. [PMID: 35126084 PMCID: PMC8812526 DOI: 10.3389/fnagi.2021.720636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2021] [Accepted: 11/29/2021] [Indexed: 12/01/2022] Open
Abstract
Introduction: White matter hyperintensities of presumed vascular origin (WMH) are an important magnetic resonance imaging marker of cerebral small vessel disease and are associated with cognitive decline, stroke, and mortality. Their relevance in healthy individuals, however, is less clear. This is partly due to the methodological challenge of accurately measuring rare and small WMH with automated segmentation programs. In this study, we tested whether WMH volumetry with FMRIB software library v6.0 (FSL; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) Brain Intensity AbNormality Classification Algorithm (BIANCA), a customizable and trainable algorithm that quantifies WMH volume based on individual data training sets, can be optimized for a normal aging population. Methods: We evaluated the effect of varying training sample sizes on the accuracy and the robustness of the predicted white matter hyperintensity volume in a population (n = 201) with a low prevalence of confluent WMH and a substantial proportion of participants without WMH. BIANCA was trained with seven different sample sizes between 10 and 40 with increments of 5. For each sample size, 100 random samples of T1w and FLAIR images were drawn and trained with manually delineated masks. For validation, we defined an internal and external validation set and compared the mean absolute error, resulting from the difference between manually delineated and predicted WMH volumes for each set. For spatial overlap, we calculated the Dice similarity index (SI) for the external validation cohort. Results: The study population had a median WMH volume of 0.34 ml (IQR of 1.6 ml) and included n = 28 (18%) participants without any WMH. The mean absolute error of the difference between BIANCA prediction and manually delineated masks was minimized and became more robust with an increasing number of training participants. 
The lowest mean absolute error of 0.05 ml (SD of 0.24 ml) was identified in the external validation set with a training sample size of 35. Compared to the volumetric overlap, the spatial overlap was poor with an average Dice similarity index of 0.14 (SD 0.16) in the external cohort, driven by subjects with very low lesion volumes. Discussion: We found that the performance of BIANCA, particularly the robustness of predictions, could be optimized for use in populations with a low WMH load by enlargement of the training sample size. Further work is needed to evaluate and potentially improve the prediction accuracy for low lesion volumes. These findings are important for current and future population-based studies with the majority of participants being normal aging people.
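As a rough illustration of the two evaluation measures named in the abstract above, the following Python sketch computes a Dice similarity index for binary masks and a mean absolute error for lesion volumes. The function names, the flat 0/1 mask representation, and the convention for two empty masks are assumptions made for illustration; none of this is taken from BIANCA or from the study's own code.

```python
from typing import Sequence


def dice_similarity(manual: Sequence[int], predicted: Sequence[int]) -> float:
    """Dice similarity index of two binary masks given as flat 0/1 sequences."""
    if len(manual) != len(predicted):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for m, p in zip(manual, predicted) if m and p)
    total = sum(1 for m in manual if m) + sum(1 for p in predicted if p)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mean_absolute_error(manual_ml: Sequence[float],
                        predicted_ml: Sequence[float]) -> float:
    """Mean absolute difference between manual and predicted volumes (ml)."""
    if not manual_ml or len(manual_ml) != len(predicted_ml):
        raise ValueError("need two non-empty volume lists of equal length")
    differences = (abs(m - p) for m, p in zip(manual_ml, predicted_ml))
    return sum(differences) / len(manual_ml)


# Training sample sizes described in the abstract: 10 to 40 in steps of 5.
TRAINING_SIZES = list(range(10, 41, 5))

# Hypothetical one-voxel lesion, predicted one voxel away from the manual mask.
manual_mask = [0, 1, 0, 0]
shifted_mask = [0, 0, 1, 0]
print(dice_similarity(manual_mask, shifted_mask))
```

In this toy case both masks have the same volume, yet they share no voxel, so the spatial overlap is zero. That is one plausible way to read the abstract's remark that subjects with very low lesion volumes drove the poor average Dice index despite small volumetric errors.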
Affiliation(s)
- Niklas Wulms
  - Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany
  - *Correspondence: Niklas Wulms
- Lea Redmann
  - Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany
- Christine Herpertz
  - Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany
- Nadine Bonberg
  - Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany
- Klaus Berger
  - Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany
- Benedikt Sundermann
  - Clinic of Radiology, University Hospital Muenster, Muenster, Germany
  - Institute of Radiology and Neuroradiology, Evangelisches Krankenhaus, Medical Campus, University of Oldenburg, Oldenburg, Germany
  - Research Center Neurosensory Science, University of Oldenburg, Oldenburg, Germany
- Heike Minnerup
  - Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany
4
Rendtel U, Liebig S, Meister R, Wagner GG, Zinn S. [Investigating the Dynamics of the Corona Pandemic in Germany: Survey Concepts and an Exemplary Implementation with the Socio-Economic Panel (SOEP)]. ASTA WIRTSCHAFTS- UND SOZIALSTATISTISCHES ARCHIV 2021. [PMCID: PMC8655718 DOI: 10.1007/s11943-021-00296-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
In spring 2020, the World Health Organization (WHO) published guidelines for population-based sample surveys that can provide baseline data for health policy decisions during a pandemic. Implementing these guidelines is by no means trivial. In this article, we describe the challenges of capturing the Corona pandemic statistically in line with them. In the first part, we address the estimation of unreported cases among notified Corona infections, the measurement of disease courses outside clinical settings, the measurement of risk characteristics, and the recording of temporal and regional changes in pandemic intensity. We discuss various options, but also the practical limits, of survey statistics in meeting these manifold challenges through an appropriate sample and survey design. A central issue is the difficult coupling of medical tests with population-representative surveys, where statistical confidentiality poses a particular challenge when test results are reported back to individuals. In the second part, we report how one of the large panel surveys in Germany, the Socio-Economic Panel (SOEP), is being used for a WHO-compliant Covid-19 survey that was launched in September 2020 as the "RKI-SOEP sample" in a cooperation between the Robert Koch Institute (RKI) and the SOEP. We present first results on the response to this study, which will be continued from October 2021 with a second survey wave among the same persons. Just under five percent of respondents who had been successfully interviewed in the past refuse further participation in the SOEP study because of the request to take two tests. Taking into account all information collected in the study (IgG antibody tests, PCR tests and questionnaires), a first estimate indicates that by November 2020 only about two percent of adults living in private households in Germany had been infected with SARS-CoV-2. The number of infections was thus about twice as high as the officially reported number.
Affiliation(s)
- Stefan Liebig
  - Freie Universität Berlin, Berlin, Germany
  - Socio-Economic Panel (SOEP), Berlin, Germany
- Gert G. Wagner
  - Socio-Economic Panel (SOEP), Berlin, Germany
  - Max Planck Institute for Human Development, Berlin, Germany
- Sabine Zinn
  - Socio-Economic Panel (SOEP), Berlin, Germany
  - Humboldt Universität, Berlin, Germany
5
Pigeot I, Kollhorst B, Didelez V. [Secondary Data for Pharmacoepidemiological Research - Making the Best of It!]. DAS GESUNDHEITSWESEN 2021; 83:S69-S76. [PMID: 34695869 DOI: 10.1055/a-1633-3827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Studies using secondary data such as health care claims data are often faced with methodological challenges due to the time-dependence of key quantities or unmeasured confounding. In the present paper, we discuss approaches to avoid or suitably address various sources of potential bias. In particular, we illustrate the target trial principle, marginal structural models, and instrumental variables with examples from the "GePaRD" database. Finally, we discuss the strengths and limitations of record linkage which can sometimes be used to supply missing information.
Affiliation(s)
- Iris Pigeot
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Department of Biometry and Data Management, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
- Bianca Kollhorst
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Department of Biometry and Data Management, Bremen, Germany
- Vanessa Didelez
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Department of Biometry and Data Management, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany