1
|
Davis HE, McCorkell L, Vogel JM, Topol EJ. Long COVID: major findings, mechanisms and recommendations. Nat Rev Microbiol 2023; 21:133-146. [PMID: 36639608 PMCID: PMC9839201 DOI: 10.1038/s41579-022-00846-2] [Citation(s) in RCA: 2062] [Impact Index Per Article: 1031.0] [Accepted: 12/05/2022] [Indexed: 01/15/2023]
Abstract
Long COVID is an often debilitating illness that occurs in at least 10% of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. More than 200 symptoms have been identified with impacts on multiple organ systems. At least 65 million individuals worldwide are estimated to have long COVID, with cases increasing daily. Biomedical research has made substantial progress in identifying various pathophysiological changes and risk factors and in characterizing the illness; further, similarities with other viral-onset illnesses such as myalgic encephalomyelitis/chronic fatigue syndrome and postural orthostatic tachycardia syndrome have laid the groundwork for research in the field. In this Review, we explore the current literature and highlight key findings, the overlap with other conditions, the variable onset of symptoms, long COVID in children and the impact of vaccinations. Although these key findings are critical to understanding long COVID, current diagnostic and treatment options are insufficient, and clinical trials must be prioritized that address leading hypotheses. Additionally, to strengthen long COVID research, future studies must account for biases and SARS-CoV-2 testing issues, build on viral-onset research, be inclusive of marginalized populations and meaningfully engage patients throughout the research process.
|
Review |
2 |
2062 |
2
|
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, Jarvis ED. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021; 592:737-746. [PMID: 33911273 PMCID: PMC8081667 DOI: 10.1038/s41586-021-03451-0] [Citation(s) in RCA: 1270] [Impact Index Per Article: 317.5] [Received: 05/22/2020] [Accepted: 03/12/2021] [Indexed: 02/02/2023]
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
|
Research Support, N.I.H., Extramural |
4 |
1270 |
3
|
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B, Bryant P, Good LL, Laskowski RA, Pozzati G, Shenoy A, Zhu W, Kundrotas P, Serra VR, Rodrigues CHM, Dunham AS, Burke D, Borkakoti N, Velankar S, Frost A, Basquin J, Lindorff-Larsen K, Bateman A, Kajava AV, Valencia A, Ovchinnikov S, Durairaj J, Ascher DB, Thornton JM, Davey NE, Stein A, Elofsson A, Croll TI, Beltrao P. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 2022; 29:1056-1067. [PMID: 36344848 PMCID: PMC9663297 DOI: 10.1038/s41594-022-00849-w] [Citation(s) in RCA: 305] [Impact Index Per Article: 101.7] [Received: 10/25/2021] [Accepted: 09/20/2022] [Indexed: 11/09/2022]
Abstract
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
|
research-article |
3 |
305 |
4
|
Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. NPJ Digit Med 2022; 5:48. [PMID: 35413988 PMCID: PMC9005663 DOI: 10.1038/s41746-022-00592-y] [Citation(s) in RCA: 210] [Impact Index Per Article: 70.0] [Received: 06/21/2021] [Accepted: 03/09/2022] [Indexed: 12/23/2022] Open
Abstract
Research in computer analysis of medical images bears many promises to improve patients' health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss on-going efforts to counteract these problems. Finally we provide recommendations on how to further address these problems in the future.
|
Review |
3 |
210 |
5
|
Fernandez NF, Gundersen GW, Rahman A, Grimes ML, Rikova K, Hornbeck P, Ma’ayan A. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci Data 2017; 4:170151. [PMID: 28994825 PMCID: PMC5634325 DOI: 10.1038/sdata.2017.151] [Citation(s) in RCA: 144] [Impact Index Per Article: 18.0] [Received: 06/19/2017] [Accepted: 09/06/2017] [Indexed: 01/11/2023] Open
Abstract
Most tools developed to visualize hierarchically clustered heatmaps generate static images. Clustergrammer is a web-based visualization tool with interactive features such as zooming, panning, filtering, reordering, sharing, performing enrichment analysis, and providing dynamic gene annotations. Clustergrammer can be used to generate shareable interactive visualizations by uploading a data table to a website, or by embedding Clustergrammer in Jupyter Notebooks. The Clustergrammer core libraries can also be used as a toolkit by developers to generate visualizations within their own applications. Clustergrammer is demonstrated using gene expression data from the cancer cell line encyclopedia (CCLE), original post-translational modification data collected from lung cancer cell lines by a mass spectrometry approach, and original cytometry by time of flight (CyTOF) single-cell proteomics data from blood. Clustergrammer enables producing interactive web-based visualizations for the analysis of diverse biological data.
|
Dataset |
8 |
144 |
6
|
Young JY, Westbrook JD, Feng Z, Sala R, Peisach E, Oldfield TJ, Sen S, Gutmanas A, Armstrong DR, Berrisford JM, Chen L, Chen M, Di Costanzo L, Dimitropoulos D, Gao G, Ghosh S, Gore S, Guranovic V, Hendrickx PMS, Hudson BP, Igarashi R, Ikegawa Y, Kobayashi N, Lawson CL, Liang Y, Mading S, Mak L, Mir MS, Mukhopadhyay A, Patwardhan A, Persikova I, Rinaldi L, Sanz-Garcia E, Sekharan MR, Shao C, Swaminathan GJ, Tan L, Ulrich EL, van Ginkel G, Yamashita R, Yang H, Zhuravleva MA, Quesada M, Kleywegt GJ, Berman HM, Markley JL, Nakamura H, Velankar S, Burley SK. OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the PDB Archive. Structure 2017; 25:536-545. [PMID: 28190782 DOI: 10.1016/j.str.2017.01.004] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 09/30/2016] [Revised: 11/08/2016] [Accepted: 01/10/2017] [Indexed: 10/20/2022]
Abstract
OneDep, a unified system for deposition, biocuration, and validation of experimentally determined structures of biological macromolecules to the PDB archive, has been developed as a global collaboration by the worldwide PDB (wwPDB) partners. This new system was designed to ensure that the wwPDB could meet the evolving archiving requirements of the scientific community over the coming decades. OneDep unifies deposition, biocuration, and validation pipelines across all wwPDB, EMDB, and BMRB deposition sites with improved focus on data quality and completeness in these archives, while supporting growth in the number of depositions and increases in their average size and complexity. In this paper, we describe the design, functional operation, and supporting infrastructure of the OneDep system, and provide initial performance assessments.
|
Research Support, N.I.H., Extramural |
8 |
116 |
7
|
Winther KT, Hoffmann MJ, Boes JR, Mamun O, Bajdich M, Bligaard T. Catalysis-Hub.org, an open electronic structure database for surface reactions. Sci Data 2019; 6:75. [PMID: 31138816 PMCID: PMC6538711 DOI: 10.1038/s41597-019-0081-y] [Citation(s) in RCA: 105] [Impact Index Per Article: 17.5] [Received: 02/01/2019] [Accepted: 04/17/2019] [Indexed: 11/08/2022] Open
Abstract
We present a new open repository for chemical reactions on catalytic surfaces, available at https://www.catalysis-hub.org. The featured database for surface reactions contains more than 100,000 chemisorption and reaction energies obtained from electronic structure calculations, and is continuously being updated with new datasets. In addition to providing quantum-mechanical results for a broad range of reactions and surfaces from different publications, the database features a systematic, large-scale study of chemical adsorption and hydrogenation on bimetallic alloy surfaces. The database contains reaction-specific information, such as the surface composition and reaction energy for each reaction, as well as the surface geometries and calculational parameters, essential for data reproducibility. By providing direct access via the web interface as well as a Python API, we seek to accelerate the discovery of catalytic materials for sustainable energy applications by enabling researchers to efficiently use the data as a basis for new calculations and model generation.
|
research-article |
6 |
105 |
8
|
Bradley VC, Kuriwaki S, Isakov M, Sejdinovic D, Meng XL, Flaxman S. Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature 2021; 600:695-700. [PMID: 34880504 PMCID: PMC8653636 DOI: 10.1038/s41586-021-04198-4] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 06/18/2021] [Accepted: 10/29/2021] [Indexed: 12/20/2022]
Abstract
Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox [1]. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi-Facebook [2,3] (about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 every two weeks). In May 2021, Delphi-Facebook overestimated uptake by 17 percentage points (14-20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11-17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios-Ipsos online panel [5] with about 1,000 responses per week following survey research best practices [6] provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework [1] to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.
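The closing claim of this abstract can be illustrated with a small sketch. It assumes the effective-sample-size approximation associated with the data defect correlation (ddc) framework the abstract cites, n_eff ≈ f / (1 − f) × 1 / ddc², where f = n / N is the sampled fraction. The function names, the population of 255 million adults, and the ddc value of 0.01 are illustrative assumptions and are not figures taken from the abstract.

```python
import math


def effective_sample_size(n: int, population: int, ddc: float) -> float:
    """Approximate size of a simple random sample with the same error as a
    biased sample of size n, using n_eff ~= f / (1 - f) / ddc**2.

    ddc is the data defect correlation between responding and the outcome;
    the value passed in below is hypothetical.
    """
    if not 0 < n < population:
        raise ValueError("n must be between 0 and the population size")
    if ddc == 0:
        raise ValueError("ddc of 0 means no selection bias; n_eff is unbounded")
    f = n / population  # sampled fraction
    return f / (1 - f) / ddc ** 2


def estimation_error(n: int, population: int, ddc: float, sigma: float) -> float:
    """Error of the sample mean: ddc * sqrt((N - n) / n) * sigma."""
    return ddc * math.sqrt((population - n) / n) * sigma


if __name__ == "__main__":
    N_ADULTS = 255_000_000   # assumed population size, for illustration only
    N_SAMPLE = 250_000       # weekly responses quoted in the abstract
    DDC = 0.01               # hypothetical data defect correlation
    SIGMA = math.sqrt(0.6 * 0.4)  # std. dev. of a binary outcome at 60% uptake (assumed)

    print(effective_sample_size(N_SAMPLE, N_ADULTS, DDC))
    print(estimation_error(N_SAMPLE, N_ADULTS, DDC, SIGMA))
```

Under these assumed inputs the effective sample size comes out at roughly 10, which shows how a correlation as small as 0.01 between responding and the outcome can erase almost all of the apparent benefit of 250,000 responses.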
|
research-article |
4 |
97 |
9
|
Community evaluation of glycoproteomics informatics solutions reveals high-performance search strategies for serum glycopeptide analysis. Nat Methods 2021; 18:1304-1316. [PMID: 34725484 PMCID: PMC8566223 DOI: 10.1038/s41592-021-01309-x] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 03/19/2021] [Accepted: 09/22/2021] [Indexed: 12/17/2022]
Abstract
Glycoproteomics is a powerful yet analytically challenging research tool. Software packages aiding the interpretation of complex glycopeptide tandem mass spectra have appeared, but their relative performance remains untested. Conducted through the HUPO Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates solutions for system-wide glycopeptide analysis. The same mass spectrometry-based glycoproteomics datasets from human serum were shared with participants and the relative team performance for N- and O-glycopeptide data analysis was comprehensively established by orthogonal performance tests. Although the results were variable, several high-performance glycoproteomics informatics strategies were identified. Deep analysis of the data revealed key performance-associated search parameters and led to recommendations for improved 'high-coverage' and 'high-accuracy' glycoproteomics search solutions. This study concludes that diverse software packages for comprehensive glycopeptide data analysis exist, points to several high-performance search strategies and specifies key variables that will guide future software developments and assist informatics decision-making in glycoproteomics.
|
research-article |
4 |
84 |
10
|
Kaushal R, Hripcsak G, Ascheim DD, Bloom T, Campion TR, Caplan AL, Currie BP, Check T, Deland EL, Gourevitch MN, Hart R, Horowitz CR, Kastenbaum I, Levin AA, Low AFH, Meissner P, Mirhaji P, Pincus HA, Scaglione C, Shelley D, Tobin JN. Changing the research landscape: the New York City Clinical Data Research Network. J Am Med Inform Assoc 2014; 21:587-90. [PMID: 24821739 PMCID: PMC4078297 DOI: 10.1136/amiajnl-2014-002764] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Indexed: 11/03/2022] Open
Abstract
The New York City Clinical Data Research Network (NYC-CDRN), funded by the Patient-Centered Outcomes Research Institute (PCORI), brings together 22 organizations including seven independent health systems to enable patient-centered clinical research, support a national network, and facilitate learning healthcare systems. The NYC-CDRN includes a robust, collaborative governance and organizational infrastructure, which takes advantage of its participants' experience, expertise, and history of collaboration. The technical design will employ an information model to document and manage the collection and transformation of clinical data, local institutional staging areas to transform and validate data, a centralized data processing facility to aggregate and share data, and use of common standards and tools. We strive to ensure that our project is patient-centered; nurtures collaboration among all stakeholders; develops scalable solutions facilitating growth and connections; chooses simple, elegant solutions wherever possible; and explores ways to streamline the administrative and regulatory approval process across sites.
|
Research Support, Non-U.S. Gov't |
11 |
73 |
11
|
Conroy MC, Lacey B, Bešević J, Omiyale W, Feng Q, Effingham M, Sellers J, Sheard S, Pancholi M, Gregory G, Busby J, Collins R, Allen NE. UK Biobank: a globally important resource for cancer research. Br J Cancer 2023; 128:519-527. [PMID: 36402876 PMCID: PMC9938115 DOI: 10.1038/s41416-022-02053-5] [Citation(s) in RCA: 64] [Impact Index Per Article: 32.0] [Received: 06/21/2022] [Revised: 10/26/2022] [Accepted: 10/27/2022] [Indexed: 11/21/2022] Open
Abstract
UK Biobank is a large-scale prospective study with deep phenotyping and genomic data. Its open-access policy allows researchers worldwide, from academia or industry, to perform health research in the public interest. Between 2006 and 2010, the study recruited 502,000 adults aged 40-69 years from the general population of the United Kingdom. At enrolment, participants provided information on a wide range of factors, physical measurements were taken, and biological samples (blood, urine and saliva) were collected for long-term storage. Participants have now been followed up for over a decade with more than 52,000 incident cancer cases recorded. The study continues to be enhanced with repeat assessments, web-based questionnaires, multi-modal imaging, and conversion of the stored biological samples to genomic and other '-omic' data. The study has already demonstrated its value in enabling research into the determinants of cancer, and future planned enhancements will make the resource even more valuable to cancer researchers. Over 26,000 researchers worldwide are currently using the data, performing a wide range of cancer research. UK Biobank is uniquely placed to transform our understanding of the causes of cancer development and progression, and drive improvements in cancer treatment and prevention over the coming decades.
|
Review |
2 |
64 |
12
|
Cousijn H, Kenall A, Ganley E, Harrison M, Kernohan D, Lemberger T, Murphy F, Polischuk P, Taylor S, Martone M, Clark T. A data citation roadmap for scientific publishers. Sci Data 2018; 5:180259. [PMID: 30457573 PMCID: PMC6244190 DOI: 10.1038/sdata.2018.259] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 09/18/2018] [Accepted: 10/04/2018] [Indexed: 12/04/2022] Open
Abstract
This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies. It was developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE program. The structure of the roadmap presented here follows the "life of a paper" workflow and includes the categories Pre-submission, Submission, Production, and Publication. The roadmap is intended to be publisher-agnostic so that all publishers can use this as a starting point when implementing JDDCP-compliant data citation. Authors reading this roadmap will also better know what to expect from publishers and how to enable their own data citations to gain maximum impact, as well as complying with what will become increasingly common funder mandates on data transparency.
|
Research Support, N.I.H., Extramural |
7 |
58 |
13
|
Bakas S, Sako C, Akbari H, Bilello M, Sotiras A, Shukla G, Rudie JD, Santamaría NF, Kazerooni AF, Pati S, Rathore S, Mamourian E, Ha SM, Parker W, Doshi J, Baid U, Bergman M, Binder ZA, Verma R, Lustig RA, Desai AS, Bagley SJ, Mourelatos Z, Morrissette J, Watt CD, Brem S, Wolf RL, Melhem ER, Nasrallah MP, Mohan S, O'Rourke DM, Davatzikos C. The University of Pennsylvania glioblastoma (UPenn-GBM) cohort: advanced MRI, clinical, genomics, & radiomics. Sci Data 2022; 9:453. [PMID: 35906241 PMCID: PMC9338035 DOI: 10.1038/s41597-022-01560-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 02/22/2022] [Accepted: 07/12/2022] [Indexed: 02/05/2023] Open
Abstract
Glioblastoma is the most common aggressive adult brain tumor. Numerous studies have reported results from either private institutional data or publicly available datasets. However, current public datasets are limited in terms of: a) number of subjects, b) lack of consistent acquisition protocol, c) data quality, or d) accompanying clinical, demographic, and molecular information. Toward alleviating these limitations, we contribute the "University of Pennsylvania Glioblastoma Imaging, Genomics, and Radiomics" (UPenn-GBM) dataset, which describes the currently largest publicly available comprehensive collection of 630 patients diagnosed with de novo glioblastoma. The UPenn-GBM dataset includes (a) advanced multi-parametric magnetic resonance imaging scans acquired during routine clinical practice, at the University of Pennsylvania Health System, (b) accompanying clinical, demographic, and molecular information, (c) perfusion and diffusion derivative volumes, (d) computationally-derived and manually-revised expert annotations of tumor sub-regions, as well as (e) quantitative imaging (also known as radiomic) features corresponding to each of these regions. This collection describes our contribution towards repeatable, reproducible, and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments.
|
Dataset |
3 |
54 |
14
|
Spicer RA, Salek R, Steinbeck C. Compliance with minimum information guidelines in public metabolomics repositories. Sci Data 2017; 4:170137. [PMID: 28949328 PMCID: PMC5613734 DOI: 10.1038/sdata.2017.137] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 06/19/2017] [Accepted: 08/29/2017] [Indexed: 12/16/2022] Open
Abstract
The Metabolomics Standards Initiative (MSI) guidelines were first published in 2007. These guidelines provided reporting standards for all stages of metabolomics analysis: experimental design, biological context, chemical analysis and data processing. Since 2012, a series of public metabolomics databases and repositories, which accept the deposition of metabolomic datasets, have arisen. In this study, the compliance of 399 public data sets, from four major metabolomics data repositories, to the biological context MSI reporting standards was evaluated. None of the reporting standards were complied with in every publicly available study, although adherence rates varied greatly, from 0 to 97%. The plant minimum reporting standards were the most complied with and the microbial and in vitro were the least. Our results indicate the need for reassessment and revision of the existing MSI reporting standards.
|
other |
8 |
52 |
15
|
Bull S, Cheah PY, Denny S, Jao I, Marsh V, Merson L, Shah More N, Nhan LNT, Osrin D, Tangseefa D, Wassenaar D, Parker M. Best Practices for Ethical Sharing of Individual-Level Health Research Data From Low- and Middle-Income Settings. J Empir Res Hum Res Ethics 2015; 10:302-13. [PMID: 26297751 PMCID: PMC4547207 DOI: 10.1177/1556264615594606] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 11/16/2022]
Abstract
Sharing individual-level data from clinical and public health research is increasingly being seen as a core requirement for effective and efficient biomedical research. This article discusses the results of a systematic review and multisite qualitative study of key stakeholders' perspectives on best practices in ethical data sharing in low- and middle-income settings. Our research suggests that for data sharing to be effective and sustainable, multiple social and ethical requirements need to be met. An effective model of data sharing will be one in which considered judgments will need to be made about how best to achieve scientific progress, minimize risks of harm, promote fairness and reciprocity, and build and sustain trust.
|
Review |
10 |
52 |
16
|
Bull S, Roberts N, Parker M. Views of Ethical Best Practices in Sharing Individual-Level Data From Medical and Public Health Research: A Systematic Scoping Review. J Empir Res Hum Res Ethics 2015; 10:225-38. [PMID: 26297745 PMCID: PMC4548478 DOI: 10.1177/1556264615594767] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
There is increasing support for sharing individual-level data generated by medical and public health research. This scoping review of empirical research and conceptual literature examined stakeholders' perspectives of ethical best practices in data sharing, particularly in low- and middle-income settings. Sixty-nine empirical and conceptual articles were reviewed, of which, only five were empirical studies and eight were conceptual articles focusing on low- and middle-income settings. We conclude that support for sharing individual-level data is contingent on the development and implementation of international and local policies and processes to support ethical best practices. Further conceptual and empirical research is needed to ensure data sharing policies and processes in low- and middle-income settings are appropriately informed by stakeholders' perspectives.
|
Scoping Review |
10 |
50 |
17
|
de Montjoye YA, Gambs S, Blondel V, Canright G, de Cordes N, Deletaille S, Engø-Monsen K, Garcia-Herranz M, Kendall J, Kerry C, Krings G, Letouzé E, Luengo-Oroz M, Oliver N, Rocher L, Rutherford A, Smoreda Z, Steele J, Wetter E, Pentland AS, Bengtsson L. On the privacy-conscientious use of mobile phone data. Sci Data 2018; 5:180286. [PMID: 30532052 PMCID: PMC6289108 DOI: 10.1038/sdata.2018.286] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 01/29/2018] [Accepted: 10/26/2018] [Indexed: 11/08/2022] Open
Abstract
The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.
|
Editorial |
7 |
49 |
18
|
Friedman J, Liu P, Troeger CE, Carter A, Reiner RC, Barber RM, Collins J, Lim SS, Pigott DM, Vos T, Hay SI, Murray CJL, Gakidou E. Predictive performance of international COVID-19 mortality forecasting models. Nat Commun 2021; 12:2609. [PMID: 33972512 PMCID: PMC8110547 DOI: 10.1038/s41467-021-22457-w] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 11/17/2020] [Accepted: 03/10/2021] [Indexed: 12/25/2022] Open
Abstract
Forecasts and alternative scenarios of COVID-19 mortality have been critical inputs for pandemic response efforts, and decision-makers need information about predictive performance. We screen n = 386 public COVID-19 forecasting models, identifying n = 7 that are global in scope and provide public, date-versioned forecasts. We examine their predictive performance for mortality by weeks of extrapolation, world region, and estimation month. We additionally assess prediction of the timing of peak daily mortality. Globally, models released in October show a median absolute percent error (MAPE) of 7 to 13% at six weeks, reflecting surprisingly good performance despite the complexities of modelling human behavioural responses and government interventions. Median absolute error for peak timing increased from 8 days at one week of forecasting to 29 days at eight weeks and is similar for first and subsequent peaks. The framework and public codebase (https://github.com/pyliu47/covidcompare) can be used to compare predictions and evaluate predictive performance going forward.
|
Comparative Study |
4 |
47 |
19
|
Peirsman A, Blondeel E, Ahmed T, Anckaert J, Audenaert D, Boterberg T, Buzas K, Carragher N, Castellani G, Castro F, Dangles-Marie V, Dawson J, De Tullio P, De Vlieghere E, Dedeyne S, Depypere H, Diosdi A, Dmitriev RI, Dolznig H, Fischer S, Gespach C, Goossens V, Heino J, Hendrix A, Horvath P, Kunz-Schughart LA, Maes S, Mangodt C, Mestdagh P, Michlíková S, Oliveira MJ, Pampaloni F, Piccinini F, Pinheiro C, Rahn J, Robbins SM, Siljamäki E, Steigemann P, Sys G, Takayama S, Tesei A, Tulkens J, Van Waeyenberge M, Vandesompele J, Wagemans G, Weindorfer C, Yigit N, Zablowsky N, Zanoni M, Blondeel P, De Wever O. MISpheroID: a knowledgebase and transparency tool for minimum information in spheroid identity. Nat Methods 2021; 18:1294-1303. [PMID: 34725485 PMCID: PMC8566242 DOI: 10.1038/s41592-021-01291-4] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 06/07/2020] [Accepted: 09/09/2021] [Indexed: 01/21/2023]
Abstract
Spheroids are three-dimensional cellular models with widespread basic and translational application across academia and industry. However, methodological transparency and guidelines for spheroid research have not yet been established. The MISpheroID Consortium developed a crowdsourcing knowledgebase that assembles the experimental parameters of 3,058 published spheroid-related experiments. Interrogation of this knowledgebase identified heterogeneity in the methodological setup of spheroids. Empirical evaluation and interlaboratory validation of selected variations in spheroid methodology revealed diverse impacts on spheroid metrics. To facilitate interpretation, stimulate transparency and increase awareness, the Consortium defines the MISpheroID string, a minimum set of experimental parameters required to report spheroid research. Thus, MISpheroID combines a valuable resource and a tool for three-dimensional cellular models to mine experimental parameters and to improve reproducibility.
|
Research Support, N.I.H., Extramural |
4 |
47 |
20
|
Royer J, Rodríguez-Cruces R, Tavakol S, Larivière S, Herholz P, Li Q, Vos de Wael R, Paquola C, Benkarim O, Park BY, Lowe AJ, Margulies D, Smallwood J, Bernasconi A, Bernasconi N, Frauscher B, Bernhardt BC. An Open MRI Dataset For Multiscale Neuroscience. Sci Data 2022; 9:569. [PMID: 36109562 PMCID: PMC9477866 DOI: 10.1038/s41597-022-01682-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 12/10/2021] [Accepted: 08/24/2022] [Indexed: 12/17/2022] Open
Abstract
Multimodal neuroimaging grants a powerful window into the structure and function of the human brain at multiple scales. Recent methodological and conceptual advances have enabled investigations of the interplay between large-scale spatial trends (also referred to as gradients) in brain microstructure and connectivity, offering an integrative framework to study multiscale brain organization. Here, we share a multimodal MRI dataset for Microstructure-Informed Connectomics (MICA-MICs) acquired in 50 healthy adults (23 women; 29.54 ± 5.62 years) who underwent high-resolution T1-weighted MRI, myelin-sensitive quantitative T1 relaxometry, diffusion-weighted MRI, and resting-state functional MRI at 3 Tesla. In addition to raw anonymized MRI data, this release includes brain-wide connectomes derived from (i) resting-state functional imaging, (ii) diffusion tractography, (iii) microstructure covariance analysis, and (iv) geodesic cortical distance, gathered across multiple parcellation scales. Alongside, we share large-scale gradients estimated from each modality and parcellation scale. Our dataset will facilitate future research examining the coupling between brain microstructure, connectivity, and function. MICA-MICs is available on the Canadian Open Neuroscience Platform data portal (https://portal.conp.ca) and the Open Science Framework (https://osf.io/j532r/).
|
Dataset |
3 |
42 |
21
|
Maumet C, Auer T, Bowring A, Chen G, Das S, Flandin G, Ghosh S, Glatard T, Gorgolewski KJ, Helmer KG, Jenkinson M, Keator DB, Nichols BN, Poline JB, Reynolds R, Sochat V, Turner J, Nichols TE. Sharing brain mapping statistical results with the neuroimaging data model. Sci Data 2016; 3:160102. [PMID: 27922621 PMCID: PMC5139675 DOI: 10.1038/sdata.2016.102] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/21/2016] [Accepted: 09/21/2016] [Indexed: 11/16/2022] Open
Abstract
Only a tiny fraction of the data and metadata produced by an fMRI study is finally conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Result graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at http://nidm.nidash.org/specs/nidm-results.html.
|
Research Support, N.I.H., Extramural |
9 |
35 |
22
|
Hebart MN, Contier O, Teichmann L, Rockter AH, Zheng CY, Kidder A, Corriveau A, Vaziri-Pashkam M, Baker CI. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 2023; 12:e82580. [PMID: 36847339 PMCID: PMC10038662 DOI: 10.7554/elife.82580] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 08/09/2022] [Accepted: 02/25/2023] [Indexed: 03/01/2023] Open
Abstract
Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here, we present THINGS-data, a multimodal collection of large-scale neuroimaging and behavioral datasets in humans, comprising densely sampled functional MRI and magnetoencephalographic recordings, as well as 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core public release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and the advancement of cognitive neuroscience.
|
Research Support, N.I.H., Intramural |
2 |
34 |
23
|
Röösli E, Bozkurt S, Hernandez-Boussard T. Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model. Sci Data 2022; 9:24. [PMID: 35075160 PMCID: PMC8786878 DOI: 10.1038/s41597-021-01110-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/24/2021] [Accepted: 12/10/2021] [Indexed: 11/13/2022] Open
Abstract
As artificial intelligence (AI) makes continuous progress to improve quality of care for some patients by leveraging ever increasing amounts of digital health data, others are left behind. Empirical evaluation studies are required to keep biased AI models from reinforcing systemic health disparities faced by minority populations through dangerous feedback loops. The aim of this study is to raise broad awareness of the pervasive challenges around bias and fairness in risk prediction models. We performed a case study on a MIMIC-trained benchmarking model using a broadly applicable fairness and generalizability assessment framework. While open-science benchmarks are crucial to overcome many study limitations today, this case study revealed a strong class imbalance problem as well as fairness concerns for Black and publicly insured ICU patients. Therefore, we advocate for the widespread use of comprehensive fairness and performance assessment frameworks to effectively monitor and validate benchmark pipelines built on open data resources.
|
Research Support, N.I.H., Extramural |
3 |
32 |
24
|
Wang C, Martins-Bach AB, Alfaro-Almagro F, Douaud G, Klein JC, Llera A, Fiscone C, Bowtell R, Elliott LT, Smith SM, Tendler BC, Miller KL. Phenotypic and genetic associations of quantitative magnetic susceptibility in UK Biobank brain imaging. Nat Neurosci 2022; 25:818-831. [PMID: 35606419 PMCID: PMC9174052 DOI: 10.1038/s41593-022-01074-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/11/2021] [Accepted: 04/11/2022] [Indexed: 12/17/2022]
Abstract
A key aim in epidemiological neuroscience is identification of markers to assess brain health and monitor therapeutic interventions. Quantitative susceptibility mapping (QSM) is an emerging magnetic resonance imaging technique that measures tissue magnetic susceptibility and has been shown to detect pathological changes in tissue iron, myelin and calcification. We present an open resource of QSM-based imaging measures of multiple brain structures in 35,273 individuals from the UK Biobank prospective epidemiological study. We identify statistically significant associations of 251 phenotypes with magnetic susceptibility that include body iron, disease, diet and alcohol consumption. Genome-wide associations relate magnetic susceptibility to 76 replicating clusters of genetic variants with biological functions involving iron, calcium, myelin and extracellular matrix. These patterns of associations include relationships that are unique to QSM, in particular being complementary to T2* signal decay time measures. These new imaging phenotypes are being integrated into the core UK Biobank measures provided to researchers worldwide, creating the potential to discover new, non-invasive markers of brain health.
|
research-article |
3 |
31 |
25
|
Webb MA, Tangney JP. Too Good to Be True: Bots and Bad Data From Mechanical Turk. Perspect Psychol Sci 2024; 19:887-890. [PMID: 36343213 DOI: 10.1177/17456916221120027] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Indexed: 02/17/2024]
Abstract
Psychology is moving increasingly toward digital sources of data, with Amazon's Mechanical Turk (MTurk) at the forefront of that charge. In 2015, up to an estimated 45% of articles published in the top behavioral and social science journals included at least one study conducted on MTurk. In this article, I summarize my own experience with MTurk and how I deduced that my sample was, at best, only 2.6% valid, by my estimate. I share these results as a warning and call for caution. Recently, I conducted an online study via Amazon's MTurk, eager and excited to collect my own data for the first time as a doctoral student. What resulted has prompted me to write this as a warning: it is indeed too good to be true. This is a summary of how I determined that, at best, I had gathered valid data from 14 human beings, or 2.6% of my participant sample (N = 529).
|
|
1 |
29 |