1
Tecuatl C, Ljungquist B, Ascoli GA. Accelerating the continuous community sharing of digital neuromorphology data. FASEB Bioadv 2024; 6:207-221. [PMID: 38974113 PMCID: PMC11226999 DOI: 10.1096/fba.2024-00048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/28/2024] [Accepted: 06/03/2024] [Indexed: 07/09/2024] Open
Abstract
The tree-like morphology of neurons and glia is a key cellular determinant of circuit connectivity and metabolic function in the nervous system of essentially all animals. To elucidate the contribution of specific cell types to both physiological and pathological brain states, it is important to access detailed neuroanatomy data for quantitative analysis and computational modeling. NeuroMorpho.Org is the largest online collection of freely available digital neural reconstructions and related metadata and is continuously updated with new uploads. Earlier in the project, we released multiple datasets together yearly, but this process caused an average delay of several months in making the data public. Moreover, in the past 5 years, >80% of invited authors agreed to share their data with the community via NeuroMorpho.Org, up from <20% in the first 5 years of the project. In the same period, the average number of reconstructions per publication increased 600%, creating the need for automatic processing to release more reconstructions in less time. The progressive automation of our pipeline enabled the transition to agile releases of individual datasets as soon as they are ready. The overall time from data identification to public sharing decreased by 63.7%; 78% of the datasets are now released in less than 3 months with an average workflow duration below 40 days. Furthermore, the mean processing time per reconstruction dropped from 3 h to 2 min. With these continuous improvements, NeuroMorpho.Org strives to forge a positive culture of open data. Most importantly, the new, original research enabled through reuse of datasets across the world has a multiplicative effect on science discovery, benefiting both authors and users.
Affiliation(s)
- Carolina Tecuatl
- Bioengineering Department and Center for Neural Informatics, Structures and Plasticity, College of Engineering and Computing, George Mason University, Fairfax, Virginia, USA
- Bengt Ljungquist
- Bioengineering Department and Center for Neural Informatics, Structures and Plasticity, College of Engineering and Computing, George Mason University, Fairfax, Virginia, USA
- Giorgio A. Ascoli
- Bioengineering Department and Center for Neural Informatics, Structures and Plasticity, College of Engineering and Computing, George Mason University, Fairfax, Virginia, USA
- Interdisciplinary Program in Neuroscience, College of Science, George Mason University, Fairfax, Virginia, USA
2
Tecuatl C, Ljungquist B, Ascoli GA. Accelerating the continuous community sharing of digital neuromorphology data. bioRxiv 2024:2024.03.15.585306. [PMID: 38562736 PMCID: PMC10983892 DOI: 10.1101/2024.03.15.585306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The tree-like morphology of neurons and glia is a key cellular determinant of circuit connectivity and metabolic function in the nervous system of essentially all animals. To elucidate the contribution of specific cell types to both physiological and pathological brain states, it is important to access detailed neuroanatomy data for quantitative analysis and computational modeling. NeuroMorpho.Org is the largest online collection of freely available digital neural reconstructions and related metadata and is continuously updated with new uploads. Earlier in the project, we released multiple datasets together yearly, but this process caused an average delay of several months in making the data public. Moreover, in the past 5 years, >80% of invited authors agreed to share their data with the community via NeuroMorpho.Org, up from <20% in the first 5 years of the project. In the same period, the average number of reconstructions per publication increased 600%, creating the need for automatic processing to release more reconstructions in less time. The progressive automation of our pipeline enabled the transition to agile releases of individual datasets as soon as they are ready. The overall time from data identification to public sharing decreased by 63.7%; 78% of the datasets are now released in less than 3 months with an average workflow duration below 40 days. Furthermore, the mean processing time per reconstruction dropped from 3 hours to 2 minutes. With these continuous improvements, NeuroMorpho.Org strives to forge a positive culture of open data. Most importantly, the new, original research enabled through reuse of datasets across the world has a multiplicative effect on science discovery, benefiting both authors and users.
Affiliation(s)
- Carolina Tecuatl
- Bioengineering Department and Center for Neural Informatics, Structures, & Plasticity; College of Engineering and Computing; George Mason University, Fairfax, VA, USA
- Bengt Ljungquist
- Bioengineering Department and Center for Neural Informatics, Structures, & Plasticity; College of Engineering and Computing; George Mason University, Fairfax, VA, USA
- Giorgio A. Ascoli
- Bioengineering Department and Center for Neural Informatics, Structures, & Plasticity; College of Engineering and Computing; George Mason University, Fairfax, VA, USA
- Interdisciplinary Program in Neuroscience; College of Science; George Mason University, Fairfax, VA, USA
3
Van Horn JD. Editorial: On the Economics of Neuroscientific Data Sharing. Neuroinformatics 2024; 22:1-4. [PMID: 37966621 DOI: 10.1007/s12021-023-09649-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Affiliation(s)
- John Darrell Van Horn
- Department of Psychology, University of Virginia, Charlottesville, VA, USA.
- School of Data Science, University of Virginia, Charlottesville, VA, USA.
4
Emanuele E, Minoretti P. Measuring the Impact of Data Sharing: From Author-Level Metrics to Quantification of Economic and Non-tangible Benefits. Cureus 2023; 15:e50308. [PMID: 38205488 PMCID: PMC10777335 DOI: 10.7759/cureus.50308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/10/2023] [Indexed: 01/12/2024] Open
Abstract
In early 2023, the National Institutes of Health (NIH) implemented its Data Management and Sharing (DMS) Policy, requiring researchers to share scientific data produced with NIH funding. The policy's objective is to amplify the benefits of public investment in research by promoting the dissemination and reusability of primary data. Given this backdrop, identifying a robust methodology to assess the impact of data sharing across diverse research domains is essential. In this review, we adopted two methodological paradigms, the bottom-up and top-down strategies, and employed content analysis to pinpoint established methodologies and innovative practices within this intricate field. Although numerous author-level metrics are available to gauge the impact of data sharing, their application is still limited. Non-traditional metrics, encompassing economic (e.g., cost savings) and intangible benefits, presently appear to hold more potential for evaluating the impact of primary data sharing. Finally, we address the primary obstacles encountered by open data policies and introduce an innovative "Shared model for shared data" framework to bolster data sharing practices and refine evaluation metrics.
5
Newton AJH, Chartash D, Kleinstein SH, McDougal RA. A pipeline for the retrieval and extraction of domain-specific information with application to COVID-19 immune signatures. BMC Bioinformatics 2023; 24:292. [PMID: 37474900 PMCID: PMC10357743 DOI: 10.1186/s12859-023-05397-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 06/23/2023] [Indexed: 07/22/2023] Open
Abstract
BACKGROUND: The accelerating pace of biomedical publication has made it impractical to manually, systematically identify papers containing specific information and extract this information. This is especially challenging when the information itself resides beyond titles or abstracts. For emerging science, with a limited set of known papers of interest and an incomplete information model, this is of pressing concern. A timely example in retrospect is the identification of immune signatures (coherent sets of biomarkers) driving differential SARS-CoV-2 infection outcomes.
IMPLEMENTATION: We built a classifier to identify papers containing domain-specific information from the document embeddings of the title and abstract. To train this classifier with limited data, we developed an iterative process leveraging pre-trained SPECTER document embeddings, SVM classifiers and web-enabled expert review to iteratively augment the training set. This training set was then used to create a classifier to identify papers containing domain-specific information. Finally, information was extracted from these papers through a semi-automated system that directly solicited the paper authors to respond via a web-based form.
RESULTS: We demonstrate a classifier that retrieves papers with human COVID-19 immune signatures with a positive predictive value of 86%. The type of immune signature (e.g., gene expression vs. other types of profiling) was also identified with a positive predictive value of 74%. Semi-automated queries to the corresponding authors of these publications requesting signature information achieved a 31% response rate.
CONCLUSIONS: Our results demonstrate the efficacy of using a SVM classifier with document embeddings of the title and abstract, to retrieve papers with domain-specific information, even when that information is rarely present in the abstract. Targeted author engagement based on classifier predictions offers a promising pathway to build a semi-structured representation of such information. Through this approach, partially automated literature mining can help rapidly create semi-structured knowledge repositories for automatic analysis of emerging health threats.
Affiliation(s)
- Adam J H Newton
- Department of Physiology and Pharmacology, SUNY Downstate Health Sciences University, Brooklyn, NY, 11203, USA
- Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06511, USA
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- David Chartash
- Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06511, USA
- School of Medicine, University College Dublin - National University of Ireland, Dublin, Co. Dublin, Republic of Ireland
- Steven H Kleinstein
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Immunobiology, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Robert A McDougal
- Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA.
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06511, USA.
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA.
6
Anderson KR, Harris JA, Ng L, Prins P, Memar S, Ljungquist B, Fürth D, Williams RW, Ascoli GA, Dumitriu D. Highlights from the Era of Open Source Web-Based Tools. J Neurosci 2021; 41:927-936. [PMID: 33472826 PMCID: PMC7880282 DOI: 10.1523/jneurosci.1657-20.2020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/30/2020] [Revised: 11/22/2020] [Accepted: 11/29/2020] [Indexed: 12/20/2022] Open
Abstract
High digital connectivity and a focus on reproducibility are contributing to an open science revolution in neuroscience. Repositories and platforms have emerged across the whole spectrum of subdisciplines, paving the way for a paradigm shift in the way we share, analyze, and reuse vast amounts of data collected across many laboratories. Here, we describe how open access web-based tools are changing the landscape and culture of neuroscience, highlighting six free resources that span subdisciplines from behavior to whole-brain mapping, circuits, neurons, and gene variants.
Affiliation(s)
- Kristin R Anderson
- Departments of Pediatrics and Psychiatry, Columbia University, New York, New York 10032
- Division of Developmental Psychobiology, New York State Psychiatric Institute, New York, New York 10032
- The Sackler Institute for Developmental Psychobiology, Columbia University, New York, New York 10032
- Columbia Population Research Center, Columbia University, New York, New York 10027
- Zuckerman Institute, Columbia University, New York, New York 10027
- Julie A Harris
- Allen Institute for Brain Science, Seattle, Washington 98109
- Lydia Ng
- Allen Institute for Brain Science, Seattle, Washington 98109
- Pjotr Prins
- Department of Genetics, Genomics and Informatics, Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee 38163
- Sara Memar
- Robarts Research Institute, BrainsCAN, Schulich School of Medicine & Dentistry, Western University, London, Ontario N6A 3K7, Canada
- Bengt Ljungquist
- Center for Neural Informatics, Structures, and Plasticity, Krasnow Institute for Advanced Study; and Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia 22030
- Daniel Fürth
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724
- Robert W Williams
- Department of Genetics, Genomics and Informatics, Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee 38163
- Giorgio A Ascoli
- Center for Neural Informatics, Structures, and Plasticity, Krasnow Institute for Advanced Study; and Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia 22030
- Dani Dumitriu
- Departments of Pediatrics and Psychiatry, Columbia University, New York, New York 10032
- Division of Developmental Psychobiology, New York State Psychiatric Institute, New York, New York 10032
- The Sackler Institute for Developmental Psychobiology, Columbia University, New York, New York 10032
- Columbia Population Research Center, Columbia University, New York, New York 10027
- Zuckerman Institute, Columbia University, New York, New York 10027