1
Hamzah N, Malim NHAH, Abdullah JM, Sumari P, Mokhtar AM, Rosli SNS, Ibrahim SAS, Idris Z. Big Brain Data Initiatives in Universiti Sains Malaysia: Data Stewardship to Data Repository and Data Sharing. Neuroinformatics 2023; 21:589-600. [PMID: 37344699 DOI: 10.1007/s12021-023-09637-3] [Accepted: 05/30/2023] [Indexed: 06/23/2023]
Abstract
The sharing of open-access neuroimaging data has increased significantly during the last few years. Sharing neuroimaging data is crucial to accelerating scientific advancement, particularly in the field of neuroscience. A number of big initiatives that will increase the amount of available neuroimaging data are currently in development. The Big Brain Data Initiative project was started by Universiti Sains Malaysia as the first neuroimaging data repository platform in Malaysia for the purpose of data sharing. In order to ensure that the neuroimaging data in this project is accessible, usable, and secure, as well as to offer users high-quality data that can be consistently accessed, we first came up with good data stewardship practices. Then, we developed MyneuroDB, an online repository database system for data sharing purposes. Here, we describe the Big Brain Data Initiative and MyneuroDB, a data repository that provides the ability to openly share neuroimaging data, currently including magnetic resonance imaging (MRI), electroencephalography (EEG), and magnetoencephalography (MEG), following the FAIR principles for data sharing.
Affiliation(s)
- Nurfaten Hamzah
  - Department of Neurosciences, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
  - Brain and Behaviour Cluster, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
- Jafri Malin Abdullah
  - Department of Neurosciences, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
  - Brain and Behaviour Cluster, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
  - Department of Neurosciences & Brain Behaviour Cluster, Hospital Universiti Sains Malaysia, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
- Putra Sumari
  - School of Computer Sciences, Universiti Sains Malaysia, 11800, Gelugor, Pulau Pinang, Malaysia
- Ariffin Marzuki Mokhtar
  - Hospital Management System Unit, Hospital Universiti Sains Malaysia, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
- Siti Nur Syamila Rosli
  - Department of Neurosciences, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
- Zamzuri Idris
  - Department of Neurosciences, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
  - Brain and Behaviour Cluster, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
2
Mittal D, Mease R, Kuner T, Flor H, Kuner R, Andoh J. Data management strategy for a collaborative research center. Gigascience 2022; 12:giad049. [PMID: 37401720 PMCID: PMC10318494 DOI: 10.1093/gigascience/giad049] [Received: 09/23/2022] [Revised: 02/20/2023] [Accepted: 06/11/2023] [Indexed: 07/05/2023] Open
Abstract
The importance of effective research data management (RDM) strategies to support the generation of Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience data grows with each advance in data acquisition techniques and research methods. To maximize the impact of diverse research strategies, multidisciplinary, large-scale neuroscience research consortia face a number of unsolved challenges in RDM. While open science principles are largely accepted, it is practically difficult for researchers to prioritize RDM over other pressing demands. The implementation of a coherent, executable RDM plan for consortia spanning animal, human, and clinical studies is becoming increasingly challenging. Here, we present an RDM strategy implemented for the Heidelberg Collaborative Research Consortium. Our consortium combines basic and clinical research in diverse populations (animals and humans) and produces highly heterogeneous and multimodal research data (e.g., neurophysiology, neuroimaging, genetics, behavior). We present a concrete strategy for initiating early-stage RDM and FAIR data generation for large-scale collaborative research consortia, with a focus on sustainable solutions that incentivize incremental RDM while respecting research-specific requirements.
Affiliation(s)
- Deepti Mittal
  - Institute of Pharmacology, Heidelberg University, 69120 Heidelberg, Germany
- Rebecca Mease
  - Institute of Physiology and Pathophysiology, Heidelberg University, 69120 Heidelberg, Germany
- Thomas Kuner
  - Institute for Anatomy and Cell Biology, Heidelberg University, 69120 Mannheim, Germany
- Herta Flor
  - Department of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Rohini Kuner
  - Institute of Pharmacology, Heidelberg University, 69120 Heidelberg, Germany
- Jamila Andoh
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
3
Hill DL, Stephenson D, Brayanov J, Claes K, Badawy R, Sardar S, Fisher K, Lee SJ, Bannon A, Roussos G, Kangarloo T, Terebaite V, Müller MLTM, Bhatnagar R, Adams JL, Dorsey ER, Cosman J. Metadata Framework to Support Deployment of Digital Health Technologies in Clinical Trials in Parkinson's Disease. Sensors (Basel, Switzerland) 2022; 22:2136. [PMID: 35336307 PMCID: PMC8954603 DOI: 10.3390/s22062136] [Received: 10/18/2021] [Revised: 02/14/2022] [Accepted: 02/17/2022] [Indexed: 06/14/2023]
Abstract
Sensor data from digital health technologies (DHTs) used in clinical trials provides a valuable source of information, because of the possibility to combine datasets from different studies, to combine it with other data types, and to reuse it multiple times for various purposes. To date, there exist no standards for capturing or storing DHT biosensor data applicable across modalities and disease areas, and which can also capture the clinical trial and environment-specific aspects, so-called metadata. In this perspectives paper, we propose a metadata framework that divides the DHT metadata into metadata that is independent of the therapeutic area or clinical trial design (concept of interest and context of use), and metadata that is dependent on these factors. We demonstrate how this framework can be applied to data collected with different types of DHTs deployed in the WATCH-PD clinical study of Parkinson's disease. This framework provides a means to pre-specify and therefore standardize aspects of the use of DHTs, promoting comparability of DHTs across future studies.
Affiliation(s)
- Derek L. Hill
  - Panoramic Digital Health, 38000 Grenoble, France
  - Centre for Medical Imaging, University College London (UCL), London WC1E 6BT, UK
- Diane Stephenson
  - Critical Path Institute, Tucson, AZ 85718, USA; (D.S.); (S.S.); (M.L.T.M.M.); (R.B.)
- Jordan Brayanov
  - Takeda Development Center Americas, Inc., Deerfield, IL 60015, USA; (J.B.); (T.K.)
- Reham Badawy
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
- Sakshi Sardar
  - Critical Path Institute, Tucson, AZ 85718, USA; (D.S.); (S.S.); (M.L.T.M.M.); (R.B.)
- George Roussos
  - Birkbeck College, University of London, London WC1E 7HX, UK
- Tairmae Kangarloo
  - Takeda Development Center Americas, Inc., Deerfield, IL 60015, USA; (J.B.); (T.K.)
- Roopal Bhatnagar
  - Critical Path Institute, Tucson, AZ 85718, USA; (D.S.); (S.S.); (M.L.T.M.M.); (R.B.)
- Jamie L. Adams
  - Department of Neurology, University of Rochester, Rochester, NY 14642, USA; (J.L.A.); (E.R.D.)
- E. Ray Dorsey
  - Department of Neurology, University of Rochester, Rochester, NY 14642, USA; (J.L.A.); (E.R.D.)
- Josh Cosman
  - AbbVie, North Chicago, IL 60064, USA; (A.B.); (J.C.)
4
Ježek P, Teeters JL, Sommer FT. NWB Query Engines: Tools to Search Data Stored in Neurodata Without Borders Format. Front Neuroinform 2020; 14:27. [PMID: 33041776 PMCID: PMC7526650 DOI: 10.3389/fninf.2020.00027] [Received: 03/04/2020] [Accepted: 05/20/2020] [Indexed: 11/19/2022] Open
Abstract
The Neurodata Without Borders (NWB) format is a current technology for storing neurophysiology data along with the associated metadata. Data stored in the format is organized into separate HDF5 files, each file usually storing the data associated with a single recording session. While the NWB format provides a structured method for storing data, so far there have been no tools that enable searching a collection of NWB files to find data of interest for a particular purpose. We describe here three tools to enable searching NWB files. The tools have different features, making each of them most useful for a particular task. The first tool, called the NWB Query Engine, is written in Java. It allows searching the complete content of NWB files. It was designed for the first version of NWB (NWB 1) and supports most (but not all) features of the most recent version (NWB 2). For some searches, it is the fastest tool. The second tool, called “search_nwb”, is written in Python and also allows searching the complete contents of NWB files. It works with both NWB 1 and NWB 2, as does the third tool. The third tool, called “nwbindexer”, enables searching a collection of NWB files using a two-step process. In the first step, a utility is run which creates an SQLite database containing the metadata in a collection of NWB files. This database is then searched in the second step, using another utility. Once the index is built, this two-step process allows faster searches than the other tools, but the searches are less complete. All three tools use a simple query language which was developed for this project. Software integrating the three tools into a web interface is provided, which enables searching NWB files by submitting a web form.
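The two-step, index-then-search idea described in this abstract can be sketched with Python's standard `sqlite3` module. This is only an illustration of the general pattern, not the nwbindexer implementation or its query language: the table layout, the function names `build_index` and `search_index`, and the in-memory `FILES` dictionary (a stand-in for metadata that would really be read from NWB/HDF5 files) are all assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for metadata extracted from a collection of NWB files.
# A real indexer would read these values from the HDF5 files themselves.
FILES = {
    "session_a.nwb": {"/general/subject/species": "Mus musculus",
                      "/general/subject/age": "P90D"},
    "session_b.nwb": {"/general/subject/species": "Rattus norvegicus",
                      "/general/subject/age": "P60D"},
}


def build_index(files, db_path=":memory:"):
    """Step 1: store (file, path, value) metadata rows in an SQLite database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (file TEXT, path TEXT, value TEXT)"
    )
    rows = [(name, path, value)
            for name, meta in files.items()
            for path, value in meta.items()]
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search_index(conn, path, value):
    """Step 2: query the index only; the original files are never reopened."""
    cur = conn.execute(
        "SELECT file FROM metadata WHERE path = ? AND value = ? ORDER BY file",
        (path, value),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_index(FILES)
matches = search_index(conn, "/general/subject/species", "Mus musculus")
print(matches)
```

The sketch also shows the trade-off the abstract mentions: searches touch only the small index, which is fast, but anything not copied into the index (here, everything except two metadata fields) cannot be found.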
Affiliation(s)
- Petr Ježek
  - Faculty of Applied Sciences, New Technologies for the Information Society, University of West Bohemia, Plzeň, Czechia
- Jeffery L Teeters
  - Redwood Center for Theoretical Neuroscience & Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Friedrich T Sommer
  - Redwood Center for Theoretical Neuroscience & Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
5
Tritt AJ, Rübel O, Dichter B, Ly R, Kang D, Chang EF, Frank LM, Bouchard K. HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards. Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data 2019; 2019:165-179. [PMID: 34632466 PMCID: PMC8500680 DOI: 10.1109/bigdata47090.2019.9005648] [Indexed: 01/28/2023]
Abstract
A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these various components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also supports optimization of data storage via support for chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice to design NWB 2.0 [13], a modern data standard for collaborative science across the neurophysiology community.
Affiliation(s)
- Andrew J Tritt
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Benjamin Dichter
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ryan Ly
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Donghe Kang
  - Computer Science and Engineering, Ohio State University, Columbus, OH, USA
- Edward F Chang
  - Department of Neurological Surgery and the Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Loren M Frank
  - Howard Hughes Medical Institute, Kavli Institute for Fundamental Neuroscience, Department of Physiology, University of California, San Francisco, San Francisco, CA
- Kristofer Bouchard
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
6
Farrell B, Bengtson J. Scientist and data architect collaborate to curate and archive an inner ear electrophysiology data collection. PLoS One 2019; 14:e0223984. [PMID: 31626635 PMCID: PMC6799921 DOI: 10.1371/journal.pone.0223984] [Received: 05/15/2019] [Accepted: 10/02/2019] [Indexed: 11/19/2022] Open
Abstract
In the past, scientists reported summaries of their findings; they did not provide their original data collections. Many stakeholders (e.g., funding agencies) are now requesting that such data be made publicly available. This mandate is being adopted to facilitate further discovery, and to mitigate waste and deficits in the research process. At the same time, the necessary infrastructure for data curation (e.g., repositories) has been evolving. The current target is to make research products FAIR (Findable, Accessible, Interoperable, Reusable), resulting in data that are curated and archived to be both human and machine compatible. However, most scientists have little training in data curation. Specifically, they are ill-equipped to annotate their data collections at a level that facilitates discoverability, aggregation, and broad reuse in a context separate from their creation or sub-field. To circumvent these deficits, data architects may collaborate with scientists to transform and curate data. This paper's example of a data collection describes the electrical properties of outer hair cells isolated from the mammalian cochlea. The data is expressed with a variant of The Ontology for Biomedical Investigations (OBI), mirrored to provide the metadata and nested data architecture used within the Hierarchical Data Format version 5 (HDF5) format. Each digital specimen is displayed in a tree configuration (like directories in a computer) and consists of six main branches based on the ontology classes. The data collections, scripts, and ontological OWL file (OBI based Inner Ear Electrophysiology (OBI_IEE)) are deposited in three repositories. We discuss the impediments to producing such data collections for public use, and the tools and processes required for effective implementation.
This work illustrates the impact that small collaborations can have on the curation of our publicly-funded collections, and is particularly salient for fields where data is sparse, throughput is low, and sacrifice of animals is required for discovery.
Affiliation(s)
- Brenda Farrell
  - Bobby R Alford Department of Otolaryngology and Head & Neck Surgery, Baylor College of Medicine, Houston, Texas, United States of America
- Jason Bengtson
  - K-State Libraries, Kansas State University, Manhattan, Kansas, United States of America
7
Impact of Multi-Sensor Technology for Enhancing Global Security in Closed Environments Using Cloud-Based Resources. Journal of Sensor and Actuator Networks 2019. [DOI: 10.3390/jsan8010004] [Indexed: 11/16/2022]
Abstract
By nature, some jobs are always in closed environments and employees may stay for long periods. This is the case for many professional activities such as military watch tours of borders, civilian buildings and facilities that need efficient control processes. The role assigned to personnel in such environments is usually sensitive and of high importance, especially in terms of security and protection. With this in mind, we proposed in our research a novel approach using multi-sensor technology to monitor many safety and security parameters including the health status of indoor workers, such as those in watchtowers and at guard posts. In addition, the data gathered for those employees (heart rate, temperature, eye movement, human motion, etc.) combined with the room’s sensor data (temperature, oxygen ratio, toxic gases, air quality, etc.) were saved by appropriate cloud services, which ensured easy access to the data without ignoring the privacy protection aspect of such critical material. This information can be used later by specialists to monitor the evolution of the worker’s health status as well as its cost-effectiveness, which gives the possibility to improve productivity in the workplace and general employee health.
8
Viswan NA, HarshaRani GV, Stefan MI, Bhalla US. FindSim: A Framework for Integrating Neuronal Data and Signaling Models. Front Neuroinform 2018; 12:38. [PMID: 29997492 PMCID: PMC6028806 DOI: 10.3389/fninf.2018.00038] [Received: 02/26/2018] [Accepted: 06/05/2018] [Indexed: 12/30/2022] Open
Abstract
Current experiments touch only small but overlapping parts of very complex subcellular signaling networks in neurons. Even with modern optical reporters and pharmacological manipulations, a given experiment can only monitor and control a very small subset of the diverse, multiscale processes of neuronal signaling. We have developed FindSim (Framework for Integrating Neuronal Data and SIgnaling Models) to anchor models to structured experimental datasets. FindSim is a framework for integrating many individual electrophysiological and biochemical experiments with large, multiscale models so as to systematically refine and validate the model. We use a structured format for encoding the conditions of many standard physiological and pharmacological experiments, specifying which parts of the model are involved, and comparing experiment outcomes with model output. A database of such experiments is run against successive generations of composite cellular models to iteratively improve the model against each experiment, while retaining global model validity. We suggest that this toolchain provides a principled and scalable way to tackle model complexity and diversity of data sources.
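The workflow described in this abstract (encode experiments in a structured form, run them against candidate models, and compare outcomes with model output) can be illustrated with a toy Python sketch. Nothing here reflects FindSim's actual file format or scoring method: the `Experiment` fields, the one-parameter `toy_model`, and the mean-squared-error `score` function are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical structured record of one experiment."""
    name: str
    stimulus: float   # experimental condition applied to the model
    readout: str      # which model output the experiment measured
    expected: float   # experimentally observed value


def toy_model(stimulus, gain):
    """Toy stand-in for a signaling model: a single linear response."""
    return {"response": gain * stimulus}


def score(model, experiments):
    """Mean squared error between model output and experimental outcomes."""
    errors = [(model(e.stimulus)[e.readout] - e.expected) ** 2
              for e in experiments]
    return sum(errors) / len(errors)


experiments = [
    Experiment("dose_1", stimulus=1.0, readout="response", expected=2.0),
    Experiment("dose_2", stimulus=2.0, readout="response", expected=4.0),
]

# Run the whole experiment database against each candidate model variant
# and keep the variant that agrees best with all experiments at once.
candidates = {
    gain: score(lambda s, g=gain: toy_model(s, g), experiments)
    for gain in (1.0, 2.0, 3.0)
}
best_gain = min(candidates, key=candidates.get)
print(best_gain, candidates[best_gain])
```

Scoring every candidate against the full set of experiments, instead of one experiment at a time, is what keeps an improvement on one dataset from silently degrading the fit to the others.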
Affiliation(s)
- Nisha A Viswan
  - National Centre for Biological Sciences, Bangalore, India; Tata Institute of Fundamental Research, The University of Trans-Disciplinary Health Sciences and Technology, Bangalore, India
- Melanie I Stefan
  - Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom; ZJU-UoE Institute, Zhejiang University, Hangzhou, China