1
Lee KH, Denovellis EL, Ly R, Magland J, Soules J, Comrie AE, Gramling DP, Guidera JA, Nevers R, Adenekan P, Brozdowski C, Bray SR, Monroe E, Bak JH, Coulter ME, Sun X, Broyles E, Shin D, Chiang S, Holobetz C, Tritt A, Rübel O, Nguyen T, Yatsenko D, Chu J, Kemere C, Garcia S, Buccino A, Frank LM. Spyglass: a framework for reproducible and shareable neuroscience research. bioRxiv 2024:2024.01.25.577295. [PMID: 38328074 PMCID: PMC10849637 DOI: 10.1101/2024.01.25.577295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Scientific progress depends on reliable and reproducible results. Progress can also be accelerated when data are shared and re-analyzed to address new questions. Current approaches to storing and analyzing neural data typically involve bespoke formats and software that make replication, as well as the subsequent reuse of data, difficult if not impossible. To address these challenges, we created Spyglass, an open-source software framework that enables reproducible analyses and sharing of data and both intermediate and final results within and across labs. Spyglass uses the Neurodata Without Borders (NWB) standard and includes pipelines for several core analyses in neuroscience, including spectral filtering, spike sorting, pose tracking, and neural decoding. It can be easily extended to apply both existing and newly developed pipelines to datasets from multiple sources. We demonstrate these features in the context of a cross-laboratory replication by applying advanced state space decoding algorithms to publicly available data. New users can try out Spyglass on a Jupyter Hub hosted by HHMI and 2i2c: https://spyglass.hhmi.2i2c.cloud/.
Affiliation(s)
- Kyu Hyun Lee
  - Department of Physiology, University of California, San Francisco
  - Howard Hughes Medical Institute, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Eric L. Denovellis
  - Department of Physiology, University of California, San Francisco
  - Howard Hughes Medical Institute, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Ryan Ly
  - Scientific Data Division, Lawrence Berkeley National Laboratory
- Jeremy Magland
  - Center for Computational Mathematics, Flatiron Institute
- Jeff Soules
  - Center for Computational Mathematics, Flatiron Institute
- Alison E. Comrie
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Daniel P. Gramling
  - Graduate Program in Neural and Behavioral Sciences, University of Tübingen
- Jennifer A. Guidera
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
  - UCSF-UC Berkeley Graduate Program in Bioengineering, University of California, San Francisco
  - Medical Scientist Training Program, University of California, San Francisco
- Rhino Nevers
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Philip Adenekan
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Chris Brozdowski
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Samuel R. Bray
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Emily Monroe
  - Department of Physiology, University of California, San Francisco
- Ji Hyun Bak
  - Department of Physiology, University of California, San Francisco
- Michael E. Coulter
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Xulu Sun
  - Department of Physiology, University of California, San Francisco
  - Howard Hughes Medical Institute, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Emrey Broyles
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
- Donghoon Shin
  - Department of Physiology, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
  - UCSF-UC Berkeley Graduate Program in Bioengineering, University of California, San Francisco
- Sharon Chiang
  - Department of Neurology, University of California, San Francisco
- Andrew Tritt
  - Scientific Data Division, Lawrence Berkeley National Laboratory
- Oliver Rübel
  - Scientific Data Division, Lawrence Berkeley National Laboratory
- Joshua Chu
  - Department of Electrical and Computer Engineering, Rice University
- Caleb Kemere
  - Department of Electrical and Computer Engineering, Rice University
- Loren M. Frank
  - Department of Physiology, University of California, San Francisco
  - Howard Hughes Medical Institute, University of California, San Francisco
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco
2
Ly R, Avaylon M, Wulf M, Kepecs A, Rübel O. Structured behavioral data format: An NWB extension standard for task-based behavioral neuroscience experiments. bioRxiv 2024:2024.01.08.574597. [PMID: 38260593 PMCID: PMC10802442 DOI: 10.1101/2024.01.08.574597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Understanding brain function necessitates linking neural activity with corresponding behavior. Structured behavioral experiments are crucial for probing the neural computations and dynamics underlying behavior; however, adequately representing their complex data is a significant challenge. Currently, a comprehensive data standard that fully encapsulates task-based experiments, integrating neural activity with the richness of behavioral context, is lacking. We designed a data model, as an extension to the NWB neurophysiology data standard, to represent structured behavioral neuroscience experiments, spanning stimulus delivery, timestamped events and responses, and simultaneous neural recordings. This data format is validated through its application to a variety of experimental designs, showcasing its potential to advance integrative analyses of neural circuits and complex behaviors. This work introduces a comprehensive data standard designed to capture and store a spectrum of behavioral data, encapsulating the multifaceted nature of modern neuroscience experiments.
3
Huerta EA, Blaiszik B, Brinson LC, Bouchard KE, Diaz D, Doglioni C, Duarte JM, Emani M, Foster I, Fox G, Harris P, Heinrich L, Jha S, Katz DS, Kindratenko V, Kirkpatrick CR, Lassila-Perini K, Madduri RK, Neubauer MS, Psomopoulos FE, Roy A, Rübel O, Zhao Z, Zhu R. FAIR for AI: An interdisciplinary and international community building perspective. Sci Data 2023; 10:487. [PMID: 37495591 PMCID: PMC10372139 DOI: 10.1038/s41597-023-02298-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 06/09/2023] [Indexed: 07/28/2023] Open
Affiliation(s)
- E A Huerta
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, 60439, USA
  - Department of Computer Science, University of Chicago, Chicago, Illinois, 60637, USA
- Ben Blaiszik
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, 60439, USA
  - Globus, University of Chicago, Chicago, Illinois, 60637, USA
- L Catherine Brinson
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina, 27708, USA
- Kristofer E Bouchard
  - Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Biological Systems & Engineering, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, California, 94720, USA
- Daniel Diaz
  - Department of Physics, University of California San Diego, La Jolla, California, 92093, USA
- Caterina Doglioni
  - Lund University, Department of Physics, Box 118, 221 00, Lund, Sweden
  - School of Physics & Astronomy, The University of Manchester, Manchester, M13 9PL, UK
- Javier M Duarte
  - Department of Physics, University of California San Diego, La Jolla, California, 92093, USA
- Murali Emani
  - Leadership Computing Facility, Argonne National Laboratory, Lemont, Illinois, 60439, USA
- Ian Foster
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, 60439, USA
  - Department of Computer Science, University of Chicago, Chicago, Illinois, 60637, USA
- Geoffrey Fox
  - Biocomplexity Institute and Department of Computer Science, University of Virginia, Charlottesville, Virginia, 22904, USA
- Philip Harris
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA
- Lukas Heinrich
  - Technical University Munich, Arcisstraße 21, 80333, München, Germany
- Shantenu Jha
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, New York, 11973, USA
  - Electrical and Computer Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854, USA
- Daniel S Katz
  - National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
- Volodymyr Kindratenko
  - National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
- Christine R Kirkpatrick
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, California, 92093, USA
- Kati Lassila-Perini
  - Helsinki Institute of Physics, University of Helsinki, P.O. Box 64, Helsinki, 00014, Finland
- Ravi K Madduri
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, 60439, USA
- Mark S Neubauer
  - National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
- Fotis E Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
- Avik Roy
  - National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
- Oliver Rübel
  - Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Zhizhen Zhao
  - National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, Illinois, 61801, USA
  - Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
- Ruike Zhu
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
4
Rübel O, Tritt A, Ly R, Dichter BK, Ghosh S, Niu L, Baker P, Soltesz I, Ng L, Svoboda K, Frank L, Bouchard KE. The Neurodata Without Borders ecosystem for neurophysiological data science. eLife 2022; 11:e78362. [PMID: 36193886 PMCID: PMC9531949 DOI: 10.7554/elife.78362] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 03/11/2022] [Accepted: 05/13/2022] [Indexed: 01/21/2023] Open
Abstract
The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.
Affiliation(s)
- Oliver Rübel
  - Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Andrew Tritt
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Ryan Ly
  - Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Satrajit Ghosh
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, United States
  - Department of Otolaryngology - Head and Neck Surgery, Harvard Medical School, Boston, United States
- Pamela Baker
  - Allen Institute for Brain Science, Seattle, United States
- Ivan Soltesz
  - Department of Neurosurgery, Stanford University, Stanford, United States
- Lydia Ng
  - Allen Institute for Brain Science, Seattle, United States
- Karel Svoboda
  - Allen Institute for Brain Science, Seattle, United States
  - Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
- Loren Frank
  - Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
  - Kavli Institute for Fundamental Neuroscience, San Francisco, United States
  - Departments of Physiology and Psychiatry, University of California, San Francisco, San Francisco, United States
- Kristofer E Bouchard
  - Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, United States
  - Kavli Institute for Fundamental Neuroscience, San Francisco, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, United States
  - Helen Wills Neuroscience Institute and Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, United States
  - Weill Neurohub, Berkeley, United States
5
Abstract
Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters and object-level access patterns. As a consequence, I/O optimization is predominantly an ad-hoc process that is performed by application developers, who are often domain scientists with limited desire to delve into nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries, as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as model the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics).
The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the fastest storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
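The three ingredients named in this abstract (I/O time, memory copy cost, and the benefit of a software cache) can be illustrated with a toy estimate. The sketch below is not the paper's model: the `ReadCost` fields, the bandwidth-based formulas, and the assumption that cached bytes skip storage I/O but still pay the copy cost are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadCost:
    """Hypothetical per-request parameters; none are taken from the paper."""
    bytes_requested: int      # payload the application asks for
    io_bandwidth: float       # bytes per second delivered by storage
    memcpy_bandwidth: float   # bytes per second for layout transformation
    cache_hit_ratio: float    # fraction of bytes served from a software cache


def estimated_time(c: ReadCost) -> float:
    """Estimate end-to-end seconds as I/O time plus memory-copy time.

    Assumption: bytes that hit the cache skip storage I/O, while every
    requested byte still pays the memory-copy cost.
    """
    if not 0.0 <= c.cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1]")
    miss_bytes = c.bytes_requested * (1.0 - c.cache_hit_ratio)
    io_time = miss_bytes / c.io_bandwidth
    copy_time = c.bytes_requested / c.memcpy_bandwidth
    return io_time + copy_time


def io_fraction(c: ReadCost) -> float:
    """Return the share of the estimated time spent on storage I/O."""
    total = estimated_time(c)
    miss_bytes = c.bytes_requested * (1.0 - c.cache_hit_ratio)
    return (miss_bytes / c.io_bandwidth) / total


# Example with made-up numbers: half of the request is served from cache.
example = ReadCost(bytes_requested=1000, io_bandwidth=100.0,
                   memcpy_bandwidth=1000.0, cache_hit_ratio=0.5)
total_seconds = estimated_time(example)
share_of_io = io_fraction(example)
```

Even in this simplified form, raising the cache hit ratio shrinks the I/O share of the total, which is consistent with the abstract's point that a model covering only I/O can miss much of the execution time.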
6
Tritt AJ, Rübel O, Dichter B, Ly R, Kang D, Chang EF, Frank LM, Bouchard K. HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards. Proc IEEE Int Conf Big Data 2019; 2019:165-179. [PMID: 34632466 PMCID: PMC8500680 DOI: 10.1109/bigdata47090.2019.9005648] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023]
Abstract
A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these various components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also supports optimization of data storage via support for chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice to design NWB 2.0 [13], a modern data standard for collaborative science across the neurophysiology community.
Affiliation(s)
- Andrew J Tritt
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Benjamin Dichter
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ryan Ly
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Donghe Kang
  - Computer Science and Engineering, Ohio State University, Columbus, OH, USA
- Edward F Chang
  - Department of Neurological Surgery and the Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Loren M Frank
  - Howard Hughes Medical Institute, Kavli Institute for Fundamental Neuroscience, Department of Physiology, University of California, San Francisco, San Francisco, CA
- Kristofer Bouchard
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
7
Erbilgin O, Rübel O, Louie KB, Trinh M, Raad MD, Wildish T, Udwary D, Hoover C, Deutsch S, Northen TR, Bowen BP. MAGI: A Method for Metabolite Annotation and Gene Integration. ACS Chem Biol 2019; 14:704-714. [PMID: 30896917 DOI: 10.1021/acschembio.8b01107] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]
Abstract
Metabolomics is a widely used technology for obtaining direct measures of metabolic activities from diverse biological systems. However, ambiguous metabolite identifications are a common challenge and biochemical interpretation is often limited by incomplete and inaccurate genome-based predictions of enzyme activities (that is, gene annotations). Metabolite Annotation and Gene Integration (MAGI) generates a metabolite-gene association score using a biochemical reaction network. This is calculated by a method that emphasizes consensus between metabolites and genes via biochemical reactions. To demonstrate the potential of this method, we applied MAGI to integrate sequence data and metabolomics data collected from Streptomyces coelicolor A3(2), an extensively characterized bacterium that produces diverse secondary metabolites. Our findings suggest that coupling metabolomics and genomics data by scoring consensus between the two increases the quality of both metabolite identifications and gene annotations in this organism. MAGI also made biochemical predictions for poorly annotated genes that were consistent with the extensive literature on this important organism. This limited analysis suggests that using metabolomics data has the potential to improve annotations in sequenced organisms and also provides testable hypotheses for specific biochemical functions. MAGI is freely available for academic use both as an online tool at https://magi.nersc.gov and with source code available at https://github.com/biorack/magi .
Affiliation(s)
- Onur Erbilgin
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Oliver Rübel
  - Data Analytics and Visualization Group, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Katherine B. Louie
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Matthew Trinh
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Markus de Raad
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Tony Wildish
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Daniel Udwary
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Cindi Hoover
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Samuel Deutsch
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Trent R. Northen
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Benjamin P. Bowen
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
8
de Raad M, de Rond T, Rübel O, Keasling JD, Northen TR, Bowen BP. OpenMSI Arrayed Analysis Toolkit: Analyzing Spatially Defined Samples Using Mass Spectrometry Imaging. Anal Chem 2017; 89:5818-5823. [PMID: 28467051 DOI: 10.1021/acs.analchem.6b05004] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/26/2022]
Abstract
Mass spectrometry imaging (MSI) has primarily been applied in localizing biomolecules within biological matrices. Although well-suited, the application of MSI for comparing thousands of spatially defined spotted samples has been limited. One reason for this is a lack of suitable and accessible data processing tools for the analysis of large arrayed MSI sample sets. The OpenMSI Arrayed Analysis Toolkit (OMAAT) is a software package that addresses the challenges of analyzing spatially defined samples in MSI data sets. OMAAT is written in Python and is integrated with OpenMSI ( http://openmsi.nersc.gov ), a platform for storing, sharing, and analyzing MSI data. By using a web-based python notebook (Jupyter), OMAAT is accessible to anyone without programming experience yet allows experienced users to leverage all features. OMAAT was evaluated by analyzing an MSI data set of a high-throughput glycoside hydrolase activity screen comprising 384 samples arrayed onto a NIMS surface at a 450 μm spacing, decreasing analysis time >100-fold while maintaining robust spot-finding. The utility of OMAAT was demonstrated for screening metabolic activities of different sized soil particles, including hydrolysis of sugars, revealing a pattern of size dependent activities. These results introduce OMAAT as an effective toolkit for analyzing spatially defined samples in MSI. OMAAT runs on all major operating systems, and the source code can be obtained from the following GitHub repository: https://github.com/biorack/omaat .
Affiliation(s)
- Markus de Raad
  - Environmental Genomics and Systems Biology, Biosciences, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
- Tristan de Rond
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
- Jay D Keasling
  - Department of Chemical and Biomolecular Engineering, Department of Bioengineering, and California Institute for Quantitative Biosciences, University of California, Berkeley, California 94720, United States
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Lab, Berkeley, California 94720, United States
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Hørsholm 2970, Denmark
- Trent R Northen
  - Environmental Genomics and Systems Biology, Biosciences, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
  - Joint Genome Institute, Department of Energy, 2800 Mitchell Drive, Walnut Creek, California 94598, United States
- Benjamin P Bowen
  - Environmental Genomics and Systems Biology, Biosciences, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
  - Joint Genome Institute, Department of Energy, 2800 Mitchell Drive, Walnut Creek, California 94598, United States
9
Rübel O, Dougherty M, Prabhat, Denes P, Conant D, Chang EF, Bouchard K. Methods for Specifying Scientific Data Standards and Modeling Relationships with Applications to Neuroscience. Front Neuroinform 2016; 10:48. [PMID: 27867355 PMCID: PMC5095137 DOI: 10.3389/fninf.2016.00048] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/18/2016] [Accepted: 10/18/2016] [Indexed: 11/18/2022] Open
Abstract
Neuroscience continues to experience a tremendous growth in data; in terms of the volume and variety of data, the velocity at which data is acquired, and in turn the veracity of data. These challenges are a serious impediment to sharing of data, analyses, and tools within and across labs. Here, we introduce BRAINformat, a novel data standardization framework for the design and management of scientific data formats. The BRAINformat library defines application-independent design concepts and modules that together create a general framework for standardization of scientific data. We describe the formal specification of scientific data standards, which facilitates sharing and verification of data and formats. We introduce the concept of Managed Objects, enabling semantic components of data formats to be specified as self-contained units, supporting modular and reusable design of data format components and file storage. We also introduce the novel concept of Relationship Attributes for modeling and use of semantic relationships between data objects. Based on these concepts we demonstrate the application of our framework to design and implement a standard format for electrophysiology data and show how data standardization and relationship-modeling facilitate data analysis and sharing. The format uses HDF5, enabling portable, scalable, and self-describing data storage and integration with modern high-performance computing for data-driven discovery. The BRAINformat library is open source, easy-to-use, and provides detailed user and developer documentation and is freely available at: https://bitbucket.org/oruebel/brainformat.
Affiliation(s)
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Max Dougherty
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Prabhat
  - National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter Denes
  - Physical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David Conant
  - Neuroscience, University of California, San Francisco Medical Center, University of California, San Francisco, San Francisco, CA, USA
- Edward F Chang
  - Neuroscience, University of California, San Francisco Medical Center, University of California, San Francisco, San Francisco, CA, USA
- Kristofer Bouchard
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| |
Collapse
|
10
|
Dalisay DS, Kim KW, Lee C, Yang H, Rübel O, Bowen BP, Davin LB, Lewis NG. Dirigent Protein-Mediated Lignan and Cyanogenic Glucoside Formation in Flax Seed: Integrated Omics and MALDI Mass Spectrometry Imaging. J Nat Prod 2015; 78:1231-42. [PMID: 25981198 DOI: 10.1021/acs.jnatprod.5b00023] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Indexed: 05/24/2023]
Abstract
An integrated omics approach using genomics, transcriptomics, metabolomics (MALDI mass spectrometry imaging, MSI), and bioinformatics was employed to study spatiotemporal formation and deposition of health-protecting polymeric lignans and plant defense cyanogenic glucosides. Intact flax (Linum usitatissimum) capsules and seed tissues at different development stages were analyzed. Transcriptome analyses indicated distinct expression patterns of dirigent protein (DP) gene family members encoding (-)- and (+)-pinoresinol-forming DPs and their associated downstream metabolic processes, respectively, with the former expressed at early seed coat development stages. Genes encoding (+)-pinoresinol-forming DPs were, in contrast, expressed at later development stages. Recombinant DP expression and DP assays also unequivocally established their distinct stereoselective biochemical functions. Using MALDI MSI and ion mobility separation analyses, the pinoresinol downstream derivatives, secoisolariciresinol diglucoside (SDG) and SDG hydroxymethylglutaryl ester, were localized and detectable only in early seed coat development stages. SDG derivatives were then converted into higher molecular weight phenolics during seed coat maturation. By contrast, the plant defense cyanogenic glucosides, the monoglucosides linamarin/lotaustralin, were detected throughout the flax capsule, whereas diglucosides linustatin/neolinustatin only accumulated in endosperm and embryo tissues. A putative biosynthetic pathway to the cyanogens is proposed on the basis of transcriptome coexpression data. Localization of all metabolites was at ca. 20 μm resolution, with the web based tool OpenMSI enabling not only resolution enhancement but also an interactive system for real-time searching for any ion in the tissue under analysis.
Affiliation(s)
- Doralyn S Dalisay
  - Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340, United States
- Kye Won Kim
  - Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340, United States
- Choonseok Lee
  - Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340, United States
- Hong Yang
  - Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340, United States
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California 94720, United States
- Benjamin P Bowen
  - Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California 94720, United States
- Laurence B Davin
  - Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340, United States
- Norman G Lewis
  - Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340, United States
|
11
|
Yang J, Rübel O, Prabhat, Mahoney MW, Bowen BP. Identifying Important Ions and Positions in Mass Spectrometry Imaging Data Using CUR Matrix Decompositions. Anal Chem 2015; 87:4658-66. [DOI: 10.1021/ac5040264] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
Affiliation(s)
- Jiyan Yang
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley Lab, One Cyclotron Road, Berkeley, California 94720, United States
- Prabhat
  - Computational Research Division, Lawrence Berkeley Lab, One Cyclotron Road, Berkeley, California 94720, United States
- Michael W. Mahoney
  - International Computer Science Institute and Department of Statistics, University of California, Berkeley, California 94720, United States
- Benjamin P. Bowen
  - Life Sciences Division, Lawrence Berkeley Lab, One Cyclotron Road, Berkeley, California 94720, United States
|
12
|
Harvey W, Park IH, Rübel O, Pascucci V, Bremer PT, Li C, Wang Y. A collaborative visual analytics suite for protein folding research. J Mol Graph Model 2014; 53:59-71. [PMID: 25068440 DOI: 10.1016/j.jmgm.2014.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/14/2014] [Accepted: 06/17/2014] [Indexed: 10/25/2022]
Abstract
Molecular dynamics (MD) simulation is a crucial tool for understanding principles behind important biochemical processes such as protein folding and molecular interaction. With the rapidly increasing power of modern computers, large-scale MD simulation experiments can be performed regularly, generating huge amounts of MD data. An important question is how to analyze and interpret such massive and complex data. One of the (many) challenges involved in analyzing MD simulation data computationally is the high-dimensionality of such data. Given a massive collection of molecular conformations, researchers typically need to rely on their expertise and prior domain knowledge in order to retrieve certain conformations of interest. It is not easy to make and test hypotheses as the data set as a whole is somewhat "invisible" due to its high dimensionality. In other words, it is hard to directly access and examine individual conformations from a sea of molecular structures, and to further explore the entire data set. There is also no easy and convenient way to obtain a global view of the data or its various modalities of biochemical information. To this end, we present an interactive, collaborative visual analytics tool for exploring massive, high-dimensional molecular dynamics simulation data sets. The most important utility of our tool is to provide a platform where researchers can easily and effectively navigate through the otherwise "invisible" simulation data sets, exploring and examining molecular conformations both as a whole and at individual levels. The visualization is based on the concept of a topological landscape, which is a 2D terrain metaphor preserving certain topological and geometric properties of the high dimensional protein energy landscape. 
In addition to facilitating easy exploration of conformations, this 2D terrain metaphor also provides a platform where researchers can visualize and analyze various properties (such as contact density) overlaid on top of the 2D terrain. Finally, the software provides a collaborative environment where multiple researchers can assemble observations and biochemical events into storyboards and share them in real time over the Internet via a client-server architecture. The software is written in Scala and runs on the cross-platform Java Virtual Machine. Binaries and source code are available at http://www.aylasoftware.org and have been released under the GNU General Public License.
Affiliation(s)
- William Harvey
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- In-Hee Park
  - Chemical Physics Program, The Ohio State University, Columbus, OH, United States
- Oliver Rübel
  - Visualization Group, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Valerio Pascucci
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, United States
- Peer-Timo Bremer
  - Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Chenglong Li
  - Chemical Physics Program and College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Yusu Wang
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
|
13
|
Rübel O, Geddes CGR, Chen M, Cormier-Michel E, Bethel EW. Feature-based analysis of plasma-based particle acceleration data. IEEE Trans Vis Comput Graph 2014; 20:196-210. [PMID: 24356363 DOI: 10.1109/tvcg.2013.107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Plasma-based particle accelerators can produce and sustain thousands of times stronger acceleration fields than conventional particle accelerators, providing a potential solution to the problem of the growing size and cost of conventional particle accelerators. To facilitate scientific knowledge discovery from the ever growing collections of accelerator simulation data generated by accelerator physicists to investigate next-generation plasma-based particle accelerator designs, we describe a novel approach for automatic detection and classification of particle beams and beam substructures due to temporal differences in the acceleration process, here called acceleration features. The automatic feature detection in combination with a novel visualization tool for fast, intuitive, query-based exploration of acceleration features enables an effective top-down data exploration process, starting from a high-level, feature-based view down to the level of individual particles. We describe the application of our analysis in practice to analyze simulations of single pulse and dual and triple colliding pulse accelerator designs, and to study the formation and evolution of particle beams, to compare substructures of a beam, and to investigate transverse particle loss.
Affiliation(s)
- Cameron G R Geddes
  - Lasers, Optical Accelerator Systems Integrated Studies (LOASIS) program of the Accelerator and Fusion Research Division, LBNL
- Min Chen
  - Lasers, Optical Accelerator Systems Integrated Studies (LOASIS) program of the Accelerator and Fusion Research Division, LBNL
|
14
|
Rübel O, Greiner A, Cholia S, Louie K, Bethel EW, Northen TR, Bowen BP. OpenMSI: a high-performance web-based platform for mass spectrometry imaging. Anal Chem 2013; 85:10354-61. [PMID: 24087878 DOI: 10.1021/ac402540a] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Indexed: 12/14/2022]
Abstract
Mass spectrometry imaging (MSI) enables researchers to probe endogenous molecules directly within the architecture of the biological matrix. Unfortunately, efficient access, management, and analysis of the data generated by MSI approaches remain major challenges to this rapidly developing field. Despite the availability of numerous dedicated file formats and software packages, it is a widely held viewpoint that the biggest challenge is simply opening, sharing, and analyzing a file without loss of information. Here we present OpenMSI, a software framework and platform that addresses these challenges via an advanced, high-performance, extensible file format and Web API for remote data access (http://openmsi.nersc.gov). The OpenMSI file format supports storage of raw MSI data, metadata, and derived analyses in a single, self-describing format based on HDF5 and is supported by a large range of analysis software (e.g., Matlab and R) and programming languages (e.g., C++, Fortran, and Python). Careful optimization of the storage layout of MSI data sets using chunking, compression, and data replication accelerates common, selective data access operations while minimizing data storage requirements, and is a critical enabler of rapid data I/O. The OpenMSI file format has been shown to provide a >2000-fold improvement for image access operations, enabling spectrum and image retrieval in less than 0.3 s across the Internet even for 50 GB MSI data sets. To make remote high-performance compute resources accessible for analysis and to facilitate data sharing and collaboration, we describe an easy-to-use yet powerful Web API, enabling fast and convenient access to MSI data, metadata, and derived analysis results stored remotely to facilitate high-performance data analysis and enable implementation of Web-based data sharing, visualization, and analysis.
Affiliation(s)
- Oliver Rübel
  - Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California, 94720, United States
|
15
|
Abstract
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation, and additional tables for the climate-simulation study.
|
16
|
Kim J, Martin RL, Rübel O, Haranczyk M, Smit B. High-Throughput Characterization of Porous Materials Using Graphics Processing Units. J Chem Theory Comput 2012; 8:1684-93. [DOI: 10.1021/ct200787v] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 01/31/2023]
Affiliation(s)
- Jihan Kim
  - Material Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Richard L. Martin
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Maciej Haranczyk
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Berend Smit
  - Departments of Chemical and Biomolecular Engineering and Chemistry, University of California, Berkeley, California 94720, United States
|
17
|
Harvey W, Rübel O, Pascucci V, Bremer PT, Wang Y. Enhanced Topology-Sensitive Clustering by Reeb Graph Shattering. Mathematics and Visualization 2012. [DOI: 10.1007/978-3-642-23175-9_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
|
18
|
Rübel O, Ahern S, Bethel EW, Biggin MD, Childs H, Cormier-Michel E, Depace A, Eisen MB, Fowlkes CC, Geddes CGR, Hagen H, Hamann B, Huang MY, Keränen SVE, Knowles DW, Hendriks CLL, Malik J, Meredith J, Messmer P, Prabhat, Ushizima D, Weber GH, Wu K. Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data. Procedia Comput Sci 2010; 1:1757-1764. [PMID: 23762211 DOI: 10.1016/j.procs.2010.04.197] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Knowledge discovery from large and complex scientific data is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for effective data analysis and data exploration methods and tools. The combination and close integration of methods from scientific visualization, information visualization, automated data analysis, and other enabling technologies (such as efficient data management) supports knowledge discovery from multi-dimensional scientific data. This paper surveys two distinct applications in developmental biology and accelerator physics, illustrating the effectiveness of the described approach.
Affiliation(s)
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory (LBNL), One Cyclotron Road, Berkeley, CA, 94720, USA; International Research Training Group 1131, University of Kaiserslautern, Germany
|
19
|
Rübel O, Weber GH, Huang MY, Bethel EW, Biggin MD, Fowlkes CC, Luengo Hendriks CL, Keränen SVE, Eisen MB, Knowles DW, Malik J, Hagen H, Hamann B. Integrating data clustering and visualization for the analysis of 3D gene expression data. IEEE/ACM Trans Comput Biol Bioinform 2010; 7:64-79. [PMID: 20150669 DOI: 10.1109/tcbb.2008.49] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/28/2023]
Abstract
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex data sets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss 1) the integration of data clustering and visualization into one framework, 2) the application of data clustering to 3D gene expression data, 3) the evaluation of the number of clusters k in the context of 3D gene expression clustering, and 4) the improvement of overall analysis quality via dedicated postprocessing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
Affiliation(s)
- Oliver Rübel
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
|
20
|
Weber GH, Rübel O, Huang MY, DePace AH, Fowlkes CC, Keränen SVE, Luengo Hendriks CL, Hagen H, Knowles DW, Malik J, Biggin MD, Hamann B. Visual exploration of three-dimensional gene expression using physical views and linked abstract views. IEEE/ACM Trans Comput Biol Bioinform 2009; 6:296-309. [PMID: 19407353 PMCID: PMC3045837 DOI: 10.1109/tcbb.2007.70249] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/27/2023]
Abstract
During animal development, complex patterns of gene expression provide positional information within the embryo. To better understand the underlying gene regulatory networks, the Berkeley Drosophila Transcription Network Project (BDTNP) has developed methods that support quantitative computational analysis of three-dimensional (3D) gene expression in early Drosophila embryos at cellular resolution. We introduce PointCloudXplore (PCX), an interactive visualization tool that supports visual exploration of relationships between different genes' expression using a combination of established visualization techniques. Two aspects of gene expression are of particular interest: 1) gene expression patterns defined by the spatial locations of cells expressing a gene and 2) relationships between the expression levels of multiple genes. PCX provides users with two corresponding classes of data views: 1) Physical Views based on the spatial relationships of cells in the embryo and 2) Abstract Views that discard spatial information and plot expression levels of multiple genes with respect to each other. Cell Selectors highlight data associated with subsets of embryo cells within a View. Using linking, these selected cells can be viewed in multiple representations. We describe PCX as a 3D gene expression visualization tool and provide examples of how it has been used by BDTNP biologists to generate new hypotheses.
Affiliation(s)
- Gunther H. Weber
  - Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720
- Oliver Rübel
  - Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720
  - International Research Training Group “Visualization of Large and Unstructured Data Sets – Applications in Geospatial Planning, Modeling, and Engineering,” Technische Universität Kaiserslautern, Erwin-Schrödinger-Straße, D-67653 Kaiserslautern, Germany
- Min-Yu Huang
  - Institute for Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
- Angela H. DePace
  - Department of Molecular and Cellular Biology and the Center for Integrative Genomics, University of California, Berkeley, 142 LSA #3200, Berkeley, CA 94720, USA
- Charless C. Fowlkes
  - Computer Science Division, University of California, Berkeley, CA 94720, USA
- Soile V. E. Keränen
  - Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720
- Cris L. Luengo Hendriks
  - Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720
- Hans Hagen
  - International Research Training Group “Visualization of Large and Unstructured Data Sets – Applications in Geospatial Planning, Modeling, and Engineering,” Technische Universität Kaiserslautern, Erwin-Schrödinger-Straße, D-67653 Kaiserslautern, Germany
- David W. Knowles
  - Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720
- Jitendra Malik
  - Computer Science Division, University of California, Berkeley, CA 94720, USA
- Mark D. Biggin
  - Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720
- Bernd Hamann
  - Institute for Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
|
21
|
Fowlkes CC, Hendriks CLL, Keränen SVE, Weber GH, Rübel O, Huang MY, Chatoor S, DePace AH, Simirenko L, Henriquez C, Beaton A, Weiszmann R, Celniker S, Hamann B, Knowles DW, Biggin MD, Eisen MB, Malik J. A quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Cell 2008; 133:364-74. [PMID: 18423206 DOI: 10.1016/j.cell.2008.01.053] [Citation(s) in RCA: 230] [Impact Index Per Article: 14.4] [Received: 05/25/2007] [Revised: 11/27/2007] [Accepted: 01/31/2008] [Indexed: 11/16/2022]
Abstract
To fully understand animal transcription networks, it is essential to accurately measure the spatial and temporal expression patterns of transcription factors and their targets. We describe a registration technique that takes image-based data from hundreds of Drosophila blastoderm embryos, each costained for a reference gene and one of a set of genes of interest, and builds a model VirtualEmbryo. This model captures in a common framework the average expression patterns for many genes in spite of significant variation in morphology and expression between individual embryos. We establish the method's accuracy by showing that relationships between a pair of genes' expression inferred from the model are nearly identical to those measured in embryos costained for the pair. We present a VirtualEmbryo containing data for 95 genes at six time cohorts. We show that known gene-regulatory interactions can be automatically recovered from this data set and predict hundreds of new interactions.
Affiliation(s)
- Charless C Fowlkes
  - Berkeley Drosophila Transcription Network Project, University of California, Irvine, CA 92697, USA
|