1
Smith CL, Lan Y, Jain R, Epstein JA, Poleshko A. Global chromatin relabeling accompanies spatial inversion of chromatin in rod photoreceptors. Science Advances 2021; 7:eabj3035. [PMID: 34559565 PMCID: PMC8462898 DOI: 10.1126/sciadv.abj3035] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/04/2021] [Accepted: 08/04/2021] [Indexed: 06/13/2023]
Abstract
The nuclear architecture of rod photoreceptor cells in nocturnal mammals is unlike that of other animal cells. Murine rod cells have an “inverted” chromatin organization with euchromatin at the nuclear periphery and heterochromatin packed in the center of the nucleus. In conventional nuclear architecture, euchromatin is mostly in the interior, and heterochromatin is largely at the nuclear periphery. We demonstrate that inverted nuclear architecture is achieved through global relabeling of the rod cell epigenome. During rod cell maturation, H3K9me2-labeled nuclear peripheral heterochromatin is relabeled with H3K9me3 and repositioned to the nuclear center, while transcriptionally active euchromatin is labeled with H3K9me2 and positioned at the nuclear periphery. Global chromatin relabeling is correlated with spatial rearrangement, suggesting a critical role for histone modifications, specifically H3K9 methylation, in nuclear architecture. These results reveal a dramatic example of genome-wide epigenetic relabeling of chromatin that accompanies altered nuclear architecture in a postnatal, postmitotic cell.
Affiliation(s)
- Cheryl L. Smith
- Department of Cell and Developmental Biology, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yemin Lan
- Department of Cell and Developmental Biology, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Rajan Jain
- Department of Cell and Developmental Biology, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Medicine, Penn Cardiovascular Institute, and Institute of Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jonathan A. Epstein
- Department of Cell and Developmental Biology, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Medicine, Penn Cardiovascular Institute, and Institute of Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Andrey Poleshko
- Department of Cell and Developmental Biology, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
2
Bijari K, Akram MA, Ascoli GA. An open-source framework for neuroscience metadata management applied to digital reconstructions of neuronal morphology. Brain Inform 2020; 7:2. [PMID: 32219575 PMCID: PMC7098402 DOI: 10.1186/s40708-020-00103-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/17/2020] [Accepted: 03/14/2020] [Indexed: 12/21/2022]
Abstract
Research advancements in neuroscience entail the production of a substantial amount of data requiring interpretation, analysis, and integration. The complexity and diversity of neuroscience data necessitate the development of specialized databases and associated standards and protocols. NeuroMorpho.Org is an online repository of over one hundred thousand digitally reconstructed neurons and glia shared by hundreds of laboratories worldwide. Every entry of this public resource is associated with essential metadata describing animal species, anatomical region, cell type, experimental condition, and additional information relevant to contextualize the morphological content. Until recently, the lack of a user-friendly, structured metadata annotation system relying on standardized terminologies constituted a major hindrance in this effort, limiting the data release pace. Over the past 2 years, we have transitioned the original spreadsheet-based metadata annotation system of NeuroMorpho.Org to a custom-developed, robust, web-based framework for extracting, structuring, and managing neuroscience information. Here we release the metadata portal publicly and explain its functionality to enable usage by data contributors. This framework facilitates metadata annotation, improves terminology management, and accelerates data sharing. Moreover, its open-source development provides the opportunity of adapting and extending the code base to other related research projects with similar requirements. This metadata portal is a beneficial web companion to NeuroMorpho.Org which saves time, reduces errors, and aims to minimize the barrier for direct knowledge sharing by domain experts. The underlying framework can be progressively augmented with the integration of increasingly autonomous machine intelligence components.
Affiliation(s)
- Kayvan Bijari
- Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA USA
- Masood A. Akram
- Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA USA
- Giorgio A. Ascoli
- Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA USA
3
Bubier JA, Sutphin GL, Reynolds TJ, Korstanje R, Fuksman-Kumpa A, Baker EJ, Langston MA, Chesler EJ. Integration of heterogeneous functional genomics data in gerontology research to find genes and pathway underlying aging across species. PLoS One 2019; 14:e0214523. [PMID: 30978202 PMCID: PMC6461221 DOI: 10.1371/journal.pone.0214523] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/24/2018] [Accepted: 03/15/2019] [Indexed: 11/18/2022]
Abstract
Understanding the biological mechanisms behind aging, lifespan and healthspan is becoming increasingly important as the proportion of the world's population over the age of 65 grows, along with the cost and complexity of their care. BigData-oriented approaches and analysis methods enable current and future bio-gerontologists to synthesize, distill and interpret vast, heterogeneous data from functional genomics studies of aging. GeneWeaver is an analysis system for integration of data that allows investigators to store, search, and analyze immense amounts of data including user-submitted experimental data, data from primary publications, and data in other databases. Aging-related genome-wide gene sets from primary publications were curated into this system in concert with data from other model-organism and aging-specific databases, and applied to several questions in gerontology. For example, we identified Cd63 as a frequently represented gene among aging-related genome-wide results. To evaluate the role of Cd63 in aging, we performed RNAi knockdown of the C. elegans ortholog, tsp-7, demonstrating that this manipulation is capable of extending lifespan. The tools in GeneWeaver enable aging researchers to make new discoveries into the associations between the genes, normal biological processes, and diseases that affect aging, healthspan, and lifespan.
Affiliation(s)
- Jason A. Bubier
- The Jackson Laboratory, Bar Harbor ME, United States of America
- George L. Sutphin
- The University of Arizona, Molecular and Cellular Biology, United States of America
- Ron Korstanje
- The Jackson Laboratory Nathan Shock Center of Excellence in the Basic Biology of Aging, The Jackson Laboratory, Bar Harbor, ME, United States of America
- Elissa J. Chesler
- The Jackson Laboratory, Bar Harbor ME, United States of America
- The Jackson Laboratory Nathan Shock Center of Excellence in the Basic Biology of Aging, The Jackson Laboratory, Bar Harbor, ME, United States of America
4
Lee K, Famiglietti ML, McMahon A, Wei CH, MacArthur JAL, Poux S, Breuza L, Bridge A, Cunningham F, Xenarios I, Lu Z. Scaling up data curation using deep learning: An application to literature triage in genomic variation resources. PLoS Comput Biol 2018; 14:e1006390. [PMID: 30102703 PMCID: PMC6107285 DOI: 10.1371/journal.pcbi.1006390] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/05/2018] [Revised: 08/23/2018] [Accepted: 07/24/2018] [Indexed: 11/18/2022]
Abstract
Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.
As the volume of literature on genomic variants continues to grow at an increasing rate, it is becoming more difficult for a curator of a variant knowledge base to keep up with and curate all the published papers. Here, we suggest a deep learning-based literature triage method for genomic variation resources. Our method achieves state-of-the-art performance on the triage task. Moreover, our model does not require any laborious preprocessing or feature engineering steps, which are required for traditional machine learning triage methods. We applied our method to the literature triage process of UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog for genomic variation by collaborating with the database curators. Both manual curation teams confirmed that our method achieved higher precision than their previous query-based triage methods without compromising recall. Both results show that our method is more efficient and can replace the traditional query-based triage methods of manually curated databases. Our method can give human curators more time to focus on more challenging tasks such as actual curation as well as the discovery of novel papers/experimental techniques to consider for inclusion.
Affiliation(s)
- Kyubum Lee
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Jacqueline Ann Langdon MacArthur
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Lionel Breuza
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Ioannis Xenarios
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Chemistry and Biochemistry, University of Geneva, Geneva, Switzerland
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
5
Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018; 25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022]
Abstract
Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of relevant results among the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
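The P@10 figure quoted in this abstract can be illustrated with a minimal sketch. This is not DataMed's evaluation code; the function name `precision_at_k` and the dataset identifiers are hypothetical, and the sketch assumes the common convention of dividing by k even when fewer than k results are returned. A benchmark score such as 0.6022 would presumably be the mean of this value over many queries.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Return the fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Count how many of the first k results appear in the relevance judgments.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Assumed convention: always divide by k, not by the number of results returned.
    return hits / k


# Hypothetical ranking of 12 dataset IDs for a single query.
ranked = [f"ds{i}" for i in range(1, 13)]
# Hypothetical relevance judgments; ds12 is relevant but ranked outside the top 10.
relevant = {"ds1", "ds2", "ds4", "ds5", "ds7", "ds9", "ds12"}

print(precision_at_k(ranked, relevant))  # 0.6
```

Six of the first ten results are relevant here, so P@10 is 0.6; the relevant item at rank 12 does not count, which is why P@10 rewards placing relevant datasets near the top.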
Affiliation(s)
- Xiaoling Chen
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Anupama E Gururaj
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ruiling Liu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ergin Soysal
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Trevor Cohen
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Firat Tiryaki
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yueling Li
- Center for Research in Biological Systems
- Nansu Zong
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Min Jiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Deevakar Rogith
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Mandana Salimi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hyeon-Eui Kim
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Claudiu Farcas
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Todd Johnson
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ron Margolis
- National Institutes of Health, Bethesda, MD, USA
- Ian M Fore
- National Institutes of Health, Bethesda, MD, USA
- Lucila Ohno-Machado
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
6
Ozyurt IB, Grethe JS, Martone ME, Bandrowski AE. Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature. PLoS One 2016; 11:e0146300. [PMID: 26730820 PMCID: PMC5156472 DOI: 10.1371/journal.pone.0146300] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/02/2015] [Accepted: 12/15/2015] [Indexed: 11/19/2022]
Abstract
The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the Web. RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor from papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
7
Affiliation(s)
- Toni Kazic
- Dept. of Computer Science Missouri Maize Center, Missouri Informatics Institute, and Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, United States of America
8
Nielson JL, Haefeli J, Salegio EA, Liu AW, Guandique CF, Stück ED, Hawbecker S, Moseanko R, Strand SC, Zdunowski S, Brock JH, Roy RR, Rosenzweig ES, Nout-Lomas YS, Courtine G, Havton LA, Steward O, Reggie Edgerton V, Tuszynski MH, Beattie MS, Bresnahan JC, Ferguson AR. Leveraging biomedical informatics for assessing plasticity and repair in primate spinal cord injury. Brain Res 2014; 1619:124-38. [PMID: 25451131 DOI: 10.1016/j.brainres.2014.10.048] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/16/2014] [Revised: 10/22/2014] [Accepted: 10/23/2014] [Indexed: 11/18/2022]
Abstract
Recent preclinical advances highlight the therapeutic potential of treatments aimed at boosting regeneration and plasticity of spinal circuitry damaged by spinal cord injury (SCI). With several promising candidates being considered for translation into clinical trials, the SCI community has called for a non-human primate model as a crucial validation step to test efficacy and validity of these therapies prior to human testing. The present paper reviews the previous and ongoing efforts of the California Spinal Cord Consortium (CSCC), a multidisciplinary team of experts from 5 University of California medical and research centers, to develop this crucial translational SCI model. We focus on the growing volumes of high resolution data collected by the CSCC, and our efforts to develop a biomedical informatics framework aimed at leveraging multidimensional data to monitor plasticity and repair targeting recovery of hand and arm function. Although the main focus of many researchers is the restoration of voluntary motor control, we also describe our ongoing efforts to add assessments of sensory function, including pain, vital signs during surgery, and recovery of bladder and bowel function. By pooling our multidimensional data resources and building a unified database infrastructure for this clinically relevant translational model of SCI, we are now in a unique position to test promising therapeutic strategies' efficacy on the entire syndrome of SCI. We review analyses highlighting the intersection between motor, sensory, autonomic and pathological contributions to the overall restoration of function. This article is part of a Special Issue entitled SI: Spinal cord injury.
Affiliation(s)
- Jessica L Nielson
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Jenny Haefeli
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Ernesto A Salegio
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Aiwen W Liu
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Cristian F Guandique
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Ellen D Stück
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Stephanie Hawbecker
- California National Primate Research Center (CNPRC), University of California, Davis, CA (UCD), United States
- Rod Moseanko
- California National Primate Research Center (CNPRC), University of California, Davis, CA (UCD), United States
- Sarah C Strand
- California National Primate Research Center (CNPRC), University of California, Davis, CA (UCD), United States
- Sharon Zdunowski
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA (UCLA), United States
- John H Brock
- Center for Neural Repair, Department of Neurosciences, University of California, San Diego, La Jolla, CA (UCSD), United States
- Roland R Roy
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA (UCLA), United States
- Ephron S Rosenzweig
- Center for Neural Repair, Department of Neurosciences, University of California, San Diego, La Jolla, CA (UCSD), United States
- Yvette S Nout-Lomas
- Department of Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, United States
- Gregoire Courtine
- Center for Neuroprosthetics and Brain Mind Institute, Swiss Federal Institute of Technology (EPFL), United States
- Leif A Havton
- Reeve-Irvine Research Center (RIRC), University of California, Irvine, CA (UCI), United States; Departments of Anesthesiology & Perioperative Care, Neurology, and Anatomy & Neurobiology, University of California, Irvine, CA, United States
- Oswald Steward
- Reeve-Irvine Research Center (RIRC), University of California, Irvine, CA (UCI), United States; Departments of Anatomy & Neurobiology, Neurobiology & Behavior, and Neurosurgery, University of California, Irvine, CA, United States
- V Reggie Edgerton
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA (UCLA), United States
- Mark H Tuszynski
- Departments of Anesthesiology & Perioperative Care, Neurology, and Anatomy & Neurobiology, University of California, Irvine, CA, United States; Veterans Administration Medical Center, La Jolla, CA, United States
- Michael S Beattie
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Jacqueline C Bresnahan
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States
- Adam R Ferguson
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco, CA (UCSF), United States.
9
Henry VJ, Bandrowski AE, Pepin AS, Gonzalez BJ, Desfeux A. OMICtools: an informative directory for multi-omic data analysis. Database: The Journal of Biological Databases and Curation 2014; 2014:bau069. [PMID: 25024350 PMCID: PMC4095679 DOI: 10.1093/database/bau069] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Indexed: 01/19/2023]
Abstract
Recent advances in ‘omic’ technologies have created unprecedented opportunities for biological research, but current software and database resources are extremely fragmented. OMICtools is a manually curated metadatabase that provides an overview of more than 4400 web-accessible tools related to genomics, transcriptomics, proteomics and metabolomics. All tools have been classified by omic technologies (next-generation sequencing, microarray, mass spectrometry and nuclear magnetic resonance) associated with published evaluations of tool performance. Information about each tool is derived either from a diverse set of developers, the scientific literature or from spontaneous submissions. OMICtools is expected to serve as a useful didactic resource not only for bioinformaticians but also for experimental researchers and clinicians. Database URL: http://omictools.com/
Affiliation(s)
- Vincent J Henry
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Anita E Bandrowski
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Anne-Sophie Pepin
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Bruno J Gonzalez
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Arnaud Desfeux
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
10
Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. J Biomed Semantics 2014; 5:28. [PMID: 26261718 PMCID: PMC4530550 DOI: 10.1186/2041-1480-5-28] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 03/12/2013] [Accepted: 06/16/2014] [Indexed: 11/10/2022]
Abstract
Background: Scientific publications are documentary representations of defeasible arguments, supported by data and repeatable methods. They are the essential mediating artifacts in the ecosystem of scientific communications. The institutional “goal” of science is publishing results. The linear document publication format, dating from 1665, has survived transition to the Web. Intractable publication volumes; the difficulty of verifying evidence; and observed problems in evidence and citation chains suggest a need for a web-friendly and machine-tractable model of scientific publications. This model should support: digital summarization, evidence examination, challenge, verification and remix, and incremental adoption. Such a model must be capable of expressing a broad spectrum of representational complexity, ranging from minimal to maximal forms. Results: The micropublications semantic model of scientific argument and evidence provides these features. Micropublications support natural language statements; data; methods and materials specifications; discussion and commentary; challenge and disagreement; as well as allowing many kinds of statement formalization. The minimal form of a micropublication is a statement with its attribution. The maximal form is a statement with its complete supporting argument, consisting of all relevant evidence, interpretations, discussion and challenges brought forward in support of or opposition to it. Micropublications may be formalized and serialized in multiple ways, including in RDF. They may be added to publications as stand-off metadata. An OWL 2 vocabulary for micropublications is available at http://purl.org/mp. A discussion of this vocabulary, along with RDF examples from the case studies, appears as OWL Vocabulary and RDF Examples in Additional file 1. Conclusion: Micropublications, because they model evidence and allow qualified, nuanced assertions, can play essential roles in the scientific communications ecosystem in places where simpler, formalized and purely statement-based models, such as the nanopublications model, will not be sufficient. At the same time they will add significant value to, and are intentionally compatible with, statement-based formalizations. We suggest that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications.
|
11
|
Marenco LN, Wang R, Bandrowski AE, Grethe JS, Shepherd GM, Miller PL. Extending the NIF DISCO framework to automate complex workflow: coordinating the harvest and integration of data from diverse neuroscience information resources. Front Neuroinform 2014; 8:58. [PMID: 25018728 PMCID: PMC4071641 DOI: 10.3389/fninf.2014.00058] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/22/2014] [Accepted: 05/06/2014] [Indexed: 11/15/2022]
Abstract
This paper describes how DISCO, the data aggregator that supports the Neuroscience Information Framework (NIF), has been extended to play a central role in automating the complex workflow required to support and coordinate the NIF’s data integration capabilities. The NIF is an NIH Neuroscience Blueprint initiative designed to help researchers access the wealth of data related to the neurosciences available via the Internet. A central component is the NIF Federation, a searchable database that currently contains data from 231 data and information resources regularly harvested, updated, and warehoused in the DISCO system. In the past several years, DISCO has greatly extended its functionality and has evolved to play a central role in automating the complex, ongoing process of harvesting, validating, integrating, and displaying neuroscience data from a growing set of participating resources. This paper provides an overview of DISCO’s current capabilities and discusses a number of the challenges and future directions related to the process of coordinating the integration of neuroscience data within the NIF Federation.
Affiliation(s)
- Luis N Marenco
- Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, US Department of Veterans Affairs, West Haven, CT, USA; Department of Neurobiology, Yale University School of Medicine, New Haven, CT, USA
- Rixin Wang
- Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA
- Anita E Bandrowski
- Department of Neurosciences, Center for Research in Biological Systems, University of California at San Diego, La Jolla, CA, USA
- Jeffrey S Grethe
- Department of Neurosciences, Center for Research in Biological Systems, University of California at San Diego, La Jolla, CA, USA
- Gordon M Shepherd
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT, USA
- Perry L Miller
- Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, US Department of Veterans Affairs, West Haven, CT, USA; Department of Anesthesiology, Yale University School of Medicine, New Haven, CT, USA; Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
|
12
|
Cockfield J, Su K, Robbins KA. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB. Front Neuroinform 2013; 7:20. [PMID: 24124417 PMCID: PMC3794442 DOI: 10.3389/fninf.2013.00020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/22/2013] [Accepted: 09/05/2013] [Indexed: 11/21/2022]
Abstract
Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. 
Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/
Affiliation(s)
- Kay A. Robbins
- Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA
|
13
|
Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ, Larocca GM, Haendel MA. On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 2013; 1:e148. [PMID: 24032093 PMCID: PMC3771067 DOI: 10.7717/peerj.148] [Citation(s) in RCA: 148] [Impact Index Per Article: 13.5] [Received: 06/02/2013] [Accepted: 08/12/2013] [Indexed: 12/24/2022]
Abstract
Scientific reproducibility has been at the forefront of many news stories and there exist numerous initiatives to help address this problem. We posit that a contributor is simply a lack of specificity that is required to enable adequate research reproducibility. In particular, the inability to uniquely identify research resources, such as antibodies and model organisms, makes it difficult or impossible to reproduce experiments even where the science is otherwise sound. In order to better understand the magnitude of this problem, we designed an experiment to ascertain the “identifiability” of research resources in the biomedical literature. We evaluated recent journal articles in the fields of Neuroscience, Developmental Biology, Immunology, Cell and Molecular Biology and General Biology, selected randomly based on a diversity of impact factors for the journals, publishers, and experimental method reporting guidelines. We attempted to uniquely identify model organisms (mouse, rat, zebrafish, worm, fly and yeast), antibodies, knockdown reagents (morpholinos or RNAi), constructs, and cell lines. Specific criteria were developed to determine if a resource was uniquely identifiable, and included examining relevant repositories (such as model organism databases, and the Antibody Registry), as well as vendor sites. The results of this experiment show that 54% of resources are not uniquely identifiable in publications, regardless of domain, journal impact factor, or reporting requirements. For example, in many cases the organism strain in which the experiment was performed or antibody that was used could not be identified. Our results show that identifiability is a serious problem for reproducibility. Based on these results, we provide recommendations to authors, reviewers, journal editors, vendors, and publishers. Scientific efficiency and reproducibility depend upon a research-wide improvement of this substantial problem in science today.
Affiliation(s)
- Nicole A Vasilevsky
- Ontology Development Group, Library, Oregon Health & Science University, Portland, OR, USA
|
14
|
A survey of the neuroscience resource landscape: perspectives from the neuroscience information framework. INTERNATIONAL REVIEW OF NEUROBIOLOGY 2013. [PMID: 23195120 DOI: 10.1016/b978-0-12-388408-4.00003-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 04/05/2024]
Abstract
The number of neuroscience resources (databases, tools, materials, and networks) available via the Web continues to expand, particularly in light of newly implemented data sharing policies required by funding agencies and journals. However, the nature of dense, multifaceted neuroscience data and the design of classic search engine systems make efficient, reliable, and relevant discovery of such resources a significant challenge. This challenge is especially pertinent for online databases, whose dynamic content is largely opaque to contemporary search engines. The Neuroscience Information Framework was initiated to address this problem of finding and utilizing neuroscience-relevant resources. Since its first production release in 2008, NIF has been surveying the resource landscape for the neurosciences, identifying relevant resources and working to make them easily discoverable by the neuroscience community. In this chapter, we provide a survey of the resource landscape for neuroscience: what types of resources are available, how many there are, what they contain, and most importantly, ways in which these resources can be utilized by the research community to advance neuroscience research.
|
15
|
Abstract
Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics using examples largely drawn from research efforts in behavioral neuroscience.
|