1
Bijari K, Zoubi Y, Ascoli GA. Assisted neuroscience knowledge extraction via machine learning applied to neural reconstruction metadata on NeuroMorpho.Org. Brain Inform 2022; 9:26. [PMID: 36344713 PMCID: PMC9640520 DOI: 10.1186/s40708-022-00174-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/06/2022] [Indexed: 11/09/2022] Open
Abstract
The amount of unstructured text produced daily in scholarly journals is enormous. Systematically identifying, sorting, and structuring information from such a volume of data is increasingly challenging for researchers even in delimited domains. Named entity recognition is a fundamental natural language processing tool that can be trained to annotate, structure, and extract information from scientific articles. Here, we harness state-of-the-art machine learning techniques and develop a smart neuroscience metadata suggestion system accessible by both humans through a user-friendly graphical interface and machines via Application Programming Interface. We demonstrate a practical application to the public repository of neural reconstructions, NeuroMorpho.Org, thus expanding the existing web-based metadata management system currently in use. Quantitative analysis indicates that the suggestion system reduces personnel labor by at least 50%. Moreover, our results show that larger training datasets with the same software architecture are unlikely to further improve performance without ad-hoc heuristics due to intrinsic ambiguities in neuroscience nomenclature. All components of this project are released open source for community enhancement and extensions to additional applications.
Affiliation(s)
- Kayvan Bijari
  - College of Science, Neuroscience Program, George Mason University, Fairfax, USA
  - Center for Neural Informatics, Structures, & Plasticity, Krasnow Institute for Advanced Study, George Mason University, Fairfax, USA
- Yasmeen Zoubi
  - College of Science, Neuroscience Program, George Mason University, Fairfax, USA
  - Center for Neural Informatics, Structures, & Plasticity, Krasnow Institute for Advanced Study, George Mason University, Fairfax, USA
- Giorgio A. Ascoli
  - College of Science, Neuroscience Program, George Mason University, Fairfax, USA
  - Center for Neural Informatics, Structures, & Plasticity, Krasnow Institute for Advanced Study, George Mason University, Fairfax, USA
  - Bioengineering Department, Volgenau School of Engineering, George Mason University, Fairfax, USA
2
The brainstem connectome database. Sci Data 2022; 9:168. [PMID: 35414055 PMCID: PMC9005652 DOI: 10.1038/s41597-022-01219-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 02/25/2022] [Indexed: 11/29/2022] Open
Abstract
Connectivity data of the nervous system and its subdivisions, such as the brainstem, cerebral cortex and subcortical nuclei, are necessary to understand connectional structures, predict effects of connectional disorders and simulate network dynamics. For that purpose, a database was built and analyzed which comprises all known directed and weighted connections within the rat brainstem. A long-term metastudy of original research publications describing tract tracing results forms the foundation of the brainstem connectome (BC) database, which can be analyzed directly in the framework neuroVIISAS. The BC database can be accessed directly by connectivity tables, a web-based tool and the framework. Analysis of global and local network properties, a motif analysis, and a community analysis of the brainstem connectome provide insight into its network organization. For example, we found that BC is a scale-free network with small-world connectivity. The Louvain modularity and weighted stochastic block matching resulted in partial matching of functions and connectivity. BC modeling was performed to demonstrate signal propagation through the somatosensory pathway, which is affected in multiple sclerosis.
- Measurement(s): brainstem
- Technology Type(s): tract tracing metastudy
- Factor Type(s): brain region
- Sample Characteristic - Organism: Rattus rattus
- Sample Characteristic - Environment: Experimental setup
- Sample Characteristic - Location: Germany
3
Sharma A, Jayakumar J, Mitra PP, Chakraborti S, Kumar PS. Application of Supervised Machine Learning to Extract Brain Connectivity Information from Neuroscience Research Articles. Interdiscip Sci 2021; 13:731-750. [PMID: 34076859 DOI: 10.1007/s12539-021-00443-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2020] [Revised: 05/15/2021] [Accepted: 05/18/2021] [Indexed: 10/21/2022]
Abstract
Understanding the complex connectivity structure of the brain is a major challenge in neuroscience. A vast and ever-expanding literature about neuronal connectivity between brain regions already exists in published research articles and databases. However, as the number of published articles and repositories keeps growing, it becomes difficult for a neuroscientist to engage with the breadth and depth of any given field within neuroscience. Natural Language Processing (NLP) techniques can be used to mine 'Brain Region Connectivity' information from published articles to build a centralized connectivity resource that gives neuroscience researchers quick access to research findings. Manually curating and continuously updating such a resource involves significant time and effort. This paper presents an application of supervised machine learning algorithms that perform shallow and deep linguistic analysis of text to automatically extract connectivity between brain region mentions. Our proposed algorithms are evaluated using benchmark datasets collated from PubMed and our own dataset of full-text articles annotated by a domain expert. We also present a comparison with state-of-the-art methods including BioBERT. The proposed methods achieve the best recall and [Formula: see text] scores, negating the need for any domain-specific predefined linguistic patterns. Our paper presents a novel effort towards automatically generating interpretable patterns of connectivity for extracting connected brain region mentions from text and can be expanded to include any other domain-specific information.
Affiliation(s)
- Ashika Sharma
  - Center for Artificial Intelligence and Robotics, DRDO Complex, Bangalore, 560093, India
  - Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, 600036, India
- Partha P Mitra
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
- Sutanu Chakraborti
  - Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, 600036, India
- P Sreenivasa Kumar
  - Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, 600036, India
4
Azer K, Kaddi CD, Barrett JS, Bai JPF, McQuade ST, Merrill NJ, Piccoli B, Neves-Zaph S, Marchetti L, Lombardo R, Parolo S, Immanuel SRC, Baliga NS. History and Future Perspectives on the Discipline of Quantitative Systems Pharmacology Modeling and Its Applications. Front Physiol 2021; 12:637999. [PMID: 33841175 PMCID: PMC8027332 DOI: 10.3389/fphys.2021.637999] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/07/2020] [Accepted: 01/25/2021] [Indexed: 12/24/2022] Open
Abstract
Mathematical biology and pharmacology models have a long and rich history in the fields of medicine and physiology, impacting our understanding of disease mechanisms and the development of novel therapeutics. With an increased focus on the pharmacology application of system models and the advances in data science spanning mechanistic and empirical approaches, there is a significant opportunity and promise to leverage these advancements to enhance the development and application of the systems pharmacology field. In this paper, we will review milestones in the evolution of mathematical biology and pharmacology models, highlight some of the gaps and challenges in developing and applying systems pharmacology models, and provide a vision for an integrated strategy that leverages advances in adjacent fields to overcome these challenges.
Affiliation(s)
- Karim Azer
  - Quantitative Sciences, Bill and Melinda Gates Medical Research Institute, Cambridge, MA, United States
- Chanchala D. Kaddi
  - Quantitative Sciences, Bill and Melinda Gates Medical Research Institute, Cambridge, MA, United States
- Jane P. F. Bai
  - Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, United States
- Sean T. McQuade
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, United States
- Nathaniel J. Merrill
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, United States
- Benedetto Piccoli
  - Department of Mathematical Sciences and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, United States
- Susana Neves-Zaph
  - Translational Disease Modeling, Data and Data Science, Sanofi, Bridgewater, NJ, United States
- Luca Marchetti
  - Fondazione the Microsoft Research – University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Rosario Lombardo
  - Fondazione the Microsoft Research – University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Silvia Parolo
  - Fondazione the Microsoft Research – University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
5
ConnExt-BioBERT: Leveraging Transfer Learning for Brain-Connectivity Extraction from Neuroscience Articles. Brain Inform 2021. [DOI: 10.1007/978-3-030-86993-9_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
6
Bjerke IE, Puchades MA, Bjaalie JG, Leergaard TB. Database of literature derived cellular measurements from the murine basal ganglia. Sci Data 2020; 7:211. [PMID: 32632099 PMCID: PMC7338524 DOI: 10.1038/s41597-020-0550-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/20/2020] [Accepted: 06/04/2020] [Indexed: 11/09/2022] Open
Abstract
Quantitative measurements and descriptive statistics of different cellular elements in the brain are typically published in journal articles as text, tables, and example figures, and represent an important basis for the creation of biologically constrained computational models, design of intervention studies, and comparison of subject groups. Such data can be challenging to extract from publications and difficult to normalise and compare across studies, and few studies have so far attempted to integrate quantitative information available in journal articles. We here present a database of quantitative information about cellular parameters in the frequently studied murine basal ganglia. The database holds a curated and normalised selection of currently available data collected from the literature and public repositories, providing the most comprehensive collection of quantitative neuroanatomical data from the basal ganglia to date. The database is shared as a downloadable resource from the EBRAINS Knowledge Graph (https://kg.ebrains.eu), together with a workflow that allows interested researchers to update and expand the database with data from future reports.
Affiliation(s)
- Ingvild E Bjerke
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Maja A Puchades
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Jan G Bjaalie
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Trygve B Leergaard
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
7
DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. Oxidative Medicine and Cellular Longevity 2020; 2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]
Abstract
Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction reaction) homeostasis. When oxidation damage affects RNA carrying protein-coding information, this may result in the synthesis of aberrant proteins as well as a lower efficiency of translation. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of RNA oxidative damage in mammals is increasing every year due to the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information relevant to RNA oxidation and its effects on human diseases, we developed DES-ROD, an online knowledgebase that contains processed information from 298,603 relevant documents consisting of PubMed abstracts and PubMed Central full-text articles. The system utilizes concepts/terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts, as well as enriched pairs of putatively associated concepts. In this way, one can explore mutual relationships between any combination of two concepts from the dictionaries used. Dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases. This makes insights into different facets of the effects of RNA oxidation and the control of this process possible. The usefulness of the DES-ROD system is demonstrated by case studies on some known information, as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first knowledgebase based on text and data mining focused on the exploration of RNA oxidation and human diseases.
8
Schwanke S, Jenssen J, Eipert P, Schmitt O. Towards Differential Connectomics with NeuroVIISAS. Neuroinformatics 2019; 17:163-179. [PMID: 30014279 DOI: 10.1007/s12021-018-9389-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023]
Abstract
The comparison of connectomes is an essential step to identify changes in structural and functional neuronal networks. However, the connectomes themselves as well as the comparisons of connectomes could be manifold. In most applications, comparisons of connectomes are applied to specific sets of data. In many studies, collections of scripts optimized for certain species (non-generic approaches) or diseases (control versus disease group connectomes) are applied. These collections of scripts have limited functionality and do not support functional and topographic mappings of connectomes (hemispherical asymmetries, peripheral nervous system). The platform-independent and generic neuroVIISAS framework is built to circumvent limitations that come with variants of nomenclatures, connectivity lists and connectional hierarchies as well as restrictions to structural connectome analyses. A new analytical module is introduced into the framework to compare different types of connectomes and different representations of the same connectome within a unique software environment. As an example, a differential analysis has been performed that compares the partial connectome of the laboratory rat based on virus tract tracing with the same regions based on non-virus tract tracing. A relatively large connectional coherence between the two different techniques was found. However, some detected connections are described by virus tract tracing only.
Affiliation(s)
- Sebastian Schwanke
  - Department of Anatomy, University of Rostock, Gertrudenstr. 9, 18057, Rostock, Germany
- Jörg Jenssen
  - Department of Anatomy, University of Rostock, Gertrudenstr. 9, 18057, Rostock, Germany
- Peter Eipert
  - Department of Anatomy, University of Rostock, Gertrudenstr. 9, 18057, Rostock, Germany
- Oliver Schmitt
  - Department of Anatomy, University of Rostock, Gertrudenstr. 9, 18057, Rostock, Germany
9
Essack M, Salhi A, Stanimirovic J, Tifratene F, Bin Raies A, Hungler A, Uludag M, Van Neste C, Trpkovic A, Bajic VP, Bajic VB, Isenovic ER. Literature-Based Enrichment Insights into Redox Control of Vascular Biology. Oxidative Medicine and Cellular Longevity 2019; 2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
In cellular physiology and signaling, reactive oxygen species (ROS) play one of the most critical roles. ROS overproduction leads to cellular oxidative stress. This may lead to an irrecoverable imbalance of redox (oxidation-reduction reaction) function that deregulates redox homeostasis, which itself could lead to several diseases including neurodegenerative disease, cardiovascular disease, and cancers. In this study, we focus on the redox effects related to vascular systems in mammals. To support research in this domain, we developed an online knowledge base, DES-RedoxVasc, which enables exploration of information contained in the biomedical scientific literature. The DES-RedoxVasc system analyzed 233399 documents consisting of PubMed abstracts and PubMed Central full-text articles related to different aspects of redox biology in vascular systems. It allows researchers to explore enriched concepts from 28 curated thematic dictionaries, as well as literature-derived potential associations of pairs of such enriched concepts, where associations themselves are statistically enriched. For example, the system allows exploration of associations of pathways, diseases, mutations, genes/proteins, miRNAs, long ncRNAs, toxins, drugs, biological processes, molecular functions, etc. that allow for insights about different aspects of redox effects and control of processes related to the vascular system. Moreover, we deliver case studies about some existing or possibly novel knowledge regarding redox of vascular biology demonstrating the usefulness of DES-RedoxVasc. DES-RedoxVasc is the first compiled knowledge base using text mining for the exploration of this topic.
Affiliation(s)
- Magbubah Essack
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Adil Salhi
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Julijana Stanimirovic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
- Faroug Tifratene
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Arwa Bin Raies
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Arnaud Hungler
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Mahmut Uludag
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Christophe Van Neste
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Andreja Trpkovic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
- Vladan P. Bajic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
- Vladimir B. Bajic
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Esma R. Isenovic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
10
Meng G, Huang Y, Yu Q, Ding Y, Wild D, Zhao Y, Liu X, Song M. Adopting Text Mining on Rehabilitation Therapy Repositioning for Stroke. Front Neuroinform 2019; 13:17. [PMID: 30941028 PMCID: PMC6433708 DOI: 10.3389/fninf.2019.00017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/05/2018] [Accepted: 03/05/2019] [Indexed: 12/30/2022] Open
Abstract
Stroke is a common disabling disease that severely affects the daily life of patients. Accumulating evidence indicates that rehabilitation therapy can improve movement function. However, no clear guidelines specify effective rehabilitation therapy schemes, and the development of new rehabilitation techniques has been relatively slow. This study used a text mining approach, the ABC model, to identify an existing rehabilitation candidate therapy method that is most likely to be repositioned for stroke. In the model, we built the internal links of stroke (A), assessment scales (B), and rehabilitation therapies (C) in PubMed, and the links were related to upper limb function measurements for patients with stroke. In the first step, using E-utility, we retrieved both stroke-related assessment scales and rehabilitation therapy records and then compiled two datasets, which were called Stroke_Scales and Stroke_Therapies, respectively. In the next step, we crawled all rehabilitation therapies co-occurring with the Stroke_Therapies and named them All_Therapies. Therapies that were already included in Stroke_Therapies were deleted from All_Therapies; therefore, the remaining therapies were the potential rehabilitation therapies, which could be repositioned for stroke after subsequent filtration by a manual check. We identified the top-ranked repositioning rehabilitation therapy and subsequently examined its clinical validation. Hand-arm bimanual intensive training (HABIT) ranked first among our repositioning rehabilitation therapies and had the most interaction links with Stroke_Scales. HABIT significantly improved clinical scores on assessment scales [Fugl-Meyer Assessment (FMA) and action research arm test (ARAT)] in the clinical validation study for acute stroke patients with upper limb dysfunction. Therefore, based on the ABC model and clinical validation, HABIT is a promising repositioned rehabilitation therapy for stroke, and the ABC model is an effective text mining approach for rehabilitation therapy repositioning. The findings in this study would be helpful in clinical knowledge discovery.
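The candidate-selection step described in this abstract (remove the therapies already used for stroke from the co-occurring set, then rank the remainder by their links to assessment scales) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the data layout, and the toy therapy and link lists are assumptions made for the example, not the study's code or data.

```python
from collections import Counter


def rank_repositioning_candidates(stroke_therapies, all_therapies, scale_links):
    """Rank therapies not yet used for stroke by their number of scale links.

    scale_links is an iterable of (therapy, assessment_scale) pairs.
    """
    # Candidates are the co-occurring therapies minus those already used for stroke.
    candidates = set(all_therapies) - set(stroke_therapies)
    # Count links to assessment scales, ignoring non-candidate therapies.
    link_counts = Counter(t for t, _scale in scale_links if t in candidates)
    # Sort by descending link count, then by name so the order is deterministic.
    return sorted(((t, link_counts[t]) for t in candidates),
                  key=lambda item: (-item[1], item[0]))


# Toy inputs, invented purely for illustration.
stroke_therapies = {"mirror therapy", "constraint-induced movement therapy"}
all_therapies = stroke_therapies | {"HABIT", "aquatic therapy"}
scale_links = [
    ("HABIT", "FMA"),
    ("HABIT", "ARAT"),
    ("aquatic therapy", "FMA"),
    ("mirror therapy", "FMA"),  # dropped: already a stroke therapy
]

ranking = rank_repositioning_candidates(stroke_therapies, all_therapies, scale_links)
print(ranking)
```

In the study itself, the ranked list was additionally filtered by a manual check before clinical validation; that step is not modeled here.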
Affiliation(s)
- Guilin Meng
  - Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, China
  - School of Informatics Computing and Engineering, Indiana University, Bloomington, IN, United States
- Yong Huang
  - School of Informatics Computing and Engineering, Indiana University, Bloomington, IN, United States
  - School of Information Management, Wuhan University, Wuhan, China
- Qi Yu
  - School of Management, Shanxi Medical University, Shanxi, China
- Ying Ding
  - School of Informatics Computing and Engineering, Indiana University, Bloomington, IN, United States
- David Wild
  - School of Informatics Computing and Engineering, Indiana University, Bloomington, IN, United States
- Yanxin Zhao
  - Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, China
- Xueyuan Liu
  - Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, China
- Min Song
  - School of Informatics, Yonsei University, Seoul, South Korea
11
Automated Metadata Suggestion During Repository Submission. Neuroinformatics 2018; 17:361-371. [PMID: 30382537 DOI: 10.1007/s12021-018-9403-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Knowledge discovery via an informatics resource is constrained by the completeness of the resource, both in terms of the amount of data it contains and in terms of the metadata that exists to describe the data. Increasing completeness in one of these categories risks reducing completeness in the other because manually curating metadata is time-consuming and is restricted by familiarity with both the data and the metadata annotation scheme. The diverse interests of a research community may drive a resource to have hundreds of metadata tags with few examples for each, making it challenging for humans or machine learning algorithms to learn how to assign metadata tags properly. We demonstrate with ModelDB, a computational neuroscience model discovery resource, that using manually curated regular-expression-based rules can overcome this challenge by parsing existing texts from data providers during user data entry to suggest metadata annotations and prompt them to suggest other related metadata annotations rather than leaving the task to a curator. In the ModelDB implementation, analyzing the abstract identified 6.4 metadata tags per abstract at 79% precision. Using the full text produced higher recall with low precision (41%), and the title alone produced few (1.3) metadata annotations per entry; we thus recommend data providers use their abstract during upload. Grouping the possible metadata annotations into categories (e.g. cell type, biological topic) revealed that precision and recall for the different text sources vary by category. Given this proof of concept, other bioinformatics resources can likewise improve the quality of their metadata by adopting our approach of prompting data uploaders with relevant metadata at the minimal cost of formalizing rules for each potential metadata annotation.
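As a rough illustration of rule-based metadata suggestion of the kind this abstract describes, the Python sketch below maps each tag to one regular expression and reports the precision of the suggestions against a curated tag set. The three rules, the tag names, and the example sentence are hypothetical and are not taken from ModelDB.

```python
import re

# Hypothetical rules for illustration only; they are not ModelDB's actual rules.
RULES = {
    "Cell type: pyramidal cell": re.compile(r"\bpyramidal\s+(?:cell|neuron)s?\b", re.IGNORECASE),
    "Region: hippocampus": re.compile(r"\bhippocamp(?:us|al)\b", re.IGNORECASE),
    "Topic: synaptic plasticity": re.compile(r"\b(?:LTP|LTD|synaptic\s+plasticity)\b", re.IGNORECASE),
}


def suggest_tags(text):
    """Return, in sorted order, every tag whose rule matches the text."""
    return sorted(tag for tag, pattern in RULES.items() if pattern.search(text))


def precision(suggested, curated):
    """Fraction of suggested tags that also appear in the curated set."""
    suggested = set(suggested)
    if not suggested:
        return 0.0
    return len(suggested & set(curated)) / len(suggested)


# Made-up example text and curated tags.
abstract = "We model LTP in hippocampal CA1 pyramidal neurons."
tags = suggest_tags(abstract)
curated = ["Cell type: pyramidal cell", "Region: hippocampus"]
print(tags, precision(tags, curated))
```

Keeping one rule per tag means a new annotation costs one new pattern, which matches the abstract's point that the main expense of the approach is formalizing a rule for each potential tag.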
12
O'Reilly C, Iavarone E, Hill SL. A Framework for Collaborative Curation of Neuroscientific Literature. Front Neuroinform 2017; 11:27. [PMID: 28469570 PMCID: PMC5395614 DOI: 10.3389/fninf.2017.00027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/18/2016] [Accepted: 03/29/2017] [Indexed: 11/13/2022] Open
Abstract
Large models of complex neuronal circuits require specifying numerous parameters, with values that often need to be extracted from the literature, a tedious and error-prone process. To help establish shareable curated corpora of annotations, we have developed a literature curation framework comprising an annotation format, a Python API (NeuroAnnotation Toolbox; NAT), and a user-friendly graphical interface (NeuroCurator). This framework allows the systematic annotation of relevant statements and model parameters. The context of the annotated content is made explicit in a standard way by associating it with ontological terms (e.g., species, cell types, brain regions). The exact position of the annotated content within a document is specified by the starting character of the annotated text, or the number of the figure, the equation, or the table, depending on the context. Alternatively, the provenance of parameters can also be specified by bounding boxes. Parameter types are linked to curated experimental values so that they can be systematically integrated into models. We demonstrate the use of this approach by releasing a corpus describing different modeling parameters associated with thalamo-cortical circuitry. The proposed framework supports a rigorous management of large sets of parameters, solving common difficulties in their traceability. Further, it allows easier classification of literature information and more efficient and systematic integration of such information into models and analyses.
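The idea of locating an annotation by the starting character of the annotated text can be illustrated with a small Python sketch. The class and function below are hypothetical and do not reproduce the NAT API or its annotation format; the example sentence and tags are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TextAnnotation:
    """Hypothetical annotation record: a text span located by its start offset."""
    start: int                 # index of the first annotated character
    text: str                  # the annotated text itself
    tags: list = field(default_factory=list)  # ontological terms giving context

    def matches(self, document):
        """Check that the document still holds the annotated text at the offset."""
        return document[self.start:self.start + len(self.text)] == self.text


def annotate(document, snippet, tags):
    """Create an annotation for the first occurrence of snippet in document."""
    start = document.find(snippet)
    if start < 0:
        raise ValueError("snippet not found in document")
    return TextAnnotation(start=start, text=snippet, tags=list(tags))


# Made-up sentence and tags, for illustration only.
document = "The example neuron had a membrane time constant of 20 ms."
annotation = annotate(document, "20 ms", ["parameter: membrane time constant"])
print(annotation.start, annotation.matches(document))
```

Storing the annotated text next to its offset lets a curator detect when a document version has shifted and the recorded position no longer points at the same statement.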
Affiliation(s)
- Christian O'Reilly
  - Blue Brain Project, École Polytechnique Fédérale de Lausanne, Geneva, Switzerland
- Elisabetta Iavarone
  - Blue Brain Project, École Polytechnique Fédérale de Lausanne, Geneva, Switzerland
- Sean L Hill
  - Blue Brain Project, École Polytechnique Fédérale de Lausanne, Geneva, Switzerland
13
Gökdeniz E, Özgür A, Canbeyli R. Automated Neuroanatomical Relation Extraction: A Linguistically Motivated Approach with a PVT Connectivity Graph Case Study. Front Neuroinform 2016; 10:39. [PMID: 27708573 PMCID: PMC5030238 DOI: 10.3389/fninf.2016.00039] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/24/2016] [Accepted: 08/23/2016] [Indexed: 11/13/2022] Open
Abstract
Identifying the relations among different regions of the brain is vital for a better understanding of how the brain functions. While a large number of studies have investigated the neuroanatomical and neurochemical connections among brain structures, their specific findings are found in publications scattered over a large number of years and different types of publications. Text mining techniques have provided the means to extract specific types of information from a large number of publications with the aim of presenting a larger, if not necessarily an exhaustive picture. By using natural language processing techniques, the present paper aims to identify connectivity relations among brain regions in general and relations relevant to the paraventricular nucleus of the thalamus (PVT) in particular. We introduce a linguistically motivated approach based on patterns defined over the constituency and dependency parse trees of sentences. Besides the presence of a relation between a pair of brain regions, the proposed method also identifies the directionality of the relation, which enables the creation and analysis of a directional brain region connectivity graph. The approach is evaluated over the manually annotated data sets of the WhiteText Project. In addition, as a case study, the method is applied to extract and analyze the connectivity graph of PVT, which is an important brain region that is considered to influence many functions ranging from arousal, motivation, and drug-seeking behavior to attention. The results of the PVT connectivity graph show that PVT may be a new target of research in mood assessment.
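Once directed relations between brain regions have been extracted, assembling them into a directional connectivity graph is straightforward. The Python sketch below assumes the extraction step yields (source, target) pairs; the function names and the placeholder region names are illustrative and do not represent the paper's implementation or actual anatomical findings.

```python
from collections import defaultdict


def build_connectivity_graph(relations):
    """Build a directed graph (adjacency sets) from (source, target) pairs."""
    graph = defaultdict(set)
    for source, target in relations:
        graph[source].add(target)
        graph.setdefault(target, set())  # keep regions that only receive input
    return dict(graph)


def in_and_out_degree(graph, region):
    """Return (number of incoming, number of outgoing) connections of a region."""
    out_degree = len(graph.get(region, ()))
    in_degree = sum(region in targets for targets in graph.values())
    return in_degree, out_degree


# Placeholder relations, not real anatomical results.
relations = [("region_A", "PVT"), ("PVT", "region_B"), ("PVT", "region_C")]
graph = build_connectivity_graph(relations)
print(in_and_out_degree(graph, "PVT"))
```

Keeping direction in the edge list is what allows afferent and efferent connections of a region such as PVT to be analyzed separately.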
Affiliation(s)
- Erinç Gökdeniz
  - Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Arzucan Özgür
  - Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Reşit Canbeyli
  - Department of Psychology, Boğaziçi University, İstanbul, Turkey