1
Gomes Moreira D, Jan A. A beginner's guide into curated analyses of open access datasets for biomarker discovery in neurodegeneration. Sci Data 2023; 10:432. [PMID: 37414779 PMCID: PMC10325954 DOI: 10.1038/s41597-023-02338-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 06/27/2023] [Indexed: 07/08/2023] Open
Abstract
The discovery of surrogate biomarkers reflecting neuronal dysfunction in neurodegenerative diseases (NDDs) remains an active area of research. To boost these efforts, we demonstrate the utility of publicly available datasets for probing the pathogenic relevance of candidate markers in NDDs. As a starting point, we introduce the readers to several open access resources, which contain gene expression profiles and proteomics datasets from patient studies in common NDDs, including proteomics analyses of cerebrospinal fluid (CSF). Then, we illustrate the method for curated gene expression analyses across select brain regions from four cohorts of Parkinson disease patients (and from one study in common NDDs), probing glutathione biogenesis, calcium signaling and autophagy. These data are complemented by findings of select markers in CSF-based studies in NDDs. Additionally, we enclose several annotated microarray studies, and summarize reports on CSF proteomics across the NDDs, which the readers can utilize for translational purposes. We anticipate that this "beginner's guide" will benefit the research community in NDDs, and would serve as a useful educational tool.
Affiliation(s)
- Diana Gomes Moreira
- Department of Clinical Medicine, Palle Juul-Jensens Boulevard 165, DK-8200, Aarhus N, Denmark
- Asad Jan
- Department of Biomedicine, Aarhus University, Høegh-Guldbergs Gade 10, DK-8000, Aarhus C, Denmark
2
Smith CM, Kadin JA, Baldarelli RM, Beal JS, Blodgett O, Giannatto SC, Richardson JE, Ringwald M. GXD's RNA-Seq and Microarray Experiment Search: using curated metadata to reliably find mouse expression studies of interest. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2020:5756137. [PMID: 32140729 PMCID: PMC7058436 DOI: 10.1093/database/baaa002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/31/2019] [Revised: 12/09/2019] [Accepted: 01/06/2020] [Indexed: 12/28/2022]
Abstract
The Gene Expression Database (GXD), an extensive community resource of curated expression information for the mouse, has developed an RNA-Seq and Microarray Experiment Search (http://www.informatics.jax.org/gxd/htexp_index). This tool allows users to quickly and reliably find specific experiments in ArrayExpress and the Gene Expression Omnibus (GEO) that study endogenous gene expression in wild-type and mutant mice. Standardized metadata annotations, curated by GXD, allow users to specify the anatomical structure, developmental stage, mutated gene, strain and sex of samples of interest, as well as the study type and key parameters of the experiment. These searches, powered by controlled vocabularies and ontologies, can be combined with free text searching of experiment titles and descriptions. Search result summaries include link-outs to ArrayExpress and GEO, providing easy access to the expression data itself. Links to the PubMed entries for accompanying publications are also included. More information about this tool and GXD can be found at the GXD home page (http://www.informatics.jax.org/expression.shtml). Database URL: http://www.informatics.jax.org/expression.shtml
Affiliation(s)
- James A Kadin
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Jonathan S Beal
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Olin Blodgett
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Martin Ringwald
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
3
Sara HH, Chowdhury MAB, Haque MA. Multimorbidity among elderly in Bangladesh. Aging Med (Milton) 2018; 1:267-275. [PMID: 31942503 PMCID: PMC6880734 DOI: 10.1002/agm2.12047] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/16/2018] [Accepted: 11/06/2018] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Multimorbidity among the elderly is a major public health problem in most of the developing countries, including Bangladesh, where the population is moving towards aging. Multimorbidity was defined as the co-occurrence of at least two chronic diseases in a person whether as a coincidence or not. Little attention has been paid to the study of the prevalence of multimorbidity among the elderly in Bangladesh. OBJECTIVE The objectives of this study were to estimate the prevalence of multimorbidity among hospitalized elderly. METHODS A cross-sectional study was conducted in two tertiary level hospitals with a sample of 566 adults aged 60 years or more. Data were collected from medical examination reports at the hospital and using a semi-structured interview schedule through an in-person interview. Descriptive statistics were used to measure the prevalence of multimorbidity. RESULTS The overall prevalence of multimorbidity among the elderly was 56.4% and the prevalence was higher among females (64.18%) than males (54.17%). The most prevalent conditions were hypertension (33.0%), diabetes (27.6%), ischemic heart disease (12.0%), and chronic obstructive pulmonary disease (9%). CONCLUSION A high prevalence of multimorbidity suggests that there is an urgent need to develop geriatric health-care services. Policymakers should pay attention to developing effective intervention strategies and programs to reduce the burden of multimorbidity.
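The definition used in this abstract (at least two co-occurring chronic diseases per person) maps directly onto a prevalence calculation. The sketch below is only an illustration of that arithmetic under stated assumptions: the function names and the toy records are hypothetical and are not taken from the study's data or methods.

```python
from typing import Iterable, Sequence


def is_multimorbid(conditions: Iterable[str]) -> bool:
    """True when a person has at least two distinct chronic conditions."""
    return len(set(conditions)) >= 2


def multimorbidity_prevalence(records: Sequence[Iterable[str]]) -> float:
    """Percentage of people in `records` who meet the multimorbidity definition."""
    if not records:
        raise ValueError("at least one record is required")
    cases = sum(is_multimorbid(conditions) for conditions in records)
    return 100.0 * cases / len(records)


# Toy, made-up records (one list of conditions per person), for illustration only.
toy_records = [
    ["hypertension", "diabetes"],
    ["hypertension"],
    [],
    ["diabetes", "COPD", "ischemic heart disease"],
]
toy_prevalence = multimorbidity_prevalence(toy_records)
```

Counting distinct conditions with a set keeps a condition that was recorded twice from being counted as two diseases.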
Affiliation(s)
- Hasna Hena Sara
- Department of Population Sciences, University of Dhaka, Dhaka, Bangladesh
- Md. Aminul Haque
- Department of Population Sciences, University of Dhaka, Dhaka, Bangladesh
4
Kawalia SB, Raschka T, Naz M, de Matos Simoes R, Senger P, Hofmann-Apitius M. Analytical Strategy to Prioritize Alzheimer's Disease Candidate Genes in Gene Regulatory Networks Using Public Expression Data. J Alzheimers Dis 2018; 59:1237-1254. [PMID: 28800327 PMCID: PMC5611835 DOI: 10.3233/jad-170011] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
Abstract
Alzheimer’s disease (AD) progressively destroys cognitive abilities in the aging population with tremendous effects on memory. Despite recent progress in understanding the underlying mechanisms, high drug attrition rates have put a question mark behind our knowledge about its etiology. Re-evaluation of past studies could help us to elucidate molecular-level details of this disease. Several methods to infer such networks exist, but most of them do not elaborate on context specificity and completeness of the generated networks, missing out on lesser-known candidates. In this study, we present a novel strategy that corroborates common mechanistic patterns across large scale AD gene expression studies and further prioritizes potential biomarker candidates. To infer gene regulatory networks (GRNs), we applied an optimized version of the BC3Net algorithm, named BC3Net10, capable of deriving robust and coherent patterns. In principle, this approach initially leverages the power of literature knowledge to extract AD specific genes for generating viable networks. Our findings suggest that AD GRNs show significant enrichment for key signaling mechanisms involved in neurotransmission. Among the prioritized genes, well-known AD genes were prominent in synaptic transmission, implicated in cognitive deficits. Moreover, less intensively studied AD candidates (STX2, HLA-F, HLA-C, RAB11FIP4, ARAP3, AP2A2, ATP2B4, ITPR2, and ATP2A3) are also involved in neurotransmission, providing new insights into the underlying mechanism. To our knowledge, this is the first study to generate knowledge-instructed GRNs that demonstrates an effective way of combining literature-based knowledge and data-driven analysis to identify lesser known candidates embedded in stable and robust functional patterns across disparate datasets.
Affiliation(s)
- Shweta Bagewadi Kawalia
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
- Tamara Raschka
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; University of Applied Sciences Koblenz, RheinAhrCampus, Remagen, Germany
- Mufassra Naz
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
- Philipp Senger
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
5
Chen Q, Zobel J, Verspoor K. Benchmarks for measurement of duplicate detection methods in nucleotide databases. Database (Oxford) 2017; 2023:2870676. [PMID: 28334741 PMCID: PMC10755258 DOI: 10.1093/database/baw164] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/10/2016] [Revised: 11/17/2016] [Accepted: 11/21/2016] [Indexed: 01/01/2023]
Abstract
Duplication of information in databases is a major data quality challenge. The presence of duplicates, implying either redundancy or inconsistency, can have a range of impacts on the quality of analyses that use the data. To provide a sound basis for research on this issue in databases of nucleotide sequences, we have developed new, large-scale validated collections of duplicates, which can be used to test the effectiveness of duplicate detection methods. Previous collections were either designed primarily to test efficiency, or contained only a limited number of duplicates of limited kinds. To date, duplicate detection methods have been evaluated on separate, inconsistent benchmarks, leading to results that cannot be compared and, due to limitations of the benchmarks, of questionable generality. In this study, we present three nucleotide sequence database benchmarks, based on information drawn from a range of resources, including information derived from mapping to two data sections within the UniProt Knowledgebase (UniProtKB), UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. Each benchmark has distinct characteristics. We quantify these characteristics and argue for their complementary value in evaluation. The benchmarks collectively contain a vast number of validated biological duplicates; the largest has nearly half a billion duplicate pairs (although this is probably only a tiny fraction of the total that is present). They are also the first benchmarks targeting the primary nucleotide databases. The records include the 21 most heavily studied organisms in molecular biology research. Our quantitative analysis shows that duplicates in the different benchmarks, and in different organisms, have different characteristics. It is thus unreliable to evaluate duplicate detection methods against any single benchmark. 
For example, the benchmark derived from UniProtKB/Swiss-Prot mappings identifies more diverse types of duplicates, showing the importance of expert curation, but is limited to coding sequences. Overall, these benchmarks form a resource that we believe will be of great value for development and evaluation of the duplicate detection or record linkage methods that are required to help maintain these essential resources. DATABASE URL : https://bitbucket.org/biodbqual/benchmarks.
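The point that a method should not be judged against a single benchmark can be illustrated with a small sketch. It scores one set of predicted duplicate pairs against several benchmark collections using pair-level precision and recall. This is a generic evaluation pattern, not the authors' procedure: the function names, record identifiers and benchmark names are hypothetical. Because a benchmark contains only validated duplicates, a predicted pair that is absent from it may still be a true duplicate, so the precision figure is best read as a conservative estimate.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Tuple[str, ...]]:
    """Sort each pair so that (a, b) and (b, a) count as the same duplicate pair."""
    return {tuple(sorted(pair)) for pair in pairs}


def precision_recall(predicted: Iterable[Pair],
                     benchmark: Iterable[Pair]) -> Tuple[float, float]:
    """Pair-level precision and recall of predictions against one benchmark."""
    pred = normalize(predicted)
    gold = normalize(benchmark)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def evaluate_across(predicted: Iterable[Pair],
                    benchmarks: Dict[str, Iterable[Pair]]) -> Dict[str, Tuple[float, float]]:
    """Score the same predictions against every benchmark, keyed by benchmark name."""
    predicted = list(predicted)  # allow reuse if a generator was passed in
    return {name: precision_recall(predicted, gold)
            for name, gold in benchmarks.items()}


# Hypothetical record identifiers and benchmark names, for illustration only.
toy_benchmarks = {
    "benchmark_A": [("r1", "r2"), ("r3", "r4")],
    "benchmark_B": [("r2", "r1")],
}
toy_scores = evaluate_across([("r2", "r1"), ("r5", "r6")], toy_benchmarks)
```

In this toy case the same predictions reach a recall of 0.5 on one collection and 1.0 on the other, which is the kind of disagreement between benchmarks that the abstract warns about.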
Affiliation(s)
- Qingyu Chen
- Department of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Justin Zobel
- Department of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Karin Verspoor
- Department of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
6
Iyappan A, Kawalia SB, Raschka T, Hofmann-Apitius M, Senger P. NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease. J Biomed Semantics 2016; 7:45. [PMID: 27392431 PMCID: PMC4939021 DOI: 10.1186/s13326-016-0079-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/01/2015] [Accepted: 05/23/2016] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Neurodegenerative diseases are incurable and debilitating indications with huge social and economic impact, where much is still to be learnt about the underlying molecular events. Mechanistic disease models could offer a knowledge framework to help decipher the complex interactions that occur at molecular and cellular levels. This motivates the need for the development of an approach integrating highly curated and heterogeneous data into a disease model of different regulatory data layers. Although several disease models exist, they often do not consider the quality of underlying data. Moreover, even with the current advancements in semantic web technology, we still do not have a cure for complex diseases like Alzheimer's disease. One of the key reasons accountable for this could be the increasing gap between generated data and the derived knowledge. RESULTS In this paper, we describe an approach, called NeuroRDF, to develop an integrative framework for modeling curated knowledge in the area of complex neurodegenerative diseases. The core of this strategy lies in the usage of well curated and context specific data for integration into one single semantic web-based framework, RDF. This increases the probability of the derived knowledge to be novel and reliable in a specific disease context. This infrastructure integrates highly curated data from databases (Bind, IntAct, etc.), literature (PubMed), and gene expression resources (such as GEO and ArrayExpress). We illustrate the effectiveness of our approach by asking real-world biomedical questions that link these resources to prioritize the plausible biomarker candidates. Among the 13 prioritized candidate genes, we identified MIF to be a potential emerging candidate due to its role as a pro-inflammatory cytokine. We additionally report on the effort and challenges faced during generation of such an indication-specific knowledge base comprising curated and quality-controlled data.
CONCLUSION Although many alternative approaches have been proposed and practiced for modeling diseases, the semantic web technology is a flexible and well established solution for harmonized aggregation. The benefit of this work, to use high quality and context specific data, becomes apparent in speculating previously unattended biomarker candidates around a well-known mechanism, further leveraged for experimental investigations.
Affiliation(s)
- Anandhi Iyappan
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113, Bonn, Germany
- Shweta Bagewadi Kawalia
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113, Bonn, Germany
- Tamara Raschka
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- University of Applied Sciences Koblenz, RheinAhrCampus, Joseph-Rovan-Allee 2, 53424, Remagen, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113, Bonn, Germany
- Philipp Senger
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
7
Hofmann-Apitius M, Ball G, Gebel S, Bagewadi S, de Bono B, Schneider R, Page M, Kodamullil AT, Younesi E, Ebeling C, Tegnér J, Canard L. Bioinformatics Mining and Modeling Methods for the Identification of Disease Mechanisms in Neurodegenerative Disorders. Int J Mol Sci 2015; 16:29179-206. [PMID: 26690135 PMCID: PMC4691095 DOI: 10.3390/ijms161226148] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/16/2015] [Revised: 11/10/2015] [Accepted: 11/12/2015] [Indexed: 12/22/2022] Open
Abstract
Since the decoding of the Human Genome, techniques from bioinformatics, statistics, and machine learning have been instrumental in uncovering patterns in increasing amounts and types of different data produced by technical profiling technologies applied to clinical samples, animal models, and cellular systems. Yet, progress on unravelling biological mechanisms, causally driving diseases, has been limited, in part due to the inherent complexity of biological systems. Whereas we have witnessed progress in the areas of cancer, cardiovascular and metabolic diseases, the area of neurodegenerative diseases has proved to be very challenging. This is in part because the aetiology of neurodegenerative diseases such as Alzheimer's disease or Parkinson's disease is unknown, rendering it very difficult to discern early causal events. Here we describe a panel of bioinformatics and modeling approaches that have recently been developed to identify candidate mechanisms of neurodegenerative diseases based on publicly available data and knowledge. We identify two complementary strategies: data mining techniques using genetic data as a starting point to be further enriched using other data-types, or alternatively to encode prior knowledge about disease mechanisms in a model based framework supporting reasoning and enrichment analysis. Our review illustrates the challenges entailed in integrating heterogeneous, multiscale and multimodal information in the area of neurology in general and neurodegeneration in particular. We conclude that progress would be accelerated by increasing efforts on performing systematic collection of multiple data-types over time from each individual suffering from neurodegenerative disease.
The work presented here has been driven by project AETIONOMY; a project funded in the course of the Innovative Medicines Initiative (IMI); which is a public-private partnership of the European Federation of Pharmaceutical Industry Associations (EFPIA) and the European Commission (EC).
Affiliation(s)
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Rheinische Friedrich-Wilhelms-Universitaet Bonn, University of Bonn, Bonn 53113, Germany.
- Gordon Ball
- Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, and Unit of Clinical Epidemiology, Karolinska University Hospital, Stockholm SE-171 77, Sweden.
- Science for Life Laboratories, Karolinska Institutet, Stockholm SE-171 77, Sweden.
- Stephan Gebel
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg.
- Shweta Bagewadi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Bernard de Bono
- Institute of Health Informatics, University College London, London NW1 2DA, UK.
- Auckland Bioengineering Institute, University of Auckland, Symmonds Street, Auckland 1142, New Zealand.
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg.
- Matt Page
- Translational Bioinformatics, UCB Pharma, 216 Bath Rd, Slough SL1 3WE, UK.
- Alpha Tom Kodamullil
- Rheinische Friedrich-Wilhelms-Universitaet Bonn, University of Bonn, Bonn 53113, Germany.
- Erfan Younesi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Christian Ebeling
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Jesper Tegnér
- Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, and Unit of Clinical Epidemiology, Karolinska University Hospital, Stockholm SE-171 77, Sweden.
- Science for Life Laboratories, Karolinska Institutet, Stockholm SE-171 77, Sweden.
- Luc Canard
- Translational Science Unit, SANOFI Recherche & Développement, 1 Avenue Pierre Brossolette, Chilly-Mazarin Cedex 91385, France.