1
Liu Y, Yang C. Computational methods for alignment and integration of spatially resolved transcriptomics data. Comput Struct Biotechnol J 2024; 23:1094-1105. [PMID: 38495555 PMCID: PMC10940867 DOI: 10.1016/j.csbj.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Revised: 03/02/2024] [Accepted: 03/04/2024] [Indexed: 03/19/2024] Open
Abstract
Most complex biological regulatory activities occur in three dimensions (3D). To better analyze biological processes, it is essential not only to decipher the molecular information of numerous cells but also to understand how their spatial contexts influence their behavior. With the development of spatially resolved transcriptomics (SRT) technologies, SRT datasets are being generated to simultaneously characterize gene expression and spatial arrangement within tissues, organs or organisms. To fully leverage spatial information, the focus extends beyond individual two-dimensional (2D) slices. Two tasks, slice alignment and data integration, have been introduced to establish correlations between multiple slices and thereby enhance the effectiveness of downstream tasks. Numerous related methods have now been developed. In this review, we first elucidate the details and principles behind several representative methods. We then report the results of testing these methods on various SRT datasets and assess their performance in representative downstream tasks. We discuss the strengths and weaknesses of each method and the reasons behind its performance. Finally, we provide an outlook on future developments. The code and experimental details are publicly available at https://github.com/YangLabHKUST/SRT_alignment_and_integration.
Affiliation(s)
- Yuyao Liu
  - Department of Automation, School of Information Science and Technology, Tsinghua University, Beijing, China
- Can Yang
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China

2
Safaei M, Goodarzi A, Abpeikar Z, Farmani AR, Kouhpayeh SA, Najafipour S, Jafari Najaf Abadi MH. Determination of key hub genes in Leishmaniasis as potential factors in diagnosis and treatment based on a bioinformatics study. Sci Rep 2024; 14:22537. [PMID: 39342024 PMCID: PMC11438978 DOI: 10.1038/s41598-024-73779-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 09/20/2024] [Indexed: 10/01/2024] Open
Abstract
Leishmaniasis is an infectious disease caused by protozoan parasites from different species of Leishmania. The disease is transmitted by female sandflies that carry these parasites. In this study, datasets on leishmaniasis published in the GEO database were analyzed and summarized. All three datasets used in this study (GSE43880, GSE55664, and GSE63931) were generated from biopsies of the skin wounds of patients infected with a clinical form of Leishmania (Leishmania braziliensis). To identify differentially expressed genes (DEGs) between leishmaniasis patients and controls, the robust rank aggregation (RRA) procedure was applied. We performed gene functional annotation and protein-protein interaction (PPI) network analysis to demonstrate the putative functionalities of the DEGs. The study used Molecular Complex Detection (MCODE), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) to detect molecular complexes within the PPI network and to analyze the identified functional modules. The CytoHubba plugin's results were combined with the RRA analysis to determine the hub genes. Finally, the interaction between miRNAs and hub genes was predicted. Based on the RRA integrated analysis, 407 DEGs were identified (263 up-regulated genes and 144 down-regulated genes). The top three modules were listed after creating the PPI network via the MCODE plugin. Seven hub genes were found using the CytoHubba app and RRA: CXCL10, GBP1, GNLY, GZMA, GZMB, NKG7, and UBD. According to our enrichment analysis, these functional modules were primarily associated with immune pathways, cytokine activity/signaling pathways, and inflammation pathways. Interestingly, however, the hub gene UBD is involved in the ubiquitination pathways of pathogenesis. The mirNet database predicted the hub genes' interactions with miRNAs, and the results revealed that several miRNAs, including mir-146a-5p, are crucial in fighting pathogenesis. The key hub genes discovered in this work may be considered potential biomarkers for diagnosis, the development of agonists/antagonists, and novel vaccine design, and may contribute to future clinical studies.
Affiliation(s)
- Mohsen Safaei
  - Department of Tissue Engineering, School of Advanced Technologies in Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Arash Goodarzi
  - Department of Tissue Engineering, School of Advanced Technologies in Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Zahra Abpeikar
  - Department of Tissue Engineering, School of Advanced Technologies in Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Ahmad Reza Farmani
  - Department of Tissue Engineering, School of Advanced Technologies in Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Seyed Amin Kouhpayeh
  - Department of Pharmacology, School of Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Sohrab Najafipour
  - Department of Microbiology, Faculty of Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Mohammad Hassan Jafari Najaf Abadi
  - Department of Medical Biotechnology, School of Medicine, Shahid Sadoughi University of Medical Sciences and Health Services, Yazd, Iran
  - Research Center for Health Technology Assessment and Medical Informatics, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran

3
Son A, Kim W, Park J, Lee W, Lee Y, Choi S, Kim H. Utilizing Molecular Dynamics Simulations, Machine Learning, Cryo-EM, and NMR Spectroscopy to Predict and Validate Protein Dynamics. Int J Mol Sci 2024; 25:9725. [PMID: 39273672 PMCID: PMC11395565 DOI: 10.3390/ijms25179725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 09/06/2024] [Accepted: 09/07/2024] [Indexed: 09/15/2024] Open
Abstract
Protein dynamics play a crucial role in biological function, encompassing motions ranging from atomic vibrations to large-scale conformational changes. Recent advancements in experimental techniques, computational methods, and artificial intelligence have revolutionized our understanding of protein dynamics. Nuclear magnetic resonance spectroscopy provides atomic-resolution insights, while molecular dynamics simulations offer detailed trajectories of protein motions. Computational methods applied to X-ray crystallography and cryo-electron microscopy (cryo-EM) have enabled the exploration of protein dynamics, capturing conformational ensembles that were previously unattainable. The integration of machine learning, exemplified by AlphaFold2, has accelerated structure prediction and dynamics analysis. These approaches have revealed the importance of protein dynamics in allosteric regulation, enzyme catalysis, and intrinsically disordered proteins. The shift towards ensemble representations of protein structures and the application of single-molecule techniques have further enhanced our ability to capture the dynamic nature of proteins. Understanding protein dynamics is essential for elucidating biological mechanisms, designing drugs, and developing novel biocatalysts, marking a significant paradigm shift in structural biology and drug discovery.
Affiliation(s)
- Ahrum Son
  - Department of Molecular Medicine, Scripps Research, San Diego, CA 92037, USA
- Woojin Kim
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Jongham Park
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Wonseok Lee
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Yerim Lee
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Seongyun Choi
  - Department of Convergent Bioscience and Informatics, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Hyunsoo Kim
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - Department of Convergent Bioscience and Informatics, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - Protein AI Design Institute, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - SCICS, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea

4
Vázquez-Borsetti P. Variability in rat weight gain during development. Lab Anim 2024:236772241246370. [PMID: 39157979 DOI: 10.1177/00236772241246370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/20/2024]
Abstract
The rat is one of the most employed animal models in biomedicine. Traditionally, weight gain has been utilized to gauge development and compare across species. Numerous studies have conducted longitudinal analyses of rat development, with emphasis on weight gain analysis. Given the high variability in these patterns, experimental data from a single laboratory may not be reliable for generalized estimation. This study aimed to analyze the effect of different factors on the pattern of weight gain during rat development. A literature survey was conducted to compile a database comprising nearly 300 data points of age and weight from 15 longitudinal studies. The database comprised both pre- and postnatal data. Utilizing the Gompertz equation, the data was analyzed to formulate a comprehensive model describing rat development. Differences in growth patterns became increasingly evident at later developmental stages, when significant differences in the maximum asymptote between sexes and strains were reached.
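As a rough illustration of the curve named in the abstract, the sketch below evaluates one common three-parameter form of the Gompertz equation, W(t) = A * exp(-b * exp(-k * t)). The parameterization, the function names, and every numeric value are assumptions chosen for illustration; none of them are taken from the study.

```python
import math


def gompertz(age_days, asymptote, displacement, rate):
    """Weight at a given age under W(t) = A * exp(-b * exp(-k * t))."""
    return asymptote * math.exp(-displacement * math.exp(-rate * age_days))


def inflection_age(displacement, rate):
    """Age of fastest growth, t* = ln(b) / k, where W(t*) = A / e."""
    return math.log(displacement) / rate


# Hypothetical parameters (not from the paper):
# 500 g asymptote, b = 4, k = 0.03 per day.
A, B, K = 500.0, 4.0, 0.03
t_star = inflection_age(B, K)
for age in (0, 30, 60, 120, 365):
    print(f"day {age:>3}: {gompertz(age, A, B, K):6.1f} g")
print(f"fastest growth near day {t_star:.1f}")
```

In this form, the asymptote A corresponds to the "maximum asymptote" that the abstract reports as differing between sexes and strains at later developmental stages.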
Affiliation(s)
- Pablo Vázquez-Borsetti
  - Instituto de biología Celular y Neurociencias (IBCN) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina

5
Cunha-Oliveira T, Ioannidis JPA, Oliveira PJ. Best practices for data management and sharing in experimental biomedical research. Physiol Rev 2024; 104:1387-1408. [PMID: 38451234 PMCID: PMC11380994 DOI: 10.1152/physrev.00043.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/29/2024] [Indexed: 03/08/2024] Open
Abstract
Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.
Affiliation(s)
- Teresa Cunha-Oliveira
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- John P A Ioannidis
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford, California, United States
  - Department of Statistics, Stanford University, Stanford, California, United States
- Paulo J Oliveira
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal

6
Zhang J, Jiang Q, Du Z, Geng Y, Hu Y, Tong Q, Song Y, Zhang HY, Yan X, Feng Z. Knowledge graph-derived feed efficiency analysis via pig gut microbiota. Sci Rep 2024; 14:13939. [PMID: 38886444 PMCID: PMC11182767 DOI: 10.1038/s41598-024-64835-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 06/13/2024] [Indexed: 06/20/2024] Open
Abstract
Feed efficiency (FE) is essential for pig production and has been reported to be partially explained by gut microbiota. Despite an extensive body of research literature on this topic, studies of how gut microbiota regulate feed efficiency remain fragmented and mostly confined to unstructured or semi-structured free text. Meanwhile, structured databases for microbiota analysis are available, yet they often lack a comprehensive understanding of the associated biological processes. Therefore, we devised an approach to construct a comprehensive knowledge graph by combining unstructured textual intelligence with structured database information, and applied it to investigate the relationship between pig gut microbes and FE. First, we created the pgmReading knowledge base and the domain ontology of pig gut microbiota by annotating, extracting, and integrating semantic information from 157 scientific publications. Second, we created pgmPubtator by using PubTator to expand the semantic information related to microbiota. Third, we created pgmDatabase by mapping and combining the ADDAGMA, gutMGene, and KEGG databases based on the ontology. These three knowledge bases were integrated to form the Pig Gut Microbial Knowledge Graph (PGMKG). Additionally, we created five biological query cases to validate the performance of PGMKG. These cases not only allow us to identify the microbes with the most significant impact on FE but also provide insights into the metabolites produced by these microbes and the associated metabolic pathways. This study introduces PGMKG, which maps key microbes in pig feed efficiency and guides microbiota-targeted optimization.
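The kind of query described above (microbe, then metabolite, then pathway) can be sketched with a minimal in-memory triple store. This is only an illustration of the general knowledge-graph idea, not the PGMKG implementation: the `TripleStore` class, the predicate names, and all entities below are hypothetical placeholders.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)  # subject -> {(predicate, object), ...}

    def add(self, subject, predicate, obj):
        self._edges[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from `subject` by `predicate`, sorted."""
        return sorted(o for p, o in self._edges.get(subject, ()) if p == predicate)

    def subjects(self, predicate, obj):
        """All subjects that have a `predicate` edge pointing to `obj`, sorted."""
        return sorted(s for s, edges in self._edges.items() if (predicate, obj) in edges)


# Placeholder facts for illustration only; none of these come from PGMKG.
kg = TripleStore()
kg.add("microbe_A", "associated_with", "feed_efficiency")
kg.add("microbe_B", "associated_with", "feed_efficiency")
kg.add("microbe_A", "produces", "metabolite_X")
kg.add("microbe_B", "produces", "metabolite_Y")
kg.add("metabolite_X", "participates_in", "pathway_1")


def trace_feed_efficiency(store):
    """Follow microbe -> metabolite -> pathway edges for FE-associated microbes."""
    report = {}
    for microbe in store.subjects("associated_with", "feed_efficiency"):
        report[microbe] = {
            metabolite: store.objects(metabolite, "participates_in")
            for metabolite in store.objects(microbe, "produces")
        }
    return report


print(trace_feed_efficiency(kg))
```

A production knowledge graph would more likely sit in an RDF or graph database with an ontology-backed schema; the dictionary of sets here only keeps the traversal logic visible.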
Affiliation(s)
- Junmei Zhang
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Qin Jiang
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
  - Yazhouwan National Laboratory (YNL), Sanya, 572025, China
- Zhihong Du
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yilin Geng
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yuren Hu
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Qichang Tong
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yunfeng Song
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Hong-Yu Zhang
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Xianghua Yan
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Zaiwen Feng
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China

7
Cho MH, Cho KH, No KT. PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science, and its application on natural product (NP) occurrence database processing. BMC Bioinformatics 2023; 24:475. [PMID: 38097955 PMCID: PMC10722791 DOI: 10.1186/s12859-023-05588-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND The standardization of biological data using unique identifiers is vital for seamless data integration, comprehensive interpretation, and reproducibility of research findings, contributing to advancements in bioinformatics and systems biology. Despite being widely accepted as a universal identifier, scientific names for biological species have inherent limitations, including lack of stability, uniqueness, and convertibility, hindering their effective use as identifiers in databases, particularly in natural product (NP) occurrence databases, posing a substantial obstacle to utilizing this valuable data for large-scale research applications. RESULT To address these challenges and facilitate high-throughput analysis of biological data involving scientific names, we developed PhyloSophos, a Python package that considers the properties of scientific names and taxonomic systems to accurately map name inputs to entries within a chosen reference database. We illustrate the importance of assessing multiple taxonomic databases and considering taxonomic syntax-based pre-processing using NP occurrence databases as an example, with the ultimate goal of integrating heterogeneous information into a single, unified dataset. CONCLUSIONS We anticipate PhyloSophos to significantly aid in the systematic processing of poorly digitized and curated biological data, such as biodiversity information and ethnopharmacological resources, enabling full-scale bioinformatics analysis using these valuable data resources.
Affiliation(s)
- Min Hyung Cho
  - Bioinformatics and Molecular Design Research Center (BMDRC), 209, Veritas A Hall, Yonsei University, 85 Songdogwahak-ro, Yeonsu-gu, Incheon, 21983, Republic of Korea
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul, 06978, South Korea
- Kyoung Tai No
  - Bioinformatics and Molecular Design Research Center (BMDRC), 209, Veritas A Hall, Yonsei University, 85 Songdogwahak-ro, Yeonsu-gu, Incheon, 21983, Republic of Korea
  - Department of Integrative Biotechnology and Translational Medicine, 214, Veritas A Hall, Yonsei University, 85 Songdogwahak-ro, Yeonsu-gu, Incheon, 21983, Republic of Korea

8
Noor A. Improving bioinformatics software quality through incorporation of software engineering practices. PeerJ Comput Sci 2022; 8:e839. [PMID: 35111923 PMCID: PMC8771759 DOI: 10.7717/peerj-cs.839] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/01/2021] [Accepted: 12/13/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND Bioinformatics software is developed for collecting, analyzing, integrating, and interpreting life science datasets that are often enormous. Bioinformatics engineers often lack the software engineering skills necessary for developing robust, maintainable, reusable software. This study presents review and discussion of the findings and efforts made to improve the quality of bioinformatics software. METHODOLOGY A systematic review was conducted of related literature that identifies core software engineering concepts for improving bioinformatics software development: requirements gathering, documentation, testing, and integration. The findings are presented with the aim of illuminating trends within the research that could lead to viable solutions to the struggles faced by bioinformatics engineers when developing scientific software. RESULTS The findings suggest that bioinformatics engineers could significantly benefit from the incorporation of software engineering principles into their development efforts. This leads to suggestion of both cultural changes within bioinformatics research communities as well as adoption of software engineering disciplines into the formal education of bioinformatics engineers. Open management of scientific bioinformatics development projects can result in improved software quality through collaboration amongst both bioinformatics engineers and software engineers. CONCLUSIONS While strides have been made both in identification and solution of issues of particular import to bioinformatics software development, there is still room for improvement in terms of shifts in both the formal education of bioinformatics engineers as well as the culture and approaches of managing scientific bioinformatics research and development efforts.

9
Bernasconi A, Canakoglu A, Masseroli M, Ceri S. META-BASE: A Novel Architecture for Large-Scale Genomic Metadata Integration. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:543-557. [PMID: 32750853 DOI: 10.1109/tcbb.2020.2998954] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
The integration of genomic metadata is, at the same time, an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research; combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. It is well-recognized in the bioinformatics community because, in common practice, repositories are accessed one by one, their specific metadata definitions are learned only through long and tedious effort, and this practice is error-prone. In this paper, we describe META-BASE, an architecture for integrating metadata extracted from a variety of genomic data sources, based upon a structured transformation process. We present a variety of innovative techniques for data extraction, cleaning, normalization and enrichment. We propose a general, open and extensible pipeline that can easily incorporate any number of new data sources, and present the resulting repository (already integrating several important sources), which is exposed by means of practical user interfaces to respond to biological researchers' needs.

10
Queirós P, Novikova P, Wilmes P, May P. Unification of functional annotation descriptions using text mining. Biol Chem 2021; 402:983-990. [PMID: 33984880 DOI: 10.1515/hsz-2021-0125] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/27/2021] [Accepted: 05/03/2021] [Indexed: 02/06/2023]
Abstract
A common approach to genome annotation involves the use of homology-based tools to predict the functional role of proteins. The quality of functional annotations depends on the reference data used; as such, choosing the appropriate sources is crucial. Unfortunately, no single reference data source can be universally considered the gold standard, so using multiple references could potentially increase annotation quality and coverage. However, this comes with challenges, particularly the introduction of redundant and exclusive annotations. Through text mining it is possible to identify highly similar functional descriptions, thus strengthening confidence in the final protein functional annotation and providing a redundancy-free output. Here we present UniFunc, a text mining approach that is able to detect similar functional descriptions with high precision. UniFunc was built as a small module and can be used independently or integrated into protein function annotation pipelines. By removing the need to individually analyse and compare annotation results, UniFunc streamlines the complementary use of multiple reference datasets.
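To make the idea of detecting highly similar functional descriptions concrete, here is a minimal sketch that compares descriptions by token overlap (Jaccard similarity) and drops near-duplicates. This is a simple stand-in technique, not UniFunc's actual algorithm; the stopword list, the 0.8 threshold, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical list of uninformative words; a real pipeline would curate this.
STOPWORDS = {"the", "of", "and", "a", "an", "putative", "protein"}


def tokens(description):
    """Lowercase alphanumeric tokens of a description, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(desc_a, desc_b):
    """Shared tokens divided by all tokens; 0.0 if both descriptions are empty."""
    ta, tb = tokens(desc_a), tokens(desc_b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def deduplicate(descriptions, threshold=0.8):
    """Keep the first description of each group of near-identical ones."""
    kept = []
    for desc in descriptions:
        if not any(jaccard(desc, seen) >= threshold for seen in kept):
            kept.append(desc)
    return kept


annotations = [
    "ATP synthase subunit alpha",
    "ATP synthase alpha subunit",
    "DNA polymerase III",
]
print(deduplicate(annotations))
```

Set-based overlap ignores word order, which is why the two ATP synthase descriptions collapse into one; it would not capture synonyms or database identifiers, which a dedicated text-mining tool would need to handle.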
Affiliation(s)
- Paul Wilmes
  - Systems Ecology, Esch-sur-Alzette, Luxembourg
- Patrick May
  - Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4362, Esch-sur-Alzette, Luxembourg

11
Cavigelli S, Leips J, Jenny Xiang QY, Lemke D, Konow N. Next Steps in Integrative Biology: Mapping Interactive Processes Across Levels of Biological Organization. Integr Comp Biol 2021; 61:2066-2074. [PMID: 34259855 DOI: 10.1093/icb/icab161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Revised: 07/02/2021] [Accepted: 07/12/2021] [Indexed: 01/03/2023] Open
Abstract
Emergent biological processes result from complex interactions within and across levels of biological organization, ranging from molecular to environmental dynamics. Powerful theories, database tools, and modeling methods have been designed to characterize network connections within levels, such as those among genes, proteins, biochemicals, cells, organisms and species. Here, we propose that developing integrative models of organismal function in complex environments can be facilitated by taking advantage of these methods to identify key nodes of communication across levels of organization. Mapping key drivers or connections among levels of organization will provide data and leverage to model potential rule-sets by which organisms respond and adjust to perturbations at any level of biological organization.
Affiliation(s)
- Sonia Cavigelli
  - Department of Biobehavioral Health, Pennsylvania State University, University Park PA 16802
- Jeff Leips
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore MD 21250
- Qiu-Yun Jenny Xiang
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh NC 27695
- Dawn Lemke
  - Department of Biological and Environmental Sciences, Alabama A&M University, Huntsville AL 35811
- Nicolai Konow
  - Department of Biological Sciences, University of Massachusetts Lowell, Lowell MA 01854

12
Thessen AE, Bogdan P, Patterson DJ, Casey TM, Hinojo-Hinojo C, de Lange O, Haendel MA. From Reductionism to Reintegration: Solving society's most pressing problems requires building bridges between data types across the life sciences. PLoS Biol 2021; 19:e3001129. [PMID: 33770077 PMCID: PMC7997011 DOI: 10.1371/journal.pbio.3001129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
Decades of reductionist approaches in biology have achieved spectacular progress, but the proliferation of subdisciplines, each with its own technical and social practices regarding data, impedes the growth of the multidisciplinary and interdisciplinary approaches now needed to address pressing societal challenges. Data integration is key to a reintegrated biology able to address global issues such as climate change, biodiversity loss, and sustainable ecosystem management. We identify major challenges to data integration and present a vision for a "Data as a Service"-oriented architecture to promote reuse of data for discovery. The proposed architecture includes standards development, new tools and services, and strategies for career-development and sustainability.
Affiliation(s)
- Anne E. Thessen
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Paul Bogdan
  - Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, United States of America
- Theresa M. Casey
  - Department of Animal Sciences, Purdue University, West Lafayette, Indiana, United States of America
- César Hinojo-Hinojo
  - Department of Earth System Science, University of California, Irvine, California, United States of America
- Orlando de Lange
  - Department of Electrical Engineering, University of Washington, Seattle, Washington, United States of America
- Melissa A. Haendel
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America

13
Irshad O, Ghani Khan MU. Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200127122818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aim:
To help researchers and practitioners uncover the functional aspects of the human cellular system through exploratory searches over semantically integrated, heterogeneous and geographically dispersed omics annotations.
Background:
Improving standards of health is one of the motives that continually drives researchers and practitioners to uncover the poorly understood aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated and unified form. With the advent of high-throughput and sensor technologies in particular, biological data are growing at an astronomical rate and are increasingly heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations they naturally possess in the human cell.
Objective:
To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and supports exploratory querying of the integrated data.
Method:
We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with
Result:
To show the significance of the proposed model, we developed a data warehouse and information retrieval system on a multi-layered, multi-modular software architecture that complies with the model. Results show that our model supports gathering, associating, integrating, persisting and querying each entity with respect to all of its possible aspects within or across the associated omics layers.
Conclusion:
Formal specifications help address data integration issues by providing a formal means of understanding omics data based on meaning rather than syntax.
Affiliation(s)
- Omer Irshad
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
|
14
|
Raghunath A, Nagarajan R, Perumal E. ZFARED: A Database of the Antioxidant Response Elements in Zebrafish. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191018172213] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background:
Antioxidant Response Elements (ARE) play a key role in the expression of Nrf2 target genes by regulating the Keap1-Nrf2-ARE pathway, which offers protection against toxic agents and oxidative stress-induced diseases.
Objective:
To develop a database of putative AREs for all the genes in the zebrafish genome. This database will help researchers investigate Nrf2 regulatory mechanisms in detail.
Methods:
To help researchers functionally characterize zebrafish AREs, we have developed a database of AREs, the Zebrafish Antioxidant Response Element Database (ZFARED), for all the protein-coding genes, including antioxidant and mitochondrial genes, in the zebrafish genome. The front end of the database was developed using HTML, JavaScript, and CSS and tested in different browsers. The back end of the database was developed using Perl scripts and the Perl-CGI and Perl-DBI modules.
Results:
ZFARED is the first database on the AREs in zebrafish, and it facilitates fast and efficient searching of AREs. AREs were identified using Perl algorithms developed in-house, and the database was developed using HTML, JavaScript, and Perl-CGI scripts. From this database, researchers can access the AREs based on chromosome number (1 to 25 and M for mitochondria), strand (positive or negative), ARE pattern and keywords. Users can also specify the size of the upstream/promoter regions (5 to 30 kb) from the transcription start site to access the AREs located in those specific regions.
Conclusion:
ZFARED will be useful in the investigation of the Keap1-Nrf2-ARE pathway and its gene regulation. ZFARED is freely available at http://zfared.buc.edu.in/.
Affiliation(s)
- Azhwar Raghunath
  - Molecular Toxicology Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore 641 046, Tamilnadu, India
- Raju Nagarajan
  - Department of Biotechnology, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Ekambaram Perumal
  - Molecular Toxicology Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore 641 046, Tamilnadu, India
|
15
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020. [PMID: 32171029 DOI: 10.15454/1yxvzv] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/08/2023]
Abstract
Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.
Affiliation(s)
- Evangelia A Papoutsoglou
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
- Daniel Faria
  - BioData.pt, Instituto Gulbenkian de Ciência, 2780-156, Oeiras, Portugal
  - INESC-ID, 1000-029, Lisboa, Portugal
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Elizabeth Arnaud
  - Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
- Ioannis N Athanasiadis
  - Geo-Information Science and Remote Sensing Laboratory, Wageningen University, Droevendaalsesteeg 3, Wageningen, 6708PB, the Netherlands
- Inês Chaves
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
  - Instituto de Biologia Experimental e Tecnológica (iBET), 2780-157, Oeiras, Portugal
- Frederik Coppens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Bruno V Costa
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
  - BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Hanna Ćwiek-Kupczyńska
  - Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
- Bert Droesbeke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Richard Finkers
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
- Kristina Gruden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Graham J King
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2577, Australia
- Paweł Krajewski
  - Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Marie-Angélique Laporte
  - Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
- Célia Michotey
  - Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
- Markus Oppermann
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Richard Ostler
  - Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
- Hendrik Poorter
  - Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
  - Department of Biological Sciences, Macquarie University, North Ryde, NSW 2109, Australia
- Živa Ramšak
  - Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
- Jochen C Reif
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Philippe Rocca-Serra
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Susanna-Assunta Sansone
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- François Tardieu
  - INRA, Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux, UMR759, Montpellier, 34060, France
- Cristobal Uauy
  - Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK
- Björn Usadel
  - Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
  - Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074, Aachen, Germany
- Richard G F Visser
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
- Stephan Weise
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Célia M Miguel
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
  - BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Cyril Pommier
  - Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
|
16
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek‐Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte M, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez‐Gonzalez R, Ramšak Ž, Reif JC, Rocca‐Serra P, Sansone S, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam‐Blondon A, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020; 227:260-273. [PMID: 32171029 PMCID: PMC7317793 DOI: 10.1111/nph.16544] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 11/22/2019] [Accepted: 02/24/2020] [Indexed: 05/21/2023]
|
17
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020. [PMID: 32171029 DOI: 10.15454/ah6u4a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2023]
|
18
|
Canakoglu A, Bernasconi A, Colombo A, Masseroli M, Ceri S. GenoSurf: metadata driven semantic search system for integrated genomic datasets. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5670757. [PMID: 31820804 PMCID: PMC6902006 DOI: 10.1093/database/baz132] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/28/2019] [Revised: 10/04/2019] [Accepted: 10/21/2019] [Indexed: 01/18/2023]
Abstract
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies. The user of GenoSurf provides as input the search terms, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing resulting files are updated in real time while the search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata of several major valuable data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified with their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
Affiliation(s)
- Arif Canakoglu
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Anna Bernasconi
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Andrea Colombo
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Stefano Ceri
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
|
19
|
Sima AC, Mendes de Farias T, Zbinden E, Anisimova M, Gil M, Stockinger H, Stockinger K, Robinson-Rechavi M, Dessimoz C. Enabling semantic queries across federated bioinformatics databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5614223. [PMID: 31697362 PMCID: PMC6836710 DOI: 10.1093/database/baz106] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/28/2019] [Revised: 08/01/2019] [Accepted: 08/02/2019] [Indexed: 11/23/2022]
Abstract
Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology DS; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
Affiliation(s)
- Ana Claudia Sima
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tarcisio Mendes de Farias
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Erich Zbinden
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Maria Anisimova
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Manuel Gil
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Heinz Stockinger
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Kurt Stockinger
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
- Marc Robinson-Rechavi
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Genetics, Evolution, and Environment, University College London, Gower St, London WC1E 6BT, UK
  - Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
|
20
|
do Nascimento PM, Medeiros IG, Falcão RM, Stransky B, de Souza JES. A decision tree to improve identification of pathogenic mutations in clinical practice. BMC Med Inform Decis Mak 2020; 20:52. [PMID: 32151256 PMCID: PMC7063785 DOI: 10.1186/s12911-020-1060-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/30/2019] [Accepted: 02/21/2020] [Indexed: 12/17/2022]
Abstract
Background:
A variant of unknown significance (VUS) is a variant form of a gene that has been identified through genetic testing but whose significance to the function of the organism is not known. A current challenge in precision medicine is to identify precisely which mutations detected by sequencing are relevant to the treatment or diagnosis of a disease. The average accuracy of pathogenicity predictors is 85%. However, the predictors disagree significantly about mutational impact and pathogenicity. Therefore, manual verification is necessary to confirm the real effect of a mutation in each case.
Methods:
In this work, we use variable categorization and selection to build a decision tree model, and we then measure and compare its accuracy with four known mutation predictors and seventeen supervised machine-learning (ML) algorithms.
Results:
The results showed that the proposed tree reached the highest precision among all tested variables: 91% for True Neutrals, 8% for False Neutrals, 9% for False Pathogenic, and 92% for True Pathogenic.
Conclusions:
The decision tree demonstrated exceptionally high classification precision with cancer data, producing consistently relevant forecasts for the sample tests with an accuracy close to the best achieved by supervised ML algorithms. In addition, the decision tree algorithm is easier for non-IT experts to apply in clinical practice. From the perspective of the cancer research community, this approach can be successfully applied as an alternative for determining the potential pathogenicity of VUS.
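The four percentages reported in the results can be read as a row-normalized confusion matrix. The sketch below is only an illustration of that reading, not the authors' evaluation code: it assumes (the abstract does not say so) that 91% and 9% are the rates for truly neutral variants, that 8% and 92% are the rates for truly pathogenic variants, and that both classes are weighted equally. The function name `summarize_rates` is hypothetical.

```python
def summarize_rates(true_neutral, false_pathogenic, false_neutral, true_pathogenic):
    """Derive summary metrics from per-class confusion-matrix rates.

    Assumption (not stated in the abstract): the inputs are row-normalized
    rates, so true_neutral + false_pathogenic describes the neutral class and
    false_neutral + true_pathogenic describes the pathogenic class.
    """
    # Fraction of truly neutral variants that were called neutral.
    specificity = true_neutral / (true_neutral + false_pathogenic)
    # Fraction of truly pathogenic variants that were called pathogenic.
    sensitivity = true_pathogenic / (true_pathogenic + false_neutral)
    # Mean of the two per-class rates, i.e. accuracy under equal class weights.
    balanced_accuracy = (specificity + sensitivity) / 2
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "balanced_accuracy": balanced_accuracy,
    }


# Rates taken from the abstract, expressed as fractions.
rates = summarize_rates(
    true_neutral=0.91,
    false_pathogenic=0.09,
    false_neutral=0.08,
    true_pathogenic=0.92,
)
print(rates)
```

Under these assumptions the balanced accuracy is the mean of 0.91 and 0.92; with unequal class sizes the overall accuracy would differ, which is why the class balance is flagged as an assumption.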
Affiliation(s)
- Inácio Gomes Medeiros
  - Bioinformatics Postgraduate Program, Metrópole Digital Institute, Federal University of Rio Grande do Norte, Natal, Brazil
- Raul Maia Falcão
  - Bioinformatics Postgraduate Program, Metrópole Digital Institute, Federal University of Rio Grande do Norte, Natal, Brazil
- Beatriz Stransky
  - Biomedical Engineering Department, Center of Technology, Federal University of Rio Grande do Norte, Natal, Brazil
  - Bioinformatics Multidisciplinary Environment (BioME), Metrópole Digital Institute, Federal University of Rio Grande do Norte, Natal, Brazil
- Jorge Estefano Santana de Souza
  - Bioinformatics Postgraduate Program, Metrópole Digital Institute, Federal University of Rio Grande do Norte, Natal, Brazil
  - Bioinformatics Multidisciplinary Environment (BioME), Metrópole Digital Institute, Federal University of Rio Grande do Norte, Natal, Brazil
|
21
|
Irshad O, Khan MUG. Integration and Querying of Heterogeneous Omics Semantic Annotations for Biomedical and Biomolecular Knowledge Discovery. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190409112025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Background: Exploring the various functional aspects of a biological cell system has been a focus of research for the last several decades. Biologists, scientists and researchers continuously strive to unveil the mysteries of these functional aspects in order to improve health standards. To gain such understanding, rapidly growing, heterogeneous and geographically dispersed omics data need to be critically analyzed. Currently, omics data are available in different types and formats through various data access interfaces. Applications that require offline and integrated data encounter many data heterogeneity and global dispersion issues. Objective: To facilitate such applications in particular, heterogeneous data must be collected, integrated and warehoused in a loosely coupled way, so that each molecular entity can be computationally understood independently or in association with other entities within or across the various cellular aspects. Methods: In this paper, we propose an omics data integration schema and its corresponding data warehouse system for integrating, warehousing and presenting heterogeneous and geographically dispersed omics entities according to the cellular functional aspects. Results & Conclusion: Such aspect-oriented data integration, warehousing and data access interfacing through graphical search, web services and application programming interfaces make our proposed integrated data schema and warehouse system better and more useful than other contemporary ones.
Affiliation(s)
- Omer Irshad
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
22
Al Bitar S, Gali-Muhtasib H. The Role of the Cyclin Dependent Kinase Inhibitor p21 cip1/waf1 in Targeting Cancer: Molecular Mechanisms and Novel Therapeutics. Cancers (Basel) 2019; 11:cancers11101475. [PMID: 31575057 PMCID: PMC6826572 DOI: 10.3390/cancers11101475] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 06/18/2019] [Revised: 07/26/2019] [Accepted: 07/30/2019] [Indexed: 12/15/2022] Open
Abstract
p21cip1/waf1 mediates various biological activities by sensing and responding to multiple stimuli, via p53-dependent and independent pathways. p21 is known to act as a tumor suppressor mainly by inhibiting cell cycle progression and allowing DNA repair. Significant advances have been made in elucidating the potential role of p21 in promoting tumorigenesis. Here, we discuss the involvement of p21 in multiple signaling pathways, its dual role in cancer, and the importance of understanding its paradoxical functions for effectively designing therapeutic strategies that could selectively inhibit its oncogenic activities, override resistance to therapy and yet preserve its tumor suppressive functions.
Affiliation(s)
- Samar Al Bitar
- Department of Biology, and Center for Drug Discovery, American University of Beirut, Beirut 1103, Lebanon.
- Hala Gali-Muhtasib
- Department of Biology, and Center for Drug Discovery, American University of Beirut, Beirut 1103, Lebanon.
23
Wani N, Raza K. Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods. Comput Biol Chem 2019; 83:107120. [PMID: 31499298 DOI: 10.1016/j.compbiolchem.2019.107120] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/03/2018] [Revised: 02/22/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Data generation using high throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Datasets including gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-chip), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data are some of the commonly used biological datasets to study various cellular mechanisms of living organisms. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offers a system-level biological insight for predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
Affiliation(s)
- Nisar Wani
- Govt. Degree College Baramulla, J & K, India; Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India.
24
Sügis E, Dauvillier J, Leontjeva A, Adler P, Hindie V, Moncion T, Collura V, Daudin R, Loe-Mie Y, Herault Y, Lambert JC, Hermjakob H, Pupko T, Rain JC, Xenarios I, Vilo J, Simonneau M, Peterson H. HENA, heterogeneous network-based data set for Alzheimer's disease. Sci Data 2019; 6:151. [PMID: 31413325 PMCID: PMC6694132 DOI: 10.1038/s41597-019-0152-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 02/12/2019] [Accepted: 06/18/2019] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease and other types of dementia are the top cause for disabilities in later life and various types of experiments have been performed to understand the underlying mechanisms of the disease with the aim of coming up with potential drug targets. These experiments have been carried out by scientists working in different domains such as proteomics, molecular biology, clinical diagnostics and genomics. The results of such experiments are stored in the databases designed for collecting data of similar types. However, in order to get a systematic view of the disease from these independent but complementary data sets, it is necessary to combine them. In this study we describe a heterogeneous network-based data set for Alzheimer's disease (HENA). Additionally, we demonstrate the application of state-of-the-art graph convolutional networks, i.e. deep learning methods for the analysis of such large heterogeneous biological data sets. We expect HENA to allow scientists to explore and analyze their own results in the broader context of Alzheimer's disease research.
Affiliation(s)
- Elena Sügis
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Jerome Dauvillier
- Swiss Institute of Bioinformatics, Vital-IT group, Unil Quartier Sorge, Genopode building, CH-1015, Lausanne, Switzerland
- Anna Leontjeva
- CSIRO Data 61, 5/13 Garden St, Eveleigh, NSW, 2015, Australia
- Priit Adler
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Valerie Hindie
- Hybrigenics SA, 3-5 Impasse Reille, 75014, Paris, France
- Thomas Moncion
- Hybrigenics SA, 3-5 Impasse Reille, 75014, Paris, France
- Rachel Daudin
- Institut national de la santé et de la recherche médicale, INSERM U894 2 ter rue d'Alésia, 75014, Paris, France
- Laboratoire Aimé Cotton, Centre National Recherche Scientifique, Université Paris-Sud, Ecole Normale Supérieure Paris-Saclay, Université Paris-Saclay, 91405, Orsay, France
- Yann Loe-Mie
- (Epi)genomics of Animal Development Unit, Institut Pasteur, CNRS UMR3738, Paris, 75015, France
- Yann Herault
- Centre Européen de Recherche en Biologie et Médecine, 1 rue Laurent Fries, 67404, Illkirch, France
- Jean-Charles Lambert
- Institut Pasteur de Lille, UMR 744 1 rue du Pr. Calmette BP 245, 59019, Lille cedex, France
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, United Kingdom
- Tal Pupko
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, P.O. Box 39040, 6997801, Tel Aviv, Israel
- Ioannis Xenarios
- Center for Integrative Genomics University of Lausanne, Genopode, 1015, Lausanne, Switzerland
- Genome Center Health 2030, Analytical Platform Department, Chemin des Mines 9, 1202, Genève, Switzerland
- DFR CHUV, Rue du Bugnon 21, 1011, Lausanne, Switzerland
- Agora Center, LICR/Department of Oncology, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Jaak Vilo
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Michel Simonneau
- Institut national de la santé et de la recherche médicale, INSERM U894 2 ter rue d'Alésia, 75014, Paris, France.
- Laboratoire Aimé Cotton, Centre National Recherche Scientifique, Université Paris-Sud, Ecole Normale Supérieure Paris-Saclay, Université Paris-Saclay, 91405, Orsay, France.
- Hedi Peterson
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia.
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia.
25
Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res 2019; 46:W530-W536. [PMID: 29762787 PMCID: PMC6030971 DOI: 10.1093/nar/gky355] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 01/31/2018] [Accepted: 05/08/2018] [Indexed: 01/10/2023] Open
Abstract
The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and the high cost of expert curation, curated variant information in existing databases is often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications under various names (e.g. ‘A146T’ versus ‘c.436G>A’ versus ‘rs121913527’). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.
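The naming problem described in this abstract can be illustrated with a small sketch: several names for one variant are mapped to a single key, so that a query by any name also retrieves articles that use a different name. Only the three variant names come from the abstract. The synonym table, the toy article texts and the helper `search` are hypothetical, and the sketch does not represent LitVar's implementation or API.

```python
# Hypothetical synonym table: every known name maps to one canonical identifier.
SYNONYMS = {
    "A146T": "rs121913527",
    "c.436G>A": "rs121913527",
    "rs121913527": "rs121913527",
}

# Toy corpus with invented text, keyed by an arbitrary article label.
ARTICLES = {
    "article-1": "We observed the A146T substitution in two samples.",
    "article-2": "The c.436G>A change was detected by sequencing.",
    "article-3": "No variants were reported in this cohort.",
}


def search(query):
    """Return labels of articles mentioning the queried variant under any known name."""
    canonical = SYNONYMS.get(query, query)
    # Collect every name that shares the same canonical identifier.
    names = [name for name, key in SYNONYMS.items() if key == canonical]
    if not names:
        names = [query]  # unknown variant: fall back to a literal match
    return sorted(
        label for label, text in ARTICLES.items()
        if any(name in text for name in names)
    )


# A literal search for one name finds only the article that uses that exact name.
literal_hits = sorted(label for label, text in ARTICLES.items() if "A146T" in text)

# A synonym-aware search for the same name also finds the article using 'c.436G>A'.
synonym_hits = search("A146T")
```

Plain substring matching is used here only to keep the example short; it stands in for the text mining that a real system would need to recognize variant mentions.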
Affiliation(s)
- Alexis Allot
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Yifan Peng
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kyubum Lee
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Lon Phan
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
26
NetR and AttR, Two New Bioinformatic Tools to Integrate Diverse Datasets into Cytoscape Network and Attribute Files. Genes (Basel) 2019; 10:genes10060423. [PMID: 31159440 PMCID: PMC6628208 DOI: 10.3390/genes10060423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2019] [Revised: 05/25/2019] [Accepted: 05/27/2019] [Indexed: 11/17/2022] Open
Abstract
High-throughput technologies have allowed researchers to obtain genome-wide data from a wide array of experimental model systems. Unfortunately, however, new data generation tends to significantly outpace data re-utilization, and most high-throughput datasets are only rarely used in subsequent studies or to generate new hypotheses to be tested experimentally. The reasons behind such data underutilization include a widespread lack of programming expertise among experimental biologists to carry out the file reformatting that is often necessary to integrate published data from disparate sources. We have developed two programs (NetR and AttR), which allow experimental biologists with little to no programming background to integrate publicly available datasets into files that can later be visualized with Cytoscape to display hypothetical networks that result from combining individual datasets, as well as a series of published attributes related to the genes or proteins in the network. NetR also allows users to import protein and genetic interaction data from InterMine, which can further enrich a network model based on curated information. We expect that NetR/AttR will allow experimental biologists to mine a largely unexploited wealth of data in their fields and facilitate their integration into hypothetical models to be tested experimentally.
27
Perez-Gil D, Lopez FJ, Dopazo J, Marin-Garcia P, Rendon A, Medina I. PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources. BMC Bioinformatics 2019; 20:159. [PMID: 30922213 PMCID: PMC6438028 DOI: 10.1186/s12859-019-2726-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2018] [Accepted: 03/13/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data. RESULTS In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTful web service API offered by CellBase. This package offers fast and user-friendly access to biological information without the need to install any local database. In addition, a series of command-line tools are provided to perform common bioinformatic tasks, such as variant annotation. CellBase data is always available through a high-availability cluster, and queries have been tuned to ensure real-time performance. CONCLUSION PyCellBase is an open-source Python package that provides efficient access to heterogeneous biological information. It allows users to perform tasks that require a comprehensive set of knowledge resources, for example variant annotation. Queries can be easily fine-tuned to retrieve the desired information about particular biological features. PyCellBase offers the convenience of an object-oriented scripting language and provides the ability to integrate the obtained results into other Python applications and pipelines.
Affiliation(s)
- Joaquin Dopazo
- Clinical Bioinformatics Area, Fundacion Progreso y Salud, Seville, Spain; Functional Genomics Node, INB-ELIXIR-es, FPS, Hospital Virgen del Rocío, Seville, Spain
- Pablo Marin-Garcia
- Department of Bioinformatics, Universidad Católica de Valencia, Valencia, Spain; Department of Bioinformatics, Institute for Integrative Systems Biology, Valencia, Spain
- Augusto Rendon
- Genomics England, London, UK; Department of Haematology, University of Cambridge, Cambridge, UK
- Ignacio Medina
- HPC Service, UIS, University of Cambridge, Cambridge, UK.
28
Schneider MV, Griffin PC, Tyagi S, Flannery M, Dayalan S, Gladman S, Watson-Haigh N, Bayer PE, Charleston M, Cooke I, Cook R, Edwards RJ, Edwards D, Gorse D, McConville M, Powell D, Wilkins MR, Lonie A. Establishing a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. Brief Bioinform 2019; 20:384-389. [PMID: 29106479 PMCID: PMC6433737 DOI: 10.1093/bib/bbx071] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022] Open
Abstract
EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas (Tools, Data, Standards, Platforms, Compute and Training) are described in this article.
Affiliation(s)
- Philippa C Griffin
- EMBL Australia Bioinformatics Resource, EMBL-ABR Hub, Melbourne, Victoria, Australia
- Sonika Tyagi
- Australian Genome Research Facility, Bioinformatics, 1G royal Pde Parkville, Victoria, Australia
- Madison Flannery
- EMBL Australia Bioinformatics Resource, EMBL-ABR Hub, Melbourne, Victoria, Australia
- Saravanan Dayalan
- University of Melbourne Bio21 Molecular Science and Biotechnology Institute, Metabolomics Platform, Parkville Victoria, Australia
- Simon Gladman
- EMBL Australia Bioinformatics Resource, EMBL-ABR Hub, Melbourne, Victoria, Australia
- Philipp E Bayer
- University of Western Australia, School of Plant Biology, Crawley, Western Australia, Australia
- Michael Charleston
- University of Tasmania Menzies Institute for Medical Research, Hobart Tasmania, Australia
- Ira Cooke
- James Cook University, College of Public Health, Medical & Vet Sciences, Townsville, Queensland, Australia
- Rob Cook
- University of New South Wales, Sydney, Australia
- David Edwards
- University of Western Australia, School of Plant Biology, Crawley, Western Australia, Western Australia
- Dominique Gorse
- Queensland Facility for Advanced Bioinformatics, Brisbane, Queensland, Australia
- Malcolm McConville
- University of Melbourne Bio21 Molecular Science and Biotechnology Institute, Parkville Victoria, Australia
- Marc R Wilkins
- University of New South Wales, School of Biotechnology and Biomolecular Sciences, Sydney, Australia
- Andrew Lonie
- University of Melbourne Department of General Practice and Primary Health Care, Melbourne Bioinformatics, Carlton Victoria, Australia
29
Ehrhart F, Roozen S, Verbeek J, Koek G, Kok G, van Kranen H, Evelo CT, Curfs LMG. Review and gap analysis: molecular pathways leading to fetal alcohol spectrum disorders. Mol Psychiatry 2019; 24:10-17. [PMID: 29892052 PMCID: PMC6325721 DOI: 10.1038/s41380-018-0095-4] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 09/26/2017] [Revised: 11/17/2017] [Accepted: 04/23/2018] [Indexed: 12/30/2022]
Abstract
Alcohol exposure during pregnancy affects the development of the fetus in various ways and may lead to Fetal Alcohol Spectrum Disorders (FASD). FASD is one of the leading preventable forms of neurodevelopmental disorders. In the light of prevention and early intervention, knowledge on how ethanol exposure induces fetal damage is urgently needed. Besides direct ethanol and acetaldehyde toxicity, alcohol increases oxidative stress, and subsequent general effects (e.g., epigenetic imprinting, gene expression, and metabolite levels). The current review provides an overview of the existing knowledge about specific downstream pathways in FASD that affect, e.g., the SHH pathway, cholesterol homeostasis, neurotransmitter signaling, and the cytoskeleton. Available human data vary greatly, while animal studies with controlled ethanol exposure are transferable to humans only to a limited extent. The main deficits in knowledge about FASD are the lack of pathophysiological understanding and of dose-response relationships, together with the lack of reliable biomarkers for either FASD detection or estimation of susceptibility. In addition to single-outcome experiments, omics data should be generated to overcome this problem. Therefore, for future studies we recommend holistic data-driven analysis, which allows integrative analyses over multiple levels of genetic variation, transcriptomics and metabolomics data, to investigate the whole picture of FASD development and to provide insight into potential drug targets for intervention.
Affiliation(s)
- Friederike Ehrhart
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Department of Bioinformatics, NUTRIM School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, The Netherlands.
- Sylvia Roozen
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Department of Work and Social Psychology, Maastricht University, Maastricht, The Netherlands
- Jef Verbeek
- Department of Internal Medicine, Division of gastroenterology and hepatology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Ger Koek
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Department of Internal Medicine, Division of gastroenterology and hepatology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Gerjo Kok
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Department of Work and Social Psychology, Maastricht University, Maastricht, The Netherlands
- Henk van Kranen
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Institute for Public Health Genomics, Maastricht University, Maastricht, The Netherlands
- Chris T. Evelo
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Department of Bioinformatics, NUTRIM School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, The Netherlands
- Leopold M. G. Curfs
- Governor Kremers Centre, Maastricht University Medical Centre+, Maastricht, The Netherlands; Department of Genetics, Maastricht University Medical Centre+, Maastricht, The Netherlands
30
Zhang H, Guo Y, Li Q, George TJ, Shenkman E, Modave F, Bian J. An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival. BMC Med Inform Decis Mak 2018; 18:41. [PMID: 30066664 PMCID: PMC6069766 DOI: 10.1186/s12911-018-0636-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Cancer is the second leading cause of death in the United States, exceeded only by heart disease. Extant cancer survival analyses have primarily focused on individual-level factors due to limited data availability from a single data source. There is a need to integrate data from different sources to simultaneously study as many risk factors as possible. Thus, we proposed an ontology-based approach to integrate heterogeneous datasets, addressing key data integration challenges. METHODS Following best practices in ontology engineering, we created the Ontology for Cancer Research Variables (OCRV), adapting existing semantic resources such as the National Cancer Institute (NCI) Thesaurus. Using the global-as-view data integration approach, we created mapping axioms to link the data elements in different sources to OCRV. Building on the Ontop platform, we implemented a data integration pipeline that uses semantic queries to query, extract, and transform data in relational databases into a pooled dataset according to the needs of the downstream multi-level Integrative Data Analysis (IDA). RESULTS Based on our use cases in the cancer survival IDA, we created tailored ontological structures in OCRV to facilitate the data integration tasks. Specifically, we created a flexible framework addressing key integration challenges: (1) using a shared, controlled vocabulary to make data understandable to both humans and computers, (2) explicitly modeling the semantic relationships so that the data can be computed and reasoned with, (3) linking patients to contextual and environmental factors through geographic variables, and (4) documenting the data manipulation and integration processes clearly in the ontologies. CONCLUSIONS Using an ontology-based data integration approach not only standardizes the definitions of data variables through a common, controlled vocabulary, but also makes the semantic relationships among variables from different sources explicit and clear to all users of the same datasets. Such an approach resolves the ambiguity in variable selection, extraction and integration processes and thus improves the reproducibility of the IDA.
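The global-as-view idea described in this abstract, in which data elements from different sources are mapped to one shared vocabulary before pooling, can be sketched in a few lines. This is a toy illustration only: the source names, column names, shared variable names and record values are all hypothetical, and the sketch uses plain dictionaries rather than OCRV, Ontop or any semantic query language.

```python
# Hypothetical mappings from each source's column names to shared variable names.
MAPPINGS = {
    "registry": {"pt_id": "patient_id", "surv_months": "survival_months"},
    "census": {"person": "patient_id", "county_fips": "county_code"},
}


def to_shared(source, record):
    """Rename one source record's columns to the shared vocabulary, dropping unmapped ones."""
    mapping = MAPPINGS[source]
    return {mapping[col]: value for col, value in record.items() if col in mapping}


def pool(records_by_source):
    """Merge records from all sources into one dataset keyed by the shared patient identifier."""
    pooled = {}
    for source, records in records_by_source.items():
        for record in records:
            shared = to_shared(source, record)
            pooled.setdefault(shared["patient_id"], {}).update(shared)
    return pooled


# Invented example records; 'local_flag' has no mapping and is therefore dropped.
pooled = pool({
    "registry": [{"pt_id": "P1", "surv_months": 30, "local_flag": "x"}],
    "census": [{"person": "P1", "county_fips": "12001"}],
})
```

Keeping the mappings as data, separate from the pooling logic, mirrors the abstract's point that documenting the mappings explicitly makes the integration steps easier to inspect and reproduce.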
Affiliation(s)
- Hansi Zhang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Qian Li
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Thomas J George
- Division of Hematology and Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, Florida, USA
- Elizabeth Shenkman
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- François Modave
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA.
31
Bastião Silva L, Trifan A, Luís Oliveira J. MONTRA: An agile architecture for data publishing and discovery. Comput Methods Programs Biomed 2018; 160:33-42. [PMID: 29728244 DOI: 10.1016/j.cmpb.2018.03.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/21/2017] [Revised: 02/26/2018] [Accepted: 03/27/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Data catalogues are a common form of capturing and presenting information about a specific kind of entity (e.g. products, services, professionals, datasets, etc.). However, the construction of a web-based catalogue for a particular scenario normally implies the development of a specific and dedicated solution. In this paper, we present MONTRA, a rapid-application development framework designed to facilitate the integration and discovery of heterogeneous objects, which may be characterized by distinct data structures. METHODS MONTRA was developed following a plugin-based architecture to allow dynamic composition of services over represented datasets. The core of MONTRA's functionality resides in a flexible data skeleton used to characterize data entities, from which a fully-fledged web data catalogue is automatically generated, ensuring access control and data privacy. RESULTS MONTRA is being successfully used by several European projects to collect and manage biomedical databases. In this paper, we describe three of these application scenarios. CONCLUSIONS This work was motivated by the plethora of geographically scattered biomedical repositories, and by the role they can play together in the understanding of diseases and of the real-world effectiveness of treatments. Using metadata to expose datasets' characteristics, MONTRA greatly simplifies the task of building data catalogues. The source code is publicly available at https://github.com/bioinformatics-ua/montra.
32
De Meulder B, Lefaudeux D, Bansal AT, Mazein A, Chaiboonchoe A, Ahmed H, Balaur I, Saqi M, Pellet J, Ballereau S, Lemonnier N, Sun K, Pandis I, Yang X, Batuwitage M, Kretsos K, van Eyll J, Bedding A, Davison T, Dodson P, Larminie C, Postle A, Corfield J, Djukanovic R, Chung KF, Adcock IM, Guo YK, Sterk PJ, Manta A, Rowe A, Baribaud F, Auffray C. A computational framework for complex disease stratification from multiple large-scale datasets. BMC Syst Biol 2018; 12:60. [PMID: 29843806 PMCID: PMC5975674 DOI: 10.1186/s12918-018-0556-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/20/2017] [Accepted: 02/21/2018] [Indexed: 01/05/2023]
Abstract
BACKGROUND Multilevel data integration is becoming a major area of research in systems biology. Within this area, multi-'omics datasets on complex diseases are becoming more readily available and there is a need to set standards and good practices for integrated analysis of biological, clinical and environmental data. We present a framework to plan and generate single and multi-'omics signatures of disease states. METHODS The framework is divided into four major steps: dataset subsetting, feature filtering, 'omics-based clustering and biomarker identification. RESULTS We illustrate the usefulness of this framework by identifying potential patient clusters based on integrated multi-'omics signatures in a publicly available ovarian cystadenocarcinoma dataset. The analysis generated a higher number of stable and clinically relevant clusters than previously reported, and enabled the generation of predictive models of patient outcomes. CONCLUSIONS This framework will help health researchers plan and perform multi-'omics big data analyses to generate hypotheses and make sense of their rich, diverse and ever growing datasets, to enable implementation of translational P4 medicine.
Affiliation(s)
- Bertrand De Meulder
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France.
| | - Diane Lefaudeux
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Aruna T Bansal
- Acclarogen Ltd, St John's Innovation Centre, Cambridge, CB4 0WS, UK
| | - Alexander Mazein
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Amphun Chaiboonchoe
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Hassan Ahmed
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Irina Balaur
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Mansoor Saqi
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Johann Pellet
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Stéphane Ballereau
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Nathanaël Lemonnier
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Kai Sun
- Data Science Institute, Imperial College, London, SW7 2AZ, UK
| | - Ioannis Pandis
- Data Science Institute, Imperial College, London, SW7 2AZ, UK.,Janssen Research and Development Ltd, High Wycombe, HP12 4DP, UK
| | - Xian Yang
- Data Science Institute, Imperial College, London, SW7 2AZ, UK
| | | | | | | | | | - Timothy Davison
- Janssen Research and Development Ltd, High Wycombe, HP12 4DP, UK
| | - Paul Dodson
- AstraZeneca Ltd, Alderley Park, Macclesfield, SK10 4TG, UK
| | | | - Anthony Postle
- Faculty of Medicine, University of Southampton, Southampton, SO17 1BJ, UK
| | - Julie Corfield
- AstraZeneca R & D, 43150, Mölndal, Sweden.,Arateva R & D Ltd, Nottingham, NG1 1GF, UK
| | - Ratko Djukanovic
- Faculty of Medicine, University of Southampton, Southampton, SO17 1BJ, UK
| | - Kian Fan Chung
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
| | - Ian M Adcock
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
| | - Yi-Ke Guo
- Data Science Institute, Imperial College, London, SW7 2AZ, UK
| | - Peter J Sterk
- Department of Respiratory Medicine, Academic Medical Centre, University of Amsterdam, Amsterdam, AZ1105, The Netherlands
| | - Alexander Manta
- Research Informatics, Roche Diagnostics GmbH, 82008, Unterhaching, Germany
| | - Anthony Rowe
- Janssen Research and Development Ltd, High Wycombe, HP12 4DP, UK
| | | | - Charles Auffray
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France.
|
33
|
Townend GS, Ehrhart F, van Kranen HJ, Wilkinson M, Jacobsen A, Roos M, Willighagen EL, van Enckevort D, Evelo CT, Curfs LMG. MECP2 variation in Rett syndrome-An overview of current coverage of genetic and phenotype data within existing databases. Hum Mutat 2018; 39:914-924. [PMID: 29704307 PMCID: PMC6033003 DOI: 10.1002/humu.23542] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/22/2017] [Revised: 04/18/2018] [Accepted: 04/23/2018] [Indexed: 12/30/2022]
Abstract
Rett syndrome (RTT) is a monogenic rare disorder that causes severe neurological problems. In most cases, it results from a loss-of-function mutation in the gene encoding methyl-CpG-binding protein 2 (MECP2). Currently, about 900 unique MECP2 variations (benign and pathogenic) have been identified and it is suspected that the different mutations contribute to different levels of disease severity. For researchers and clinicians, it is important that genotype-phenotype information is available to identify disease-causing mutations for diagnosis, to aid in clinical management of the disorder, and to provide counseling for parents. In this study, 13 genotype-phenotype databases were surveyed for their general functionality and availability of RTT-specific MECP2 variation data. For each database, we investigated findability and interoperability alongside practical user functionality, and type and amount of genetic and phenotype data. The main conclusions are that these databases, and the specific MECP2 variants held within them, are challenging to find, and that interoperability is as yet poorly developed, so searching across databases requires effort. Nevertheless, we found several thousand online database entries for MECP2 variations and their associated phenotypes, diagnosis, or predicted variant effects, which is a good starting point for researchers and clinicians who want to provide, annotate, and use the data.
Affiliation(s)
- Gillian S Townend
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands
| | - Friederike Ehrhart
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands.,Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Henk J van Kranen
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands.,Institute for Public Health Genomics, Maastricht University, Maastricht, The Netherlands
| | - Mark Wilkinson
- Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid, Spain
| | - Annika Jacobsen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - David van Enckevort
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Chris T Evelo
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands.,Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Leopold M G Curfs
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands
|
34
|
Pittman ME, Edwards SW, Ives C, Mortensen HM. AOP-DB: A database resource for the exploration of Adverse Outcome Pathways through integrated association networks. Toxicol Appl Pharmacol 2018; 343:71-83. [PMID: 29454060 DOI: 10.1016/j.taap.2018.02.006] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 05/17/2017] [Revised: 01/31/2018] [Accepted: 02/13/2018] [Indexed: 02/07/2023]
Abstract
The Adverse Outcome Pathway (AOP) framework describes the progression of a toxicity pathway from molecular perturbation to population-level outcome in a series of measurable, mechanistic responses. The controlled, computer-readable vocabulary that defines an AOP makes it possible to integrate AOP knowledge, automatically and on a large scale, with publicly available sources of biological high-throughput data and its derived associations. To support the discovery and development of putative (existing) and potential AOPs, we introduce the AOP-DB, an exploratory database resource that aggregates association relationships between genes and their related chemicals, diseases, pathways, species orthology information, ontologies, and gene interactions. These associations are mined from publicly available annotation databases and are integrated with the AOP information centralized in the AOP-Wiki, allowing for the automatic characterization of both putative and potential AOPs in the context of multiple areas of biological information, referred to here as "biological entities". The AOP-DB acts as a hypothesis-generation tool for the expansion of putative AOPs, as well as the characterization of potential AOPs, through the creation of association networks across these biological entities. Finally, the AOP-DB provides a useful interface between the AOP framework and existing chemical screening and prioritization efforts by the US Environmental Protection Agency.
Affiliation(s)
- Maureen E Pittman
- Oak Ridge Associated Universities, Research Triangle Park, NC 27709, USA
| | - Stephen W Edwards
- US Environmental Protection Agency, Office of Research and Development (ORD), National Health and Environmental Effects Laboratory, Integrated Systems Toxicology Division, Research Triangle Park, NC 27709, USA
| | - Cataia Ives
- Oak Ridge Associated Universities, Research Triangle Park, NC 27709, USA
| | - Holly M Mortensen
- US Environmental Protection Agency, Office of Research and Development (ORD), National Health and Environmental Effects Laboratory, Research Cores Unit, Research Triangle Park, NC 27709, USA.
|
35
|
Dorel M, Viara E, Barillot E, Zinovyev A, Kuperstein I. NaviCom: a web application to create interactive molecular network portraits using multi-level omics data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:3098441. [PMID: 28415074 PMCID: PMC5467574 DOI: 10.1093/database/bax026] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/25/2016] [Accepted: 01/08/2017] [Indexed: 12/21/2022]
Abstract
Human diseases such as cancer are routinely characterized by high-throughput molecular technologies, and multi-level omics data are accumulating in public databases at an increasing rate. Retrieval and visualization of these data in the context of molecular network maps can provide insights into the pattern of regulation of molecular functions reflected by an omics profile. To make this task easy, we developed NaviCom, a Python package and web platform for visualization of multi-level omics data on top of biological network maps. NaviCom bridges the gap between cBioPortal, the most used resource of large-scale cancer omics data, and NaviCell, a data visualization web service that contains several molecular network map collections. NaviCom proposes several standardized modes of data display on top of molecular network maps, allowing users to address specific biological questions. We illustrate how users can easily create interactive network-based cancer molecular portraits via the NaviCom web interface using the maps of the Atlas of Cancer Signalling Network (ACSN) and other maps. Analysis of these molecular portraits can help in formulating a scientific hypothesis on the molecular mechanisms deregulated in the studied disease. Database URL: NaviCom is available at https://navicom.curie.fr
Affiliation(s)
- Mathurin Dorel
- Institut Curie, 26 rue d'Ulm, F-75005 Paris, France.,Inserm, U900 F-75005, Paris France.,Mines Paris Tech, F-77305 Cedex Fontainebleau, France.,PSL Research University, Paris F-75005, France.,Ecole Normale Supérieure, 46 rue d'Ulm, Paris, France.,Institute of Pathology and Institute for Theoretical Biology, Charite - Universitätsmedizin Berlin, Chariteplatz 1, Berlin 10117, Germany
| | | | - Emmanuel Barillot
- Institut Curie, 26 rue d'Ulm, F-75005 Paris, France.,Inserm, U900 F-75005, Paris France.,Mines Paris Tech, F-77305 Cedex Fontainebleau, France.,PSL Research University, Paris F-75005, France
| | - Andrei Zinovyev
- Institut Curie, 26 rue d'Ulm, F-75005 Paris, France.,Inserm, U900 F-75005, Paris France.,Mines Paris Tech, F-77305 Cedex Fontainebleau, France.,PSL Research University, Paris F-75005, France
| | - Inna Kuperstein
- Institut Curie, 26 rue d'Ulm, F-75005 Paris, France.,Inserm, U900 F-75005, Paris France.,Mines Paris Tech, F-77305 Cedex Fontainebleau, France.,PSL Research University, Paris F-75005, France
|
36
|
Guardia GD, Ferreira Pires L, da Silva EG, de Farias CR. SemanticSCo: A platform to support the semantic composition of services for gene expression analysis. J Biomed Inform 2017; 66:116-128. [DOI: 10.1016/j.jbi.2016.12.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/12/2016] [Revised: 11/27/2016] [Accepted: 12/31/2016] [Indexed: 10/20/2022]
|
37
|
Greenwood PL, Bishop-Hurley GJ, González LA, Ingham AB. Development and application of a livestock phenomics platform to enhance productivity and efficiency at pasture. ANIMAL PRODUCTION SCIENCE 2016. [DOI: 10.1071/an15400] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
Our capacity to measure performance- and efficiency-related phenotypes in grazing livestock in a timely manner, ideally in real-time without human interference, has been severely limited. Future demands and constraints on grazing livestock production will require a step change beyond our current approaches to obtaining phenotypic data. Animal phenomics is a relatively new term that describes the next generation of animal trait measurement, including methodologies and equipment used to acquire data on traits, and computational approaches required to turn data into phenotypic information. Phenomics offers a range of emerging opportunities to define new traits specific to grazing livestock, including intake and efficiency at pasture, and to measure many traits simultaneously or at a level of detail previously unachievable in the grazing environment. Application of this approach to phenotyping can improve the precision with which nutritional and other management strategies are applied, enable development of predictive biological traits, and accelerate the rate at which genetic gain is achieved for existing and new traits. In the present paper, we briefly outline the potential for livestock phenomics and describe (1) on-animal sensory-based approaches to develop traits diagnostic of productivity and efficiency, as well as resilience, health and welfare and (2) on-farm methods for data collection that drive management solutions to reduce input costs and accelerate genetic gain. The technological and analytical challenges associated with these objectives are also briefly considered, along with a brief overview of a promising field of work in which phenomics will affect animal agriculture, namely efficiency at pasture.
|