1
Jing X, Cimino JJ, Patel VL, Zhou Y, Shubrook JH, De Lacalle S, Draghi BN, Ernst MA, Weaver A, Sekar S, Liu C. Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools. J Clin Transl Sci 2024; 8:e13. [PMID: 38384898 PMCID: PMC10880005 DOI: 10.1017/cts.2023.708] [Received: 06/04/2023] [Revised: 11/21/2023] [Accepted: 12/20/2023] [Indexed: 02/23/2024]
Abstract
Objectives To compare how clinical researchers generate data-driven hypotheses with a visual interactive analytic tool (VIADS, a visual interactive analysis tool for filtering and summarizing large datasets coded with hierarchical terminologies) or other tools. Methods We recruited clinical researchers and separated them into "experienced" and "inexperienced" groups. Participants were randomly assigned to a VIADS or control group within the groups. Each participant conducted a remote 2-hour study session for hypothesis generation with the same study facilitator on the same datasets by following a think-aloud protocol. Screen activities and audio were recorded, transcribed, coded, and analyzed. Hypotheses were evaluated by seven experts on their validity, significance, and feasibility. We conducted multilevel random effect modeling for statistical tests. Results Eighteen participants generated 227 hypotheses, of which 147 (65%) were valid. The VIADS and control groups generated a similar number of hypotheses. The VIADS group took a significantly shorter time to generate one hypothesis (e.g., among inexperienced clinical researchers, 258 s versus 379 s, p = 0.046, power = 0.437, ICC = 0.15). The VIADS group received significantly lower ratings than the control group on feasibility and the combination rating of validity, significance, and feasibility. Conclusion The role of VIADS in hypothesis generation seems inconclusive. The VIADS group took a significantly shorter time to generate each hypothesis. However, the combined validity, significance, and feasibility ratings of their hypotheses were significantly lower. Further characterization of hypotheses, including specifics on how they might be improved, could guide future tool development.
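The abstract reports an intraclass correlation (ICC = 0.15) from multilevel random-effect models. The ICC is the share of total variance that lies between clusters (for example, participants) rather than within them. The sketch below is a minimal illustration, assuming a balanced design, of the classic one-way ANOVA estimator ICC(1). It is not the authors' mixed-model analysis, and the timing data in the usage line are hypothetical.

```python
from statistics import mean


def icc1(groups: list[list[float]]) -> float:
    """One-way ANOVA estimate of ICC(1); assumes every group has k observations."""
    n = len(groups)      # number of clusters (e.g., participants)
    k = len(groups[0])   # observations per cluster
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 balanced groups of >= 2 observations each")
    means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    # Between-group mean square: spread of cluster means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Within-group mean square: spread of observations around their own cluster mean.
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical seconds-per-hypothesis for three participants (not study data).
times = [[250, 270, 260], [380, 360, 400], [300, 310, 290]]
print(round(icc1(times), 3))  # → 0.948
```

A low ICC such as 0.15 would indicate that most of the variation in the outcome lies within rather than between clusters; a random-effect model is still used so that repeated observations from the same participant are not treated as independent.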
Affiliation(s)
- Xia Jing
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- James J. Cimino
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL, USA
- Vimla L. Patel
- Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY, USA
- Yuchun Zhou
- Department of Educational Studies, The Patton College of Education, Ohio University, Athens, OH, USA
- Jay H. Shubrook
- Department of Clinical Sciences and Community Health, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
- Sonsoles De Lacalle
- Department of Health Science, California State University Channel Islands, Camarillo, CA, USA
- Brooke N. Draghi
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- Mytchell A. Ernst
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- Aneesa Weaver
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- Shriram Sekar
- Electrical Engineering and Computer Science, Russ College of Engineering and Technology, Ohio University, Athens, OH, USA
- Chang Liu
- Russ College of Engineering and Technology, Ohio University, Athens, OH, USA
2
Jing X, Cimino JJ, Patel VL, Zhou Y, Shubrook JH, De Lacalle S, Draghi BN, Ernst MA, Weaver A, Sekar S, Liu C. Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools. medRxiv: The Preprint Server for Health Sciences 2023:2023.05.30.23290719. [PMID: 37333271 PMCID: PMC10274969 DOI: 10.1101/2023.05.30.23290719] [Indexed: 06/20/2023]
Abstract
Objectives To compare how clinical researchers generate data-driven hypotheses with a visual interactive analytic tool (VIADS, a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other tools. Methods We recruited clinical researchers and separated them into "experienced" and "inexperienced" groups. Participants were randomly assigned to a VIADS or control group within the groups. Each participant conducted a remote 2-hour study session for hypothesis generation with the same study facilitator on the same datasets by following a think-aloud protocol. Screen activities and audio were recorded, transcribed, coded, and analyzed. Hypotheses were evaluated by seven experts on their validity, significance, and feasibility. We conducted multilevel random effect modeling for statistical tests. Results Eighteen participants generated 227 hypotheses, of which 147 (65%) were valid. The VIADS and control groups generated a similar number of hypotheses. The VIADS group took a significantly shorter time to generate one hypothesis (e.g., among inexperienced clinical researchers, 258 seconds versus 379 seconds, p = 0.046, power = 0.437, ICC = 0.15). The VIADS group received significantly lower ratings than the control group on feasibility and the combination rating of validity, significance, and feasibility. Conclusion The role of VIADS in hypothesis generation seems inconclusive. The VIADS group took a significantly shorter time to generate each hypothesis. However, the combined validity, significance, and feasibility ratings of their hypotheses were significantly lower. Further characterization of hypotheses, including specifics on how they might be improved, could guide future tool development.
Affiliation(s)
- Xia Jing
- Department of Public Health Sciences, Clemson University, Clemson, SC
- James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL
- Vimla L Patel
- Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY
- Yuchun Zhou
- Patton College of Education, Ohio University, Athens, OH
- Jay H Shubrook
- College of Osteopathic Medicine, Touro University, Vallejo, CA
- Sonsoles De Lacalle
- Department of Health Science, California State University Channel Islands, Camarillo, CA
- Brooke N Draghi
- Department of Public Health Sciences, Clemson University, Clemson, SC
- Mytchell A Ernst
- Department of Public Health Sciences, Clemson University, Clemson, SC
- Aneesa Weaver
- Department of Public Health Sciences, Clemson University, Clemson, SC
- Shriram Sekar
- School of Computing, Clemson University, Clemson, SC
- Chang Liu
- Russ College of Engineering and Technology, Ohio University, Athens, OH
3
Jing X, Draghi BN, Ernst MA, Patel VL, Cimino JJ, Shubrook JH, Zhou Y, Liu C, De Lacalle S. How do clinical researchers generate data-driven scientific hypotheses? Cognitive events using think-aloud protocol. medRxiv: The Preprint Server for Health Sciences 2023:2023.10.31.23297860. [PMID: 37961555 PMCID: PMC10635246 DOI: 10.1101/2023.10.31.23297860] [Indexed: 11/15/2023]
Abstract
Objectives This study aims to identify the cognitive events related to information use (e.g., "Analyze data", "Seek connection") during hypothesis generation among clinical researchers. Specifically, we describe hypothesis generation using cognitive event counts and compare them between groups. Methods The participants used the same datasets, followed the same scripts, used VIADS (a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other analytical tools (as control) to analyze the datasets, and came up with hypotheses while following the think-aloud protocol. Their screen activities and audio were recorded and then transcribed and coded for cognitive events. Results The VIADS group exhibited the lowest mean number of cognitive events per hypothesis and the smallest standard deviation. The experienced clinical researchers had approximately 10% more valid hypotheses than the inexperienced group. The VIADS users among the inexperienced clinical researchers exhibited a similar trend to the experienced clinical researchers in terms of the number of cognitive events and their respective percentages out of all the cognitive events. The highest percentages of cognitive events in hypothesis generation were "Using analysis results" (30%) and "Seeking connections" (23%). Conclusion VIADS helped inexperienced clinical researchers use fewer cognitive events to generate hypotheses than the control group. This suggests that VIADS may guide participants to be more structured during hypothesis generation compared with the control group. The results provide evidence to explain the shorter average time needed by the VIADS group in generating each hypothesis.
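Once transcripts are coded, the per-hypothesis event counts and each event type's share of all events reduce to simple bookkeeping. A minimal sketch, assuming the coding produces (hypothesis ID, event label) pairs; the labels echo the abstract, but the records are hypothetical and this is not the authors' analysis script.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_events(coded: list[tuple[str, str]]) -> dict:
    """coded: (hypothesis_id, cognitive_event_label) pairs from transcript coding."""
    per_hypothesis = Counter(h for h, _ in coded)  # number of events behind each hypothesis
    per_event = Counter(e for _, e in coded)       # number of occurrences of each event type
    total = len(coded)
    counts = list(per_hypothesis.values())
    return {
        "mean_events_per_hypothesis": mean(counts),
        "sd_events_per_hypothesis": stdev(counts) if len(counts) > 1 else 0.0,
        # Percentage of all coded events, most frequent label first.
        "event_percentages": {e: 100 * c / total for e, c in per_event.most_common()},
    }


# Hypothetical coded transcript fragment (not study data).
coded = [
    ("H1", "Using analysis results"),
    ("H1", "Seeking connections"),
    ("H1", "Using analysis results"),
    ("H2", "Using analysis results"),
    ("H2", "Seeking connections"),
]
summary = summarize_events(coded)
```

Comparing these summaries across groups (for example, VIADS versus control) is what statements such as "lowest mean number of cognitive events per hypothesis and the smallest standard deviation" rest on.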
Affiliation(s)
- Xia Jing
- Department of Public Health Sciences, Clemson University, Clemson, SC
- Brooke N Draghi
- Department of Public Health Sciences, Clemson University, Clemson, SC
- Mytchell A Ernst
- Department of Public Health Sciences, Clemson University, Clemson, SC
- Vimla L Patel
- Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY
- James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL
- Jay H Shubrook
- College of Osteopathic Medicine, Touro University, Vallejo, CA
- Yuchun Zhou
- Patton College of Education, Ohio University, Athens, OH
- Chang Liu
- Russ College of Engineering and Technology, Ohio University, Athens, OH
- Sonsoles De Lacalle
- Department of Health Science, California State University Channel Islands, Camarillo, CA
4
Slater K, Williams JA, Schofield PN, Russell S, Pendleton SC, Karwath A, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Klarigi: Characteristic explanations for semantic biomedical data. Comput Biol Med 2023; 153:106425. [PMID: 36638616 DOI: 10.1016/j.compbiomed.2022.106425] [Received: 08/30/2022] [Revised: 12/04/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. 
We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.
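Klarigi is contrasted with traditional enrichment analysis, which commonly asks whether an ontology class annotates more members of a group than chance would predict, using a one-sided hypergeometric (Fisher's exact) test. The sketch below shows only that baseline test, not Klarigi's scoring heuristics; the parameter names and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: N entities in the background, K of them annotated
    with the class, n entities in the group, k annotated entities observed.
    """
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated group members.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical: 1000 patients, 50 annotated with a phenotype class,
# and 8 of the 40 patients in one group carry that class.
print(enrichment_p_value(N=1000, K=50, n=40, k=8))
```

Running such a test separately for every class is the univariate, one-class-at-a-time pattern that the abstract identifies as a limitation of conventional enrichment methods.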
Affiliation(s)
- Karin Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- John A Williams
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Paul N Schofield
- Department of Physiology, Development, and Neuroscience, University of Cambridge, UK
- Sophie Russell
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Samantha C Pendleton
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Andreas Karwath
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Hilary Fanning
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Simon Ball
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
5
Shen F, Peng S, Fan Y, Wen A, Liu S, Wang Y, Wang L, Liu H. HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology. J Biomed Inform 2019; 96:103246. [PMID: 31255713 DOI: 10.1016/j.jbi.2019.103246] [Received: 03/01/2019] [Revised: 06/25/2019] [Accepted: 06/26/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND In precision medicine, deep phenotyping is defined as the precise and comprehensive analysis of phenotypic abnormalities, aiming to acquire a better understanding of the natural history of a disease and its genotype-phenotype associations. Detecting phenotypic relevance is an important task when translating precision medicine into clinical practice, especially for patient stratification tasks based on deep phenotyping. In our previous work, we developed node embeddings for the Human Phenotype Ontology (HPO) to assist in phenotypic relevance measurement incorporating distributed semantic representations. However, the derived HPO embeddings hold only distributed representations for IS-A relationships among nodes, hampering the ability to fully explore the graph. METHODS In this study, we developed a framework, HPO2Vec+, to enrich the produced HPO embeddings with heterogeneous knowledge resources (i.e., DECIPHER, OMIM, and Orphanet) for detecting phenotypic relevance. Specifically, we parsed disease-phenotype associations contained in these three resources to enrich non-inheritance relationships among phenotypic nodes in the HPO. To generate node embeddings for the HPO, node2vec was applied to perform node sampling on the enriched HPO graphs based on random walk followed by feature learning over the sampled nodes to generate enriched node embeddings. Four HPO embeddings were generated based on different graph structures, which we hereafter label as HPOEmb-Original, HPOEmb-DECIPHER, HPOEmb-OMIM, and HPOEmb-Orphanet. We evaluated the derived embeddings quantitatively through an HPO link prediction task with four edge embedding operations and six machine learning algorithms. The resulting best embeddings were then evaluated for patient stratification of 10 rare diseases using electronic health records (EHR) collected at Mayo Clinic. 
We assessed our framework qualitatively by visualizing phenotypic clusters and conducting a use case study on primary hyperoxaluria (PH), a rare disease, on the task of inferring relevant phenotypes given 22 annotated PH related phenotypes. RESULTS The quantitative link prediction task shows that HPOEmb-Orphanet achieved an optimal AUROC of 0.92 and an average precision of 0.94. In addition, HPOEmb-Orphanet achieved an optimal F1 score of 0.86. The quantitative patient similarity measurement task indicates that HPOEmb-Orphanet achieved the highest average detection rate for similar patients over 10 rare diseases and performed better than other similarity measures implemented by an existing tool, HPOSim, especially for pairwise patients with fewer shared common phenotypes. The qualitative evaluation shows that the enriched HPO embeddings are generally able to detect relationships among nodes with fine granularity and HPOEmb-Orphanet is particularly good at associating phenotypes across different disease systems. For the use case of detecting relevant phenotypic characterizations for given PH related phenotypes, HPOEmb-Orphanet outperformed the other three HPO embeddings by achieving the highest average P@5 of 0.81 and the highest P@10 of 0.79. Compared to seven conventional similarity measurements provided by HPOSim, HPOEmb-Orphanet is able to detect more relevant phenotypic pairs, especially for pairs not in inheritance relationships. CONCLUSION We drew the following conclusions based on the evaluation results. First, with additional non-inheritance edges, enriched HPO embeddings can detect more associations between fine granularity phenotypic nodes regardless of their topological structures in the HPO graph. 
Second, HPOEmb-Orphanet not only can achieve optimal performance in link prediction and patient stratification based on phenotypic similarity, but is also able to detect relevant phenotypes closer to domain experts' judgments than other embeddings and conventional similarity measurements. Third, incorporating heterogeneous knowledge resources does not necessarily result in better performance for detecting relevant phenotypes. From a clinical perspective, in our use case study, clinically oriented knowledge resources (e.g., Orphanet) can achieve better performance in detecting relevant phenotypic characterizations than biomedically oriented knowledge resources (e.g., DECIPHER and OMIM).
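The abstract mentions "four edge embedding operations" for link prediction without naming them. The node2vec paper defines four binary operators (average, Hadamard, weighted-L1, weighted-L2) that combine two node vectors into one edge vector, and the sketch below assumes those are the four meant. It also shows P@k, read here as the fraction of the top-k ranked candidates that are relevant. Function names are hypothetical and lists stand in for embedding arrays.

```python
def edge_features(u: list[float], v: list[float]) -> dict[str, list[float]]:
    """Element-wise operators turning two node embeddings into one edge embedding."""
    return {
        "average": [(a + b) / 2 for a, b in zip(u, v)],
        "hadamard": [a * b for a, b in zip(u, v)],
        "weighted_l1": [abs(a - b) for a, b in zip(u, v)],
        "weighted_l2": [(a - b) ** 2 for a, b in zip(u, v)],
    }


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


# Toy 2-dimensional embeddings for two hypothetical phenotype nodes.
features = edge_features([1.0, 2.0], [3.0, 0.0])
```

In a link prediction setup, each edge vector becomes the feature input to a classifier that predicts whether the two nodes are linked; the choice of operator changes which pairwise relations the classifier can easily separate.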
Affiliation(s)
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Suyuan Peng
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; The Second Clinical College, Guangzhou University of Chinese Medicine, China
- Yadan Fan
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Liwei Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
6
Ge SX, Son EW, Yao R. iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018; 19:534. [PMID: 30567491 PMCID: PMC6299935 DOI: 10.1186/s12859-018-2486-6] [Received: 05/17/2018] [Accepted: 11/12/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND RNA-seq is widely used for transcriptomic profiling, but the bioinformatics analysis of resultant data can be time-consuming and challenging, especially for biologists. We aim to streamline the bioinformatic analyses of gene-level data by developing a user-friendly, interactive web application for exploratory data analysis, differential expression, and pathway analysis. RESULTS iDEP (integrated Differential Expression and Pathway analysis) seamlessly connects 63 R/Bioconductor packages, 2 web services, and comprehensive annotation and pathway databases for 220 plant and animal species. The workflow can be reproduced by downloading customized R code and related pathway files. As an example, we analyzed an RNA-Seq dataset of lung fibroblasts with Hoxa1 knockdown and revealed the possible roles of SP1 and E2F1 and their target genes, including microRNAs, in blocking G1/S transition. In another example, our analysis shows that in mouse B cells without functional p53, ionizing radiation activates the MYC pathway and its downstream genes involved in cell proliferation, ribosome biogenesis, and non-coding RNA metabolism. In wildtype B cells, radiation induces p53-mediated apoptosis and DNA repair while suppressing the target genes of MYC and E2F1, and leads to growth and cell cycle arrest. iDEP helps unveil the multifaceted functions of p53 and the possible involvement of several microRNAs such as miR-92a, miR-504, and miR-30a. In both examples, we validated known molecular pathways and generated novel, testable hypotheses. CONCLUSIONS Combining comprehensive analytic functionalities with massive annotation databases, iDEP ( http://ge-lab.org/idep/ ) enables biologists to easily translate transcriptomic and proteomic data into actionable insights.
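iDEP delegates differential expression to established R/Bioconductor packages such as DESeq2 and limma, which model count dispersion and test significance. The sketch below shows only the underlying intuition, library-size normalization to counts per million (CPM) followed by a log2 fold change with a pseudocount; it is not iDEP's pipeline, performs no statistical testing, and its function names are hypothetical.

```python
from math import log2


def cpm(counts: list[int]) -> list[float]:
    """Scale one sample's gene counts to counts per million reads."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def log2_fold_change(treated: list[list[int]], control: list[list[int]],
                     pseudo: float = 1.0) -> list[float]:
    """treated/control: one count vector per sample, genes in the same order."""

    def gene_means(samples: list[list[int]]) -> list[float]:
        normed = [cpm(s) for s in samples]
        # Average each gene's CPM across the samples in the group.
        return [sum(vals) / len(vals) for vals in zip(*normed)]

    t, c = gene_means(treated), gene_means(control)
    # The pseudocount avoids log2(0) for genes with no reads in one group.
    return [log2((a + pseudo) / (b + pseudo)) for a, b in zip(t, c)]


# Hypothetical two-gene counts: one knockdown sample versus one control sample.
lfc = log2_fold_change([[300, 100]], [[100, 300]])
```

A positive value marks a gene that is higher in the treated group after normalization; the packages wrapped by iDEP add the variance modeling needed to decide whether such a change is significant.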
Affiliation(s)
- Steven Xijin Ge
- Department of Mathematics and Statistics, South Dakota State University, Box 2225, Brookings, SD 57007 USA
- Eun Wo Son
- Department of Mathematics and Statistics, South Dakota State University, Box 2225, Brookings, SD 57007 USA
- Runan Yao
- Department of Mathematics and Statistics, South Dakota State University, Box 2225, Brookings, SD 57007 USA
7
Rodríguez-García MÁ, Hoehndorf R. Inferring ontology graph structures using OWL reasoning. BMC Bioinformatics 2018; 19:7. [PMID: 29304741 PMCID: PMC5756413 DOI: 10.1186/s12859-017-1999-8] [Received: 05/24/2017] [Accepted: 12/13/2017] [Indexed: 12/14/2022]
Abstract
BACKGROUND Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. RESULTS We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . CONCLUSIONS Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
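Onto2Graph uses an automated OWL reasoner over the full set of axioms. As a much narrower illustration of "relations between classes that are implied but not asserted", the sketch below applies only two standard entailments over toy asserted edges: if A ⊑ B and B ⊑ ∃part_of.C, then A ⊑ ∃part_of.C; and if C ⊑ D, then ∃part_of.C ⊑ ∃part_of.D. The function, relation choice, and class names are hypothetical, and no reasoner is involved.

```python
def infer_part_of(subclass: set[tuple[str, str]],
                  part_of: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return (A, D) pairs where 'A SubClassOf part_of some D' is entailed but not asserted."""
    classes = {x for edge in subclass | part_of for x in edge}
    # Reflexive-transitive closure of the asserted SubClassOf hierarchy.
    supers = {c: {c} for c in classes}
    for a, b in subclass:
        supers[a].add(b)
    changed = True
    while changed:
        changed = False
        for c in classes:
            reach = set().union(*(supers[s] for s in supers[c]))
            if not reach <= supers[c]:
                supers[c] |= reach
                changed = True
    inferred = set()
    for a in classes:
        for b in supers[a]:              # A is a subclass of B (or A itself)
            for x, c in part_of:
                if x == b:               # B SubClassOf part_of some C
                    for d in supers[c]:  # C SubClassOf D lifts the restriction to D
                        inferred.add((a, d))
    return inferred - part_of


subclass = {("mitral_valve", "heart_valve"), ("heart", "organ")}
part_of = {("heart_valve", "heart")}
new_edges = infer_part_of(subclass, part_of)
```

Edges like these, added to the asserted taxonomy, are the kind of extra structure that graph-based semantic similarity can then exploit.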
Affiliation(s)
- Miguel Ángel Rodríguez-García
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
8
Özcan S, Alessio N, Acar MB, Mert E, Omerli F, Peluso G, Galderisi U. Unbiased analysis of senescence associated secretory phenotype (SASP) to identify common components following different genotoxic stresses. Aging (Albany NY) 2016; 8:1316-29. [PMID: 27288264 PMCID: PMC4993333 DOI: 10.18632/aging.100971] [Received: 03/16/2016] [Accepted: 05/28/2016] [Indexed: 01/10/2023]
Abstract
Senescent cells secrete senescence-associated secretory phenotype (SASP) proteins to carry out several functions, such as sensitizing surrounding cells to senesce; immunomodulation; impairing or fostering cancer growth; and promoting tissue development. Identifying secreted factors that achieve such tasks is a challenging issue since the profile of secreted proteins depends on genotoxic stress and cell type. Currently, researchers are trying to identify common markers for SASP. The present investigation compared the secretome composition of five different senescent phenotypes in two different cell types: bone marrow and adipose mesenchymal stromal cells (MSC). We induced MSC senescence by oxidative stress, doxorubicin treatment, X-ray irradiation, and replicative exhaustion. We took advantage of LC-MS/MS proteome identification and subsequent gene ontology (GO) evaluation to perform an unbiased analysis (hypothesis free manner) of senescent secretomes. GO analysis allowed us to distribute SASP components into four classes: extracellular matrix/cytoskeleton/cell junctions; metabolic processes; ox-redox factors; and regulators of gene expression. We used Ingenuity Pathway Analysis (IPA) to determine common pathways among the different senescent phenotypes. This investigation, along with identification of eleven proteins that were exclusively expressed in all the analyzed senescent phenotypes, permitted the identification of three key signaling paths: MMP2 - TIMP2; IGFBP3 - PAI-1; and Peroxiredoxin 6 - ERP46 - PARK7 - Cathepsin D - Major vault protein. We suggest that these paths could be involved in the paracrine circuit that induces senescence in neighboring cells and may confer apoptosis resistance to senescent cells.
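Finding proteins "exclusively expressed in all the analyzed senescent phenotypes" is, at its core, a set operation over identified secretome proteins. A minimal sketch, assuming "exclusively" means present in every senescent secretome and absent from the non-senescent control; the induction methods come from the abstract, while the protein identifiers are placeholders, not study results.

```python
def common_exclusive(senescent: dict[str, set[str]], control: set[str]) -> set[str]:
    """Proteins present in every senescent secretome but absent from the control."""
    if not senescent:
        return set()
    shared = set.intersection(*senescent.values())  # in all senescent phenotypes
    return shared - control                          # drop anything also seen in control


# Placeholder protein IDs per senescence-inducing treatment (hypothetical).
secretomes = {
    "oxidative_stress": {"P1", "P2", "P3"},
    "doxorubicin": {"P1", "P2", "P4"},
    "x_ray_irradiation": {"P1", "P2", "P5"},
    "replicative_exhaustion": {"P1", "P2"},
}
candidates = common_exclusive(secretomes, control={"P2"})
```

The resulting candidate list is the kind of input that downstream pathway tools such as IPA would then place into signaling paths.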
Affiliation(s)
- Servet Özcan
- Genome and Stem Cell Center (GENKOK), Erciyes University, Kayseri, Turkey; Department of Biology, Faculty of Sciences, Erciyes University, Kayseri, Turkey
- Nicola Alessio
- Department of Experimental Medicine, Biotechnology and Molecular Biology Section, Second University of Naples, Naples, Italy
- Mustafa B Acar
- Genome and Stem Cell Center (GENKOK), Erciyes University, Kayseri, Turkey; Department of Biology, Faculty of Sciences, Erciyes University, Kayseri, Turkey
- Eda Mert
- Genome and Stem Cell Center (GENKOK), Erciyes University, Kayseri, Turkey; Department of Biology, Faculty of Sciences, Erciyes University, Kayseri, Turkey
- Fatih Omerli
- Genome and Stem Cell Center (GENKOK), Erciyes University, Kayseri, Turkey; Department of Biology, Faculty of Sciences, Erciyes University, Kayseri, Turkey
- Umberto Galderisi
- Genome and Stem Cell Center (GENKOK), Erciyes University, Kayseri, Turkey; Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Temple University, Philadelphia, PA 19122, USA; Department of Experimental Medicine, Biotechnology and Molecular Biology Section, Second University of Naples, Naples, Italy
9
Shaina H, UlAbdin Z, Webb BA, Arif MJ, Jamil A. De novo sequencing and transcriptome analysis of venom glands of endoparasitoid Aenasius arizonensis (Girault) (=Aenasius bambawalei Hayat) (Hymenoptera, Encyrtidae). Toxicon 2016; 121:134-144. [PMID: 27594666 DOI: 10.1016/j.toxicon.2016.08.022] [Received: 06/15/2016] [Revised: 08/11/2016] [Accepted: 08/31/2016] [Indexed: 12/25/2022]
Abstract
Aenasius bambawalei Hayat (Encyrtidae: Hymenoptera) has been synonymized with Aenasius arizonensis (Girault) is a small, newly discovered endoparasitoid of the cotton mealybug Phenacoccuss solenopsis Tinsley (Pseudococcidae: Hemiptera), which completes its life cycle inside the body of its host and it is a potential insect control tool. Despite the acquired knowledge regarding host-parasitoid interaction, little information is available on the factors of parasitoid origin able to modulate mealybug physiology. The components of A. arizonensis venom have not been well studied but venom from other parasitoids and wasps contain biologically active proteins that have potential applications in pest management or may be of medicinal importance. To provide an insight into the transcripts expressed in the venom gland of A. arizonensis, a transcriptomic database was developed utilizing high throughput RNA sequencing approaches to analyze the genes expressed in venom glands of this endoparasitic wasp. The resulting A. arizonensis RNA sequences were assembled de-novo with contigs then blasted against the NCBI non-redundant sequence database. Contigs which matched database sequences were mostly homologous to genes from hymenopteran parasitoids such as Nasonia vitripennis, Copidosoma floridanum, Fopius arsenus and Pteromalas puparium. Further analysis of the A. arizonensis database was then performed which focused on selected genes encoding proteins potentially involved in host developmental arrest, disrupting the host immune system, host paralysis, and transcripts that support these functions. Sequenced mRNAS predicted to encode full length ORFs of Calreticulin, Serine Protease Precursor and Arginine kinase proteins were identified and the tissue specific expression of these putative venom genes was analyzed by RT-PCR. 
The results also demonstrate that de novo transcriptome assembly enables useful analysis of venom gene expression in a species lacking a sequenced genome, and may provide information useful for devising control tools for insect pests and for other applications.
Affiliation(s)
- Hoor Shaina
- Department of Entomology, University of Agriculture Faisalabad, Pakistan
- Zain UlAbdin
- Department of Entomology, University of Agriculture Faisalabad, Pakistan.
- Bruce A Webb
- Department of Entomology, University of Kentucky, Lexington, USA.
- Amer Jamil
- Department of Biochemistry, University of Agriculture Faisalabad, Pakistan

10
Siebert AL, Wheeler D, Werren JH. A new approach for investigating venom function applied to venom calreticulin in a parasitoid wasp. Toxicon 2015; 107:304-16. [PMID: 26359852 DOI: 10.1016/j.toxicon.2015.08.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/09/2015] [Revised: 08/11/2015] [Accepted: 08/19/2015] [Indexed: 12/20/2022]
Abstract
A new method is developed to investigate functions of venom components, using venom gene RNA interference knockdown in the venomous animal coupled with RNA sequencing in the envenomated host animal. The vRNAi/eRNA-Seq approach is applied to the venom calreticulin component (v-crc) of the parasitoid wasp Nasonia vitripennis. Parasitoids are common, venomous animals that inject venom proteins into host insects, where they modulate physiology and metabolism to produce a better food resource for the parasitoid larvae. vRNAi/eRNA-Seq indicates that v-crc acts to suppress expression of innate immune cell response, enhance expression of clotting genes in the host, and up-regulate cuticle genes. Knockdown of v-crc also results in an increased melanization reaction immediately following envenomation. We propose that v-crc inhibits the innate immune response to parasitoid venom and reduces host bleeding during adult and larval parasitoid feeding. Experiments do not support the hypothesis that v-crc is required for the developmental arrest phenotype observed in envenomated hosts. We propose that an important role for some venom components is to reduce (modulate) the exaggerated effects of other venom components on target host gene expression, physiology, and survival, and term this venom mitigation. A model is developed that uses vRNAi/eRNA-Seq to quantify the contribution of individual venom components to total venom phenotypes, and to define different categories of mitigation by individual venom components on host gene expression. Mitigating functions likely contribute to the diversity of venom proteins in parasitoids and other venomous organisms.
Affiliation(s)
- Aisha L Siebert
- Department of Clinical and Translational Science, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA; Department of Biology, University of Rochester, Rochester, NY 14627, USA.
- David Wheeler
- Institute of Fundamental Science, Massey University, Palmerston North, 4442, New Zealand; Department of Biology, University of Rochester, Rochester, NY 14627, USA
- John H Werren
- Department of Biology, University of Rochester, Rochester, NY 14627, USA

11
Smith B, Arabandi S, Brochhausen M, Calhoun M, Ciccarese P, Doyle S, Gibaud B, Goldberg I, Kahn CE, Overton J, Tomaszewski J, Gurcan M. Biomedical imaging ontologies: A survey and proposal for future work. J Pathol Inform 2015; 6:37. [PMID: 26167381 PMCID: PMC4485195 DOI: 10.4103/2153-3539.159214] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 01/21/2015] [Accepted: 04/30/2015] [Indexed: 12/24/2022]
Abstract
Background: Ontology is one strategy for promoting interoperability of heterogeneous data through consistent tagging. An ontology is a controlled, structured vocabulary consisting of general terms (such as “cell”, “image”, “tissue” or “microscope”) that form the basis for such tagging. These terms are designed to represent the types of entities in the domain of reality that the ontology has been devised to capture; the terms are provided with logical definitions, thereby also supporting reasoning over the tagged data. Aim: This paper provides a survey of the biomedical imaging ontologies that have been developed thus far. It outlines the challenges faced in particular by ontologies in the fields of histopathological imaging and image analysis, and suggests a strategy for addressing these challenges in the example domain of quantitative histopathology imaging. Results and Conclusions: The ultimate goal is to support the multiscale understanding of disease that comes from using interoperable ontologies to integrate imaging data with clinical and genomics data.
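The tagging-and-reasoning idea in this abstract can be illustrated with a minimal sketch. It assumes a toy is_a hierarchy and a handful of tagged records; every term and record name below is hypothetical and not drawn from any published imaging ontology. Because each tag is resolved through its ancestors, a query for a general term also retrieves data tagged with more specific subtypes:

```python
# Hypothetical toy ontology: each term maps to its parent terms (is_a edges).
IS_A: dict[str, list[str]] = {
    "image": [],
    "microscope image": ["image"],
    "histopathology image": ["image"],
    "H&E stained image": ["histopathology image"],
}

# Hypothetical records, each tagged with a single ontology term.
RECORDS: dict[str, str] = {
    "slide_001": "H&E stained image",
    "slide_002": "microscope image",
    "scan_003": "image",
}


def ancestors(term: str) -> set[str]:
    """Return `term` plus every term reachable by following is_a edges upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def query(term: str) -> list[str]:
    """Return records tagged with `term` or with any of its subtypes."""
    return sorted(rec for rec, tag in RECORDS.items() if term in ancestors(tag))


print(query("histopathology image"))  # ['slide_001']
print(query("image"))                 # ['scan_003', 'slide_001', 'slide_002']
```

Real ontologies are usually distributed as OWL or OBO files and queried with dedicated reasoners. The dictionary here only stands in for their is_a edges.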
Affiliation(s)
- Barry Smith
- Department of Philosophy, The State University of New York at Buffalo, Buffalo, NY 14260, USA
- Mathias Brochhausen
- Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Michael Calhoun
- Department of Health and Human Performance, Elon University, Elon, NC 27244, USA
- Paolo Ciccarese
- Harvard Medical School, Massachusetts General Hospital, PerkinElmer Innovation Labs, Boston, MA 02115, USA
- Scott Doyle
- Department of Pathology and Anatomical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Bernard Gibaud
- Laboratoire du Traitement du Signal et de l'Image (LTSI), Inserm Unit 1099, University of Rennes 1, Rennes, France
- Ilya Goldberg
- National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA
- Charles E Kahn
- Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA
- John Tomaszewski
- Department of Pathology and Anatomical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Metin Gurcan
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA

12
Deng Y, Gao L, Wang B, Guo X. HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology. PLoS One 2015; 10:e0115692. [PMID: 25664462 PMCID: PMC4321842 DOI: 10.1371/journal.pone.0115692] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 09/20/2014] [Accepted: 11/25/2014] [Indexed: 12/19/2022]
Abstract
BACKGROUND Phenotypic features associated with genes and diseases play an important role in disease-related studies, and most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering a controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized, controlled vocabulary covering phenotypic abnormalities in human diseases, and has become a comprehensive resource for the computational analysis of human disease phenotypes. Most existing HPO-based software tools cannot be used offline and provide only a few similarity measures. There is therefore a critical need for comprehensive, offline software for measuring phenotypic similarity based on HPO. RESULTS HPOSim is an R package for analyzing the phenotypic similarity of genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets is also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA). CONCLUSIONS HPOSim can be used to predict disease genes and to explore the disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
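HPOSim itself is an R package, and its API is not reproduced here. To illustrate the hypergeometric enrichment test this abstract mentions, the Python sketch below computes the standard one-sided p-value of observing at least k annotated items in a query set of n items drawn from a population of N, of which K carry the annotation. The function and parameter names are our own. The sketch omits the multiple-testing correction that an enrichment analysis over many terms would normally apply.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size (e.g. all annotated genes)
    K: population members carrying the term of interest
    n: size of the query set (e.g. a gene module)
    k: query-set members carrying the term
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid hypergeometric parameters")
    total = comb(N, n)
    # Sum the counts of every outcome at least as extreme as k;
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Toy example: 3 of 4 query genes carry a term held by 5 of 20 genes.
print(round(hypergeom_enrichment_p(20, 5, 4, 3), 4))  # 0.032
```

Exact integer arithmetic via `math.comb` avoids overflow for moderate population sizes. For very large populations, a library survival function such as SciPy's `hypergeom.sf` is the usual choice.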
Affiliation(s)
- Yue Deng
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Institute of Software Engineering, Xidian University, Xi'an, People’s Republic of China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Bingbo Wang
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Xingli Guo
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China

13
Zykovich A, Hubbard A, Flynn JM, Tarnopolsky M, Fraga MF, Kerksick C, Ogborn D, MacNeil L, Mooney SD, Melov S. Genome-wide DNA methylation changes with age in disease-free human skeletal muscle. Aging Cell 2014; 13:360-6. [PMID: 24304487 PMCID: PMC3954952 DOI: 10.1111/acel.12180] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Accepted: 10/23/2013] [Indexed: 12/11/2022]
Abstract
A decline in skeletal muscle mass and function with aging is well recognized but remains poorly characterized at the molecular level. Here, we report for the first time a genome-wide study of DNA methylation dynamics in the skeletal muscle of healthy male individuals during normal human aging. We predominantly observed hypermethylation throughout the genome in the aged group compared with the young subjects. Differentially methylated CpG (dmCpG) nucleotides tend to arise intragenically: they are underrepresented in promoters and overrepresented in the middle and 3′ end of genes. The intragenic methylation changes are overrepresented in genes that guide the formation of the junction between motor neurons and myofibers. We report a low level of correlation between gene expression in previous studies of aged muscle and DNA methylation status in our current analysis. For genes with changes in both methylation and gene expression with age, we observed an inverse correlation, with the exception of intragenic hypermethylated genes, which were correlated with increased gene expression. We suggest that a minimal number of dmCpG sites, or select sites, must be altered in order to correlate with gene expression changes. Finally, we identified 500 dmCpG sites that perform well in discriminating young from old samples. Our findings highlight epigenetic links between DNA methylation and aging in postmitotic skeletal muscle.
Affiliation(s)
- Artem Zykovich
- Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA 94945, USA
- Alan Hubbard
- Division of Biostatistics, School of Public Health, University of California, 101 Haviland Hall, MC 7358, Berkeley, CA 94720, USA
- James M. Flynn
- Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA 94945, USA
- Mark Tarnopolsky
- Neuromuscular and Neurometabolic Unit, Rm. 2H26, McMaster Children's Hospital, McMaster University Medical Center, 1200 Main St. W., Hamilton, Ontario, Canada L8N 3Z5
- Mario F. Fraga
- Cancer Epigenetics Laboratory, Department of Immunology and Oncology, Centro Nacional de Biotecnología/CNB-CSIC, Instituto Universitario de Oncología del Principado de Asturias (IUOPA), HUCA, Universidad de Oviedo, 33006 Oviedo, Spain
- Chad Kerksick
- Department of Health, Exercise and Sport Sciences, University of New Mexico, Albuquerque, NM 87109, USA
- Dan Ogborn
- Neuromuscular and Neurometabolic Unit, Rm. 2H26, McMaster Children's Hospital, McMaster University Medical Center, 1200 Main St. W., Hamilton, Ontario, Canada L8N 3Z5
- Lauren MacNeil
- Neuromuscular and Neurometabolic Unit, Rm. 2H26, McMaster Children's Hospital, McMaster University Medical Center, 1200 Main St. W., Hamilton, Ontario, Canada L8N 3Z5
- Sean D. Mooney
- Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA 94945, USA
- Simon Melov
- Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA 94945, USA

14
Hoehndorf R, Haendel M, Stevens R, Rebholz-Schuhmann D. Thematic series on biomedical ontologies in JBMS: challenges and new directions. J Biomed Semantics 2014; 5:15. [PMID: 24602198 PMCID: PMC4006457 DOI: 10.1186/2041-1480-5-15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/08/2014] [Accepted: 02/09/2014] [Indexed: 01/08/2023]
Abstract
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies become a central part of biological and biomedical research, a communication channel for publishing frequent updates and the latest developments would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and to provide a venue for publishing newly developed ontologies, updates to existing ontologies, methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, Aberystwyth University, Llandinam Building, SY23 3DB Aberystwyth, UK
- Melissa Haendel
- OHSU Library and Department of Medical Informatics, Portland, Oregon, USA
- Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Robert Stevens
- School of Computer Science, The University of Manchester, Oxford Road, M13 9PL Manchester, UK
- Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

15
Schnoes AM, Ream DC, Thorman AW, Babbitt PC, Friedberg I. Biases in the experimental annotations of protein function and their effect on our understanding of protein function space. PLoS Comput Biol 2013; 9:e1003063. [PMID: 23737737 PMCID: PMC3667760 DOI: 10.1371/journal.pcbi.1003063] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Received: 01/07/2013] [Accepted: 04/02/2013] [Indexed: 11/19/2022]
Abstract
The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from the scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent the “few articles - many proteins” phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments concerns mostly the subcellular location of proteins and their participation in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps substantially. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
Experiments and observations are the vehicles used by science to understand the world around us. In the field of molecular biology, we increasingly rely on high-throughput, genome-wide experiments to provide answers about the function of biological macromolecules. However, any experimental assay is inherently limited in the type of information it can discover. Here, we show that our increasing reliance on high-throughput experiments biases our understanding of protein function. While the primary source of information is experiments, the functions of many proteins are computationally annotated by sequence-based similarity, either directly or indirectly, to proteins whose function has been experimentally determined. Therefore, any biases in experimental annotations can become amplified and entrenched in the majority of protein databases. We show here that high-throughput studies are biased towards certain aspects of protein function, and that they provide less information than low-throughput studies. While there is no clear solution to the phenomenon of bias from high-throughput experiments, recognizing its existence and its impact can help us take steps to mitigate its effect.
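The “few articles - many proteins” statistic (0.14% of articles accounting for 25% of annotated proteins) is a cumulative-coverage measure. The sketch below shows one plausible way to compute such a figure. It assumes that articles are ranked by how many proteins they annotate and that coverage is counted over distinct proteins. The function name and toy data are hypothetical, and the authors' exact procedure may differ.

```python
def top_article_share(article_proteins: dict[str, set[str]], target: float) -> float:
    """Return the fraction of articles, taken largest first, whose combined
    annotations cover at least `target` (0-1) of all annotated proteins."""
    if not article_proteins:
        raise ValueError("need at least one article")
    all_proteins = set().union(*article_proteins.values())
    goal = target * len(all_proteins)
    # Rank articles by the number of proteins they annotate, largest first.
    ranked = sorted(article_proteins.values(), key=len, reverse=True)
    covered: set[str] = set()
    for used, proteins in enumerate(ranked, start=1):
        covered |= proteins  # union: a protein annotated twice counts once
        if len(covered) >= goal:
            return used / len(article_proteins)
    return 1.0


# Hypothetical toy data: one high-throughput study and four small ones.
articles = {
    "ht_screen": {f"P{i}" for i in range(6)},
    "small_a": {"P6"},
    "small_b": {"P7"},
    "small_c": {"P8"},
    "small_d": {"P9"},
}
print(top_article_share(articles, 0.25))  # 1 of 5 articles -> 0.2
```

Ranking by annotation count is a greedy choice. It matches the intuition of "dominant articles" but is not guaranteed to find the smallest set of articles reaching the target when their protein sets overlap.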
Affiliation(s)
- Alexandra M. Schnoes
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, United States of America
- David C. Ream
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Alexander W. Thorman
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Patricia C. Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, United States of America
- Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, United States of America