Reference Citation Analysis

For: van Haagen HHHBM, 't Hoen PAC, Botelho Bovo A, de Morrée A, van Mulligen EM, Chichester C, Kors JA, den Dunnen JT, van Ommen GJB, van der Maarel SM, Kern VM, Mons B, Schuemie MJ. Novel protein-protein interactions inferred from literature context. PLoS One 2009;4:e7894. [PMID: 19924298 PMCID: PMC2774517 DOI: 10.1371/journal.pone.0007894] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 05/14/2009] [Accepted: 10/09/2009] [Indexed: 01/15/2023]

Cited by Other Article(s)

Schultes E, Roos M, Bonino da Silva Santos LO, Guizzardi G, Bouwman J, Hankemeier T, Baak A, Mons B. FAIR Digital Twins for Data-Intensive Research. Front Big Data 2022;5:883341. [PMID: 35647536 PMCID: PMC9130601 DOI: 10.3389/fdata.2022.883341] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/24/2022] [Accepted: 04/12/2022] [Indexed: 11/13/2022]

Theodosiou T, Papanikolaou N, Savvaki M, Bonetto G, Maxouri S, Fakoureli E, Eliopoulos AG, Tavernarakis N, Amoutzias GD, Pavlopoulos GA, Aivaliotis M, Nikoletopoulou V, Tzamarias D, Karagogeos D, Iliopoulos I. UniProt-Related Documents (UniReD): assisting wet lab biologists in their quest on finding novel counterparts in a protein network. NAR Genom Bioinform 2020;2:lqaa005. [PMID: 33575553 PMCID: PMC7671407 DOI: 10.1093/nargab/lqaa005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/02/2019] [Revised: 01/20/2020] [Accepted: 01/31/2020] [Indexed: 02/04/2023]

Affiliation(s)

Theodosios Theodosiou University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece
Nikolaos Papanikolaou University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece
Maria Savvaki University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece; Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
Giulia Bonetto University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece
Stella Maxouri University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece; Medical School of Patras University, Laboratory of General Biology, Asklipiou 1, 26500 Rio Patras, Greece
Eirini Fakoureli University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece
Aristides G Eliopoulos Department of Biology, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75, 11527 Athens, Greece
Nektarios Tavernarakis University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece; Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
Grigoris D Amoutzias Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Larisa 41500, Greece
Georgios A Pavlopoulos Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", 34 Fleming Street, 16672 Vari, Greece
Michalis Aivaliotis Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece; Laboratory of Biological Chemistry, Faculty of Health Sciences, School of Medicine, Aristotle University of Thessaloniki, GR-54124, Thessaloniki, Greece; Functional Proteomics and Systems Biology (FunPATh), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, 10th km Thessaloniki-Thermi Rd, P.O. Box 8318, GR 57001, Greece
Vasiliki Nikoletopoulou Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
Dimitris Tzamarias Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
Domna Karagogeos University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece; Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
Ioannis Iliopoulos University of Crete, School of Medicine, Department of Basic Sciences, Heraklion 71003, Crete, Greece

Hatz S, Spangler S, Bender A, Studham M, Haselmayer P, Lacoste AMB, Willis VC, Martin RL, Gurulingappa H, Betz U. Identification of pharmacodynamic biomarker hypotheses through literature analysis with IBM Watson. PLoS One 2019;14:e0214619. [PMID: 30958864 PMCID: PMC6453528 DOI: 10.1371/journal.pone.0214619] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/21/2018] [Accepted: 03/16/2019] [Indexed: 12/12/2022]

Abstract

BACKGROUND

Pharmacodynamic biomarkers are becoming increasingly valuable for assessing drug activity and target modulation in clinical trials. However, identifying quality biomarkers is challenging because of the increasing volume and heterogeneity of relevant data describing the biological networks that underlie disease mechanisms. A biological pathway network typically includes entities (e.g., genes, proteins and chemicals/drugs) as well as the relationships between them, and is usually curated or mined from structured databases and textual co-occurrence data. We propose a hybrid Natural Language Processing and directed relationships-based network analysis approach using IBM Watson for Drug Discovery to rank all human genes and identify potential candidate biomarkers, requiring only an initial determination of a specific target-disease relationship.

METHODS

Through natural language processing of scientific literature, Watson for Drug Discovery creates a network of semantic relationships between biological concepts such as genes, drugs, and diseases. Using Bruton's tyrosine kinase as a case study, Watson for Drug Discovery's automatically extracted relationship network was compared with a prominent manually curated physical interaction network. Additionally, potential biomarkers for Bruton's tyrosine kinase inhibition were predicted using a matrix factorization approach and subsequently compared with expert-generated biomarkers.

RESULTS

Watson's natural language processing generated a relationship network matching 55 (86%) genes upstream of Bruton's tyrosine kinase (BTK) and 98 (95%) genes downstream of BTK in a prominent manually curated physical interaction network. Matrix factorization analysis placed 11 of the 13 genes identified by Merck subject matter experts in the top 20% of Watson for Drug Discovery's 13,595 ranked genes, with 7 in the top 5%.
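
Both results are overlaps against a reference set. The first is the share of curated pathway genes that the extracted network also contains. The second is the share of expert-nominated genes that fall inside a top-percentage cutoff of the ranked gene list. The minimal sketch below shows that arithmetic. It assumes the cutoff rounds down, uses hypothetical gene identifiers, and is not Watson for Drug Discovery's code. Under that assumption, the top 20% of 13,595 genes is the first 2,719 genes and the top 5% is the first 679. That makes 11 of 13 a recall of about 0.85 and 7 of 13 about 0.54.

```python
def coverage(extracted: set[str], curated: set[str]) -> float:
    """Fraction of curated pathway genes that also appear in the extracted network."""
    return len(extracted & curated) / len(curated)


def cutoff_rank(total: int, percent: int) -> int:
    """Number of list positions inside the top `percent` percent (assumed to round down)."""
    return total * percent // 100


def recall_at_percent(ranked: list[str], reference: set[str], percent: int) -> float:
    """Fraction of reference genes ranked within the top `percent` percent of `ranked`."""
    top = set(ranked[: cutoff_rank(len(ranked), percent)])
    return len(reference & top) / len(reference)


if __name__ == "__main__":
    print(cutoff_rank(13_595, 20), cutoff_rank(13_595, 5))  # 2719 679
    print(round(11 / 13, 3), round(7 / 13, 3))              # 0.846 0.538
```

Integer arithmetic in `cutoff_rank` avoids floating-point surprises when converting a percentage into a list position.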

CONCLUSION

Taken together, these results suggest that Watson for Drug Discovery's automatic relationship network identifies the majority of upstream and downstream genes in biological pathway networks and can be used to help with the identification and prioritization of pharmacodynamic biomarker evaluation, accelerating the early phases of disease hypothesis generation.

Mons B. FAIR Science for Social Machines: Let's Share Metadata Knowlets in the Internet of FAIR Data and Services. Data Intelligence 2019. [DOI: 10.1162/dint_a_00002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023]

Abstract

In a world awash with fragmented data and tools, the notion of Open Science has been gaining a lot of momentum, but it has simultaneously caused a great deal of anxiety. Some of the anxiety may be related to crumbling kingdoms, but there are also very legitimate concerns, especially about the relative role of machines and algorithms as compared to humans and the combination of both (i.e., social machines). There are also grave concerns about the connotations of the term “open”, as well as about unwanted side effects and the scalability of the approaches advocated by early adopters of new methodological developments. Many of these concerns are associated with mind-machine interaction and the critical role that computers now play in our day-to-day scientific practice. Here we address a number of these concerns and provide some possible solutions. FAIR (machine-actionable) data and services are obviously at the core of Open Science (or rather FAIR science). The scalable and transparent routing of data, tools and compute (to run the tools on) is a key feature of the envisioned Internet of FAIR Data and Services (IFDS). The European Commission in its Declaration on the European Open Science Cloud, the G7, and the USA data commons have all identified the need to ensure a solid and sustainable infrastructure for Open Science. Here we first define the term FAIR science as opposed to Open Science. In FAIR science, data and the associated tools are all Findable, Accessible under well-defined conditions, Interoperable and Reusable, but not necessarily “open” (without restrictions) and certainly not always “gratis”. The ambiguous term “open” has already caused considerable confusion and also opt-out reactions from researchers and other data-intensive professionals who cannot make their data open for very good reasons, such as patient privacy or national security.

Although Open Science is a definition for a way of working rather than an explicit requirement for all data to be available in full Open Access, the connotation of openness of the data involved in Open Science is very strong. In FAIR science, data and the associated services that run all processes in the data stewardship cycle (from experimental design to capture, curation, processing, linking and analytics) all have minimally FAIR metadata, which specify the conditions under which the actual underlying research objects are reusable, first for machines and then also for humans. This effectively means that, properly conducted, Open Science is part of FAIR science. However, FAIR science can also be done with partly closed, sensitive and proprietary data. As has been emphasized before, FAIR is not identical to “open”. In FAIR/Open Science, data should be as open as possible and as closed as necessary. Where data are generated using public funding, the default will usually be that the FAIR data resulting from the study are as accessible as possible, and that more restrictive access and licensing policies on these data will have to be explicitly justified and described. In all cases, however, even if reuse is restricted, data and related services should be findable for their major users, machines, which will also make them much better findable for human users. With good data stewardship becoming the norm, a very significant new market for distributed data analytics and learning is opening, and a plethora of tools and reusable data objects are being developed and released. These all need FAIR metadata to be routed to each other and to be effective.

Botsis T, Foster M, Kreimeyer K, Pandey A, Forshee R. Monitoring biomedical literature for post-market safety purposes by analyzing networks of text-based coded information. AMIA Jt Summits Transl Sci Proc 2017;2017:66-75. [PMID: 28815108 PMCID: PMC5543357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Emerging approaches in literature-based discovery: techniques and performance review. Knowl Eng Rev 2017. [DOI: 10.1017/s0269888917000042] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/07/2022]

Hettne KM, Thompson M, van Haagen HHHBM, van der Horst E, Kaliyaperumal R, Mina E, Tatum Z, Laros JFJ, van Mulligen EM, Schuemie M, Aten E, Li TS, Bruskiewich R, Good BM, Su AI, Kors JA, den Dunnen J, van Ommen GJB, Roos M, 't Hoen PA, Mons B, Schultes EA. The Implicitome: A Resource for Rationalizing Gene-Disease Associations. PLoS One 2016;11:e0149621. [PMID: 26919047 PMCID: PMC4769089 DOI: 10.1371/journal.pone.0149621] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/22/2015] [Accepted: 02/03/2016] [Indexed: 11/19/2022]

Affiliation(s)

Kristina M. Hettne Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Mark Thompson Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Herman H. H. B. M. van Haagen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Eelke van der Horst Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Rajaram Kaliyaperumal Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Eleni Mina Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Zuotian Tatum Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Jeroen F. J. Laros Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Erik M. van Mulligen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Martijn Schuemie Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Emmelien Aten Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Tong Shu Li Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
Richard Bruskiewich STAR Informatics / Delphinai Corporation, Port Moody, BC, Canada
Benjamin M. Good Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
Andrew I. Su Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
Jan A. Kors Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Johan den Dunnen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Gert-Jan B. van Ommen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Marco Roos Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Peter A.C. ‘t Hoen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Barend Mons Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
Erik A. Schultes Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Leiden Institute for Advanced Computer Science, Leiden, The Netherlands

Laukens K, Naulaerts S, Berghe WV. Bioinformatics approaches for the functional interpretation of protein lists: from ontology term enrichment to network analysis. Proteomics 2015;15:981-96. [PMID: 25430566 DOI: 10.1002/pmic.201400296] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/30/2014] [Revised: 10/16/2014] [Accepted: 11/24/2014] [Indexed: 12/24/2022]

Mina E, Thompson M, Kaliyaperumal R, Zhao J, van der Horst E, Tatum Z, Hettne KM, Schultes EA, Mons B, Roos M. Nanopublications for exposing experimental data in the life-sciences: a Huntington's Disease case study. J Biomed Semantics 2015;6:5. [PMID: 26464783 PMCID: PMC4603842 DOI: 10.1186/2041-1480-6-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/31/2014] [Accepted: 10/31/2014] [Indexed: 12/20/2022]

Wang J, Zuo Y, Man YG, Avital I, Stojadinovic A, Liu M, Yang X, Varghese RS, Tadesse MG, Ressom HW. Pathway and network approaches for identification of cancer signature markers from omics data. J Cancer 2015;6:54-65. [PMID: 25553089 PMCID: PMC4278915 DOI: 10.7150/jca.10631] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 09/24/2014] [Accepted: 11/14/2014] [Indexed: 12/12/2022]

Protein-protein interaction predictions using text mining methods. Methods 2014;74:47-53. [PMID: 25448298 DOI: 10.1016/j.ymeth.2014.10.026] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 04/02/2014] [Revised: 09/05/2014] [Accepted: 10/21/2014] [Indexed: 01/10/2023]

Borland AM, Hartwell J, Weston DJ, Schlauch KA, Tschaplinski TJ, Tuskan GA, Yang X, Cushman JC. Engineering crassulacean acid metabolism to improve water-use efficiency. Trends Plant Sci 2014;19:327-38. [PMID: 24559590 PMCID: PMC4065858 DOI: 10.1016/j.tplants.2014.01.006] [Citation(s) in RCA: 122] [Impact Index Per Article: 12.2] [Received: 10/06/2013] [Revised: 01/01/2014] [Accepted: 01/13/2014] [Indexed: 05/19/2023]

Dumontier M, Baker CJ, Baran J, Callahan A, Chepelev L, Cruz-Toledo J, Del Rio NR, Duck G, Furlong LI, Keath N, Klassen D, McCusker JP, Queralt-Rosinach N, Samwald M, Villanueva-Rosales N, Wilkinson MD, Hoehndorf R. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semantics 2014;5:14. [PMID: 24602174 PMCID: PMC4015691 DOI: 10.1186/2041-1480-5-14] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 07/02/2013] [Accepted: 02/02/2014] [Indexed: 11/10/2022]

Gradmann S. From containers to content to context. Journal of Documentation 2014. [DOI: 10.1108/jd-05-2013-0058] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]

Abstract

Purpose – The aim of this paper is to reposition the research library in the context of the changing information and knowledge architecture at the end of the “Gutenberg Parenthesis” and as part of the rapidly emerging “semantic” environment of the Linked Open Data paradigm. Understanding this process requires a good understanding of the evolution of the “document” notion in the passage from print-based culture to the distributed hypertextual and RDF-based information architecture of the WWW.

Design/methodology/approach – These objectives are reached using a literature study and a descriptive historical approach, as well as text mining techniques using Google nGrams as a data source.

Findings – The paper presents a proposal for effectively repositioning research libraries in the context of eScience and eScholarship, as well as clear indications that the proposed repositioning is already taking place. Furthermore, a new perspective on the “document” notion is provided.

Practical implications – The evolution described in the contribution creates opportunities for libraries to reposition themselves as aggregators and selectors of content and as contextualising agents within future Linked Data based scholarly research environments, provided they are able and ready to carry out the related cultural changes.

Originality/value – The paper will be useful for practitioners in search of strategic guidance for repositioning their library institutions in a context of ever-increasing competition for scarce funding resources.

Coelho ED, Arrais JP, Matos S, Pereira C, Rosa N, Correia MJ, Barros M, Oliveira JL. Computational prediction of the human-microbial oral interactome. BMC Syst Biol 2014;8:24. [PMID: 24576332 PMCID: PMC3975954 DOI: 10.1186/1752-0509-8-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/27/2013] [Accepted: 02/17/2014] [Indexed: 11/12/2022]

Abstract

BACKGROUND

The oral cavity is a complex ecosystem where human chemical compounds coexist with a particular microbiota. However, shifts in the normal composition of this microbiota may result in the onset of oral ailments, such as periodontitis and dental caries. In addition, it is known that the microbial colonization of the oral cavity is mediated by protein-protein interactions (PPIs) between the host and microorganisms. Nevertheless, such host-microbe PPIs remain largely unknown. To elucidate these interactions, we have created a computational prediction method that allows us to obtain a first model of the Human-Microbial oral interactome.

RESULTS

We collected high-quality experimental PPIs from five major human databases. The obtained PPIs were used to create our positive dataset and, indirectly, our negative dataset. The positive and negative datasets were merged and used for training and validation of a naïve Bayes classifier. For the final prediction model, we used an ensemble methodology combining five distinct PPI prediction techniques, namely: literature mining, primary protein sequences, orthologous profiles, biological process similarity, and domain interactions. Performance evaluation of our method revealed an area under the ROC curve (AUC) value greater than 0.926, supporting our primary hypothesis, as no single set of features reached an AUC greater than 0.877. After subjecting our dataset to the prediction model, the classified result was filtered for very high confidence PPIs (probability ≥ 1 − 10⁻⁷), leading to a set of 46,579 PPIs to be further explored.
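
An AUC value can be read as the probability that a randomly chosen true interaction scores higher than a randomly chosen non-interacting pair, with ties counting half. The minimal sketch below shows that pairwise (Mann-Whitney) estimate together with the final confidence filter at 1 − 10⁻⁷. The scores and protein-pair names are invented for illustration, and this is not the authors' evaluation pipeline.

```python
def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Very high confidence cut-off from the abstract: probability >= 1 - 10^-7.
THRESHOLD = 1 - 1e-7


def very_high_confidence(predictions: dict[tuple[str, str], float],
                         threshold: float = THRESHOLD) -> dict[tuple[str, str], float]:
    """Keep only predicted host-microbe pairs whose probability meets the threshold."""
    return {pair: p for pair, p in predictions.items() if p >= threshold}


if __name__ == "__main__":
    print(auc([0.9, 0.8, 0.4], [0.3, 0.5]))  # 5 of the 6 pairs are ordered correctly
```

The quadratic loop is fine for a demonstration. For datasets of the size described above, a rank-based formula or a library routine would be the practical choice.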

CONCLUSIONS

We believe this dataset holds not only important pathways involved in the onset of infectious oral diseases, but also potential drug targets and biomarkers. The dataset used for training and validation, the predictions obtained, and the final network are available at http://bioinformatics.ua.pt/software/oralint.

Pavlopoulos GA, Promponas VJ, Ouzounis CA, Iliopoulos I. Biological information extraction and co-occurrence analysis. Methods Mol Biol 2014;1159:77-92. [PMID: 24788262 DOI: 10.1007/978-1-4939-0709-0_5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]

van Haagen HHHBM, 't Hoen PAC, Mons B, Schultes EA. Generic information can retrieve known biological associations: implications for biomedical knowledge discovery. PLoS One 2013;8:e78665. [PMID: 24260124 PMCID: PMC3834066 DOI: 10.1371/journal.pone.0078665] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/25/2013] [Accepted: 09/13/2013] [Indexed: 02/01/2023]

Abstract

Motivation

Weighted semantic networks built from text-mined literature can be used to retrieve known protein-protein or gene-disease associations, and have been shown to anticipate associations years before they are explicitly stated in the literature. Our text-mining system recognizes over 640,000 biomedical concepts: some are specific (e.g., names of genes or proteins), others generic (e.g., ‘Homo sapiens’). Generic concepts may play important roles in automated information retrieval, extraction, and inference, but may also result in concept overload and confound retrieval and reasoning with low-relevance or even spurious links. Here, we attempted to optimize the retrieval performance for protein-protein interactions (PPI) by filtering generic concepts (node filtering) or links to generic concepts (edge filtering) from a weighted semantic network. First, we defined metrics based on network properties that quantify the specificity of concepts. Then, using these metrics, we systematically filtered generic information from the network while monitoring the retrieval performance of known protein-protein interactions. We also systematically filtered specific information from the network (inverse filtering), and assessed the retrieval performance of networks composed of generic information alone.
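
The sketch below makes node filtering and inverse filtering concrete on a toy weighted network. Protein nodes are always kept, and intermediate concepts are classed as generic or specific by their degree. Three things here are assumptions for illustration only: degree as the specificity metric, the threshold value, and the shared-neighbour retrieval check. The paper defines its own network-based metrics, and edge filtering is not shown.

```python
from collections import defaultdict

# Undirected weighted semantic network: {frozenset({concept_a, concept_b}): weight}.
Network = dict[frozenset, float]


def degrees(net: Network) -> dict[str, int]:
    """Number of links per concept, used here as a crude proxy for genericity."""
    deg: dict[str, int] = defaultdict(int)
    for edge in net:
        for node in edge:
            deg[node] += 1
    return dict(deg)


def filter_concepts(net: Network, max_degree: int, protected: set[str],
                    inverse: bool = False) -> Network:
    """Node filtering drops unprotected concepts with degree > max_degree (generic).
    Inverse filtering (inverse=True) drops the specific, low-degree ones instead."""
    deg = degrees(net)

    def keep(node: str) -> bool:
        if node in protected:
            return True
        is_generic = deg[node] > max_degree
        return is_generic if inverse else not is_generic

    return {edge: w for edge, w in net.items() if all(keep(n) for n in edge)}


def shared_concepts(net: Network, a: str, b: str) -> set[str]:
    """Intermediate concepts linked to both proteins, a simple retrieval signal."""
    def neighbours(x: str) -> set[str]:
        return {n for edge in net if x in edge for n in edge if n != x}
    return neighbours(a) & neighbours(b)
```

In a toy network where proteins P1, P2 and P3 all link to "binding" and only P1 and P2 link to a specific concept, the generic-only network still connects P1 and P2 through "binding". This mirrors, in miniature, the finding reported in the Results.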

Results

Filtering generic or specific information induced a two-phase response in retrieval performance: initially the effects of filtering were minimal, but beyond a critical threshold network performance dropped suddenly. Contrary to expectations, networks composed exclusively of generic information demonstrated retrieval performance comparable to that of unfiltered networks that also contain specific concepts. Furthermore, an analysis of individual generic concepts demonstrated that they can effectively support the retrieval of known protein-protein interactions. For instance, the concept “binding” is indicative of PPIs and the concept “mutation abnormality” is indicative of gene-disease associations.

Conclusion

Generic concepts are important for information retrieval and cannot be removed from semantic networks without negative impact on retrieval performance.

Botsis T, Ball R. Automating case definitions using literature-based reasoning. Appl Clin Inform 2013;4:515-27. [PMID: 24454579 PMCID: PMC3885912 DOI: 10.4338/aci-2013-04-ra-0028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/24/2013] [Accepted: 10/08/2013] [Indexed: 11/23/2022]

Abstract

BACKGROUND

Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps and this is a major source of inefficiency in surveillance and research.

OBJECTIVE

Describe the need and propose an approach for automating the useful representation of CDefs for medical conditions.

METHODS

We translated the existing Brighton Collaboration CDef for anaphylaxis, relying mostly on the identification of synonyms for the criteria of the CDef using the NLM MetaMap tool. We also generated a CDef for the same condition using all the related PubMed abstracts, processing them with a text mining tool, and further treating the synonyms with the above strategy. The co-occurrence of anaphylaxis and any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The 'islands' algorithm reduced the network and revealed its densest region, including the nodes that were used to represent the key criteria of the CDef. We evaluated the ability of the "translated" and the "generated" CDef to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches, and compared them with our previous semi-automated classification approach.
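
The sentence-level co-occurrence step that seeds the semantic network can be sketched as below. The sketch makes several assumptions: abstracts are plain strings, sentences are split naively on terminal punctuation, and concepts are matched as lowercase substrings from a small hand-made vocabulary instead of through MetaMap. The 'islands' reduction is not reproduced.

```python
import re
from collections import Counter


def sentence_cooccurrence(abstracts: list[str], anchor: str,
                          vocabulary: set[str]) -> Counter:
    """Count, for each vocabulary term, the sentences it shares with `anchor`."""
    counts: Counter = Counter()
    for text in abstracts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            s = sentence.lower()
            if anchor not in s:
                continue
            for term in vocabulary:
                if term != anchor and term in s:
                    counts[term] += 1
    return counts
```

The resulting counts could serve as edge weights between the anchor concept and each co-occurring term before any network reduction is applied. Substring matching is only a stand-in for proper concept recognition.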

RESULTS

Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825 ± 0.003) and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809 ± 0.042). Precision was low for all approaches.

CONCLUSION

The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.

de Vries B, Eising E, Broos LAM, Koelewijn SC, Todorov B, Frants RR, Boer JM, Ferrari MD, 't Hoen PAC, van den Maagdenberg AMJM. RNA expression profiling in brains of familial hemiplegic migraine type 1 knock-in mice. Cephalalgia 2013;34:174-82. [PMID: 23985897 DOI: 10.1177/0333102413502736] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]

Vos R, Aarts S, van Mulligen E, Metsemakers J, van Boxtel MP, Verhey F, van den Akker M. Finding potentially new multimorbidity patterns of psychiatric and somatic diseases: exploring the use of literature-based discovery in primary care research. J Am Med Inform Assoc 2013;21:139-45. [PMID: 23775174 DOI: 10.1136/amiajnl-2012-001448] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]

Abstract

BACKGROUND

Multimorbidity, the co-occurrence of two or more chronic medical conditions within a single individual, is increasingly becoming part of daily care of general medical practice. Literature-based discovery may help to investigate the patterns of multimorbidity and to integrate medical knowledge for improving healthcare delivery for individuals with co-occurring chronic conditions.

OBJECTIVE

To explore the usefulness of literature-based discovery in primary care research through the key-case of finding associations between psychiatric and somatic diseases relevant to general practice in a large biomedical literature database (Medline).

METHODS

By using literature-based discovery to match disease profiles as vectors in a high-dimensional associative concept space, co-occurrences of a broad spectrum of chronic medical conditions were matched for their potential in biomedicine. An experimental setting was chosen in parallel with expert evaluations and expert meetings to assess performance and to generate targets for integrating literature-based discovery into multidisciplinary medical research on psychiatric and somatic disease associations.
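
Matching profile vectors in a concept space is commonly done with a similarity measure such as cosine similarity. The minimal sketch below assumes each disease profile is a sparse mapping from concept to weight and ranks every psychiatric-somatic pair by cosine similarity. The disease names and weights are invented for illustration, and the study's own matching procedure may differ.

```python
import math
from itertools import product


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse concept-weight vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(psychiatric: dict[str, dict[str, float]],
               somatic: dict[str, dict[str, float]]) -> list[tuple[str, str, float]]:
    """All psychiatric x somatic disease pairs, most similar profiles first."""
    scored = [(p, s, cosine(psychiatric[p], somatic[s]))
              for p, s in product(psychiatric, somatic)]
    return sorted(scored, key=lambda t: t[2], reverse=True)
```

Cosine similarity ignores overall profile magnitude. A disease with a very large literature therefore does not dominate the ranking simply because it has more co-occurrences.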

RESULTS

Through stepwise reductions a reference set of 21,945 disease combinations was generated, from which a set of 166 combinations between psychiatric and somatic diseases was selected and assessed by text mining and expert evaluation.

CONCLUSIONS

Literature-based discovery tools generate specific patterns of associations between psychiatric and somatic diseases: one subset was appraised as promising for further research; the other subset surprised the experts, leading to intricate discussions and further elicitation of frameworks of biomedical knowledge. These frameworks enable us to specify targets for further developing and integrating literature-based discovery in multidisciplinary research in general practice, psychology and psychiatry, and epidemiology.

Li C, Jimeno-Yepes A, Arregui M, Kirsch H, Rebholz-Schuhmann D. PCorral--interactive mining of protein interactions from MEDLINE. Database (Oxford) 2013;2013:bat030. [PMID: 23640984 PMCID: PMC3641755 DOI: 10.1093/database/bat030] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/30/2012] [Revised: 03/15/2013] [Accepted: 03/27/2013] [Indexed: 11/13/2022]

A protein prioritization approach tailored for the FA/BRCA pathway. PLoS One 2013;8:e62017. [PMID: 23620800 PMCID: PMC3631253 DOI: 10.1371/journal.pone.0062017] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/20/2012] [Accepted: 03/15/2013] [Indexed: 11/22/2022]

Abstract

Fanconi anemia (FA) is a heterogeneous recessive disorder associated with a markedly elevated risk of developing cancer. To date, sixteen FA genes have been identified, three of which predispose heterozygous mutation carriers to breast cancer. The FA proteins work together in a genome maintenance pathway, the so-called FA/BRCA pathway, which is important during the S phase of the cell cycle. Since not all FA patients can be linked to (one of) the sixteen known complementation groups, new FA genes remain to be identified. In addition, the complex FA network remains to be further unravelled. One of the FA genes, FANCI, was identified via a combination of bioinformatic techniques exploiting FA protein properties and genetic linkage. The aim of this study was to develop a prioritization approach for proteins of the entire human proteome that potentially interact with the FA/BRCA pathway or are novel candidate FA genes. To this end, we combined the original bioinformatics approach, based on the properties of the first thirteen FA proteins identified, with publicly available tools for protein-protein interactions, literature mining (Nermal) and a protein function prediction tool (FuncNet). Importantly, the three newest FA proteins FANCO/RAD51C, FANCP/SLX4, and XRCC2 displayed scores in the range of the already known FA proteins. Likewise, a prime candidate FA gene identified by next-generation sequencing, which had a very low score, was subsequently disproven by functional studies for the FA phenotype. Furthermore, the approach strongly enriches for GO terms such as DNA repair, response to DNA damage stimulus, and cell cycle-regulated genes. Additionally, overlaying the top 150 with a haploinsufficiency probability score renders the approach more tailored for identifying breast cancer related genes. This approach may be useful for prioritization of putative novel FA or breast cancer genes from next-generation sequencing efforts.

Rebholz-Schuhmann D, Oellrich A, Hoehndorf R. Text-mining solutions for biomedical research: enabling integrative biology. Nat Rev Genet 2012;13:829-39. [DOI: 10.1038/nrg3337] [Citation(s) in RCA: 170] [Impact Index Per Article: 14.2] [Indexed: 12/24/2022]

Williams AJ, Harland L, Groth P, Pettifer S, Chichester C, Willighagen EL, Evelo CT, Blomberg N, Ecker G, Goble C, Mons B. Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today 2012;17:1188-98. [PMID: 22683805 DOI: 10.1016/j.drudis.2012.05.016] [Citation(s) in RCA: 172] [Impact Index Per Article: 14.3] [Received: 03/15/2012] [Revised: 05/18/2012] [Accepted: 05/31/2012] [Indexed: 01/22/2023]

Tang YT, Kao HY. Augmented transitive relationships with high impact protein distillation in protein interaction prediction. Biochim Biophys Acta 2012;1824:1468-75. [PMID: 22683815 DOI: 10.1016/j.bbapap.2012.05.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/08/2012] [Revised: 05/18/2012] [Accepted: 05/30/2012] [Indexed: 11/16/2022]

van Mulligen EM, Fourrier-Reglat A, Gurwitz D, Molokhia M, Nieto A, Trifiro G, Kors JA, Furlong LI. The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. J Biomed Inform 2012;45:879-84. [PMID: 22554700 DOI: 10.1016/j.jbi.2012.04.004] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 05/02/2011] [Revised: 02/02/2012] [Accepted: 04/11/2012] [Indexed: 11/25/2022]

Hossain MS, Gresock J, Edmonds Y, Helm R, Potts M, Ramakrishnan N. Connecting the dots between PubMed abstracts. PLoS One 2012;7:e29509. [PMID: 22235301 PMCID: PMC3250456 DOI: 10.1371/journal.pone.0029509] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 07/26/2011] [Accepted: 11/29/2011] [Indexed: 11/23/2022]

Abstract

Background

There is now a multitude of articles, published in a diversity of journals, providing information about genes, proteins, pathways, and diseases. Each article investigates a subset of a biological process, but to gain insight into the functioning of a system as a whole, we must integrate information from multiple publications. In particular, unraveling relationships between extracellular inputs and downstream molecular response mechanisms requires integrating conclusions from diverse publications.

Methodology

We present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for “connecting the dots” across the literature. We describe a storytelling algorithm that, given a start and end publication, typically with little or no overlap in content, identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. The quality of discovered stories is measured using local criteria such as the size of supporting neighborhoods for each link and the strength of individual links connecting publications, as well as global metrics of dispersion. To ensure that the story stays coherent as it meanders from one publication to another, we demonstrate the design of novel coherence and overlap filters for use as post-processing steps.

Conclusions

We demonstrate the application of our storytelling algorithm to three case studies: i) a many-to-one study exploring relationships between multiple cellular inputs and a molecule responsible for cell-fate decisions, ii) a many-to-many study exploring the relationships between multiple cytokines and multiple downstream transcription factors, and iii) a one-to-one study to showcase the ability to recover a cancer-related association, viz. the Warburg effect, from past literature. The storytelling pipeline helps narrow a scientist's focus from several hundred thousand relevant documents to only around a hundred stories. We argue that our approach can serve as a valuable discovery aid for hypothesis generation and connection exploration in large unstructured biological knowledge bases.

Smalheiser NR. Literature-based discovery: Beyond the ABCs. J Am Soc Inf Sci Technol 2011. [DOI: 10.1002/asi.21599] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/09/2022]

Xu L, Furlotte N, Lin Y, Heinrich K, Berry MW, George EO, Homayouni R. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts. PLoS One 2011;6:e18851. [PMID: 21533142 PMCID: PMC3077411 DOI: 10.1371/journal.pone.0018851] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 11/10/2010] [Accepted: 03/21/2011] [Indexed: 12/31/2022]

Ligthart L, de Vries B, Smith AV, Ikram MA, Amin N, Hottenga JJ, Koelewijn SC, Kattenberg VM, de Moor MHM, Janssens ACJW, Aulchenko YS, Oostra BA, de Geus EJC, Smit JH, Zitman FG, Uitterlinden AG, Hofman A, Willemsen G, Nyholt DR, Montgomery GW, Terwindt GM, Gudnason V, Penninx BWJH, Breteler M, Ferrari MD, Launer LJ, van Duijn CM, van den Maagdenberg AMJM, Boomsma DI. Meta-analysis of genome-wide association for migraine in six population-based European cohorts. Eur J Hum Genet 2011;19:901-7. [PMID: 21448238 PMCID: PMC3172930 DOI: 10.1038/ejhg.2011.48] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022]

The value of data. Nat Genet 2011;43:281-3. [DOI: 10.1038/ng0411-281] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Indexed: 11/08/2022]

van Haagen HHHBM, 't Hoen PAC, de Morrée A, van Roon-Mom WMC, Peters DJM, Roos M, Mons B, van Ommen GJ, Schuemie MJ. In silico discovery and experimental validation of new protein-protein interactions. Proteomics 2011;11:843-53. [DOI: 10.1002/pmic.201000398] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 07/05/2010] [Revised: 10/17/2010] [Accepted: 11/25/2010] [Indexed: 01/27/2023]

van Haagen H, Mons B. In silico knowledge and content tracking. Methods Mol Biol 2011;760:129-40. [PMID: 21779994 DOI: 10.1007/978-1-61779-176-5_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]

Harmston N, Filsell W, Stumpf MPH. What the papers say: text mining for genomics and systems biology. Hum Genomics 2010;5:17-29. [PMID: 21106487 PMCID: PMC3500154 DOI: 10.1186/1479-7364-5-1-17] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 08/06/2010] [Accepted: 08/06/2010] [Indexed: 12/11/2022]

Abstract

Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining, the automated extraction of information from (electronically) published sources, could potentially fulfil an important role, but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect the rate at which scientific results are published to decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward.

Biomedical semantics: the hub for biomedical research 2.0. J Biomed Semantics 2010;1:1. [PMID: 20618983 PMCID: PMC2895735 DOI: 10.1186/2041-1480-1-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/25/2010] [Accepted: 03/31/2010] [Indexed: 11/10/2022]