101
Shimizu K, Nuida K, Arai H, Mitsunari S, Attrapadung N, Hamada M, Tsuda K, Hirokawa T, Sakuma J, Hanaoka G, Asai K. Privacy-preserving search for chemical compound databases. BMC Bioinformatics 2015; 16 Suppl 18:S6. [PMID: 26678650 PMCID: PMC4704467 DOI: 10.1186/1471-2105-16-s18-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources.
Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation.
Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.
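To make the phrase "additive-homomorphic cryptosystem" concrete, the following is a minimal sketch of a toy Paillier scheme, a well-known cryptosystem with this property. It is not the protocol from the paper: the choice of Paillier, the tiny primes, the function names and the bit-overlap example are all assumptions made here for illustration, and the key size is far too small for any real use.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy Paillier key generation (tiny illustrative primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (generator g = n + 1)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with private key (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def encrypted_overlap(n, encrypted_query_bits, db_bits):
    """Hypothetical server-side step: sum the encrypted query bits at the
    positions where the plaintext database bit is 1, without decrypting."""
    total = encrypt(n, 0)
    for c, bit in zip(encrypted_query_bits, db_bits):
        if bit:
            total = add_encrypted(n, total, c)
    return total


if __name__ == "__main__":
    n, priv = keygen()
    query = [1, 0, 1, 1, 0, 1]   # query holder's bit vector (kept private)
    record = [1, 1, 0, 1, 0, 1]  # one database record (kept on the server)
    enc_query = [encrypt(n, b) for b in query]
    enc_count = encrypted_overlap(n, enc_query, record)
    print("shared bits:", decrypt(n, priv, enc_count))
```

The server only ever multiplies ciphertexts, so it learns nothing about the query bits, while the query holder decrypts a single count rather than the record itself. This illustrates why addition alone can already support a useful comparison.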
102
Pharmacophore Models and Pharmacophore-Based Virtual Screening: Concepts and Applications Exemplified on Hydroxysteroid Dehydrogenases. Molecules 2015; 20:22799-832. [PMID: 26703541 PMCID: PMC6332202 DOI: 10.3390/molecules201219880] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Received: 11/19/2015] [Revised: 12/03/2015] [Accepted: 12/09/2015] [Indexed: 01/06/2023]
Abstract
Computational methods are well-established tools in the drug discovery process and can be employed for a variety of tasks. Common applications include lead identification and scaffold hopping, as well as lead optimization by structure-activity relationship analysis and selectivity profiling. In addition, compound-target interactions associated with potentially harmful effects can be identified and investigated. This review focuses on pharmacophore-based virtual screening campaigns specifically addressing the target class of hydroxysteroid dehydrogenases. Many members of this enzyme family are associated with specific pathological conditions, and pharmacological modulation of their activity may represent promising therapeutic strategies. On the other hand, unintended interference with their biological functions, e.g., upon inhibition by xenobiotics, can disrupt steroid hormone-mediated effects, thereby contributing to the development and progression of major diseases. Besides a general introduction to pharmacophore modeling and pharmacophore-based virtual screening, exemplary case studies from the field of short-chain dehydrogenase/reductase (SDR) research are presented. These success stories highlight the suitability of pharmacophore modeling for the various application fields and suggest its application also in future studies.
103
Hofmann-Apitius M, Ball G, Gebel S, Bagewadi S, de Bono B, Schneider R, Page M, Kodamullil AT, Younesi E, Ebeling C, Tegnér J, Canard L. Bioinformatics Mining and Modeling Methods for the Identification of Disease Mechanisms in Neurodegenerative Disorders. Int J Mol Sci 2015; 16:29179-206. [PMID: 26690135 PMCID: PMC4691095 DOI: 10.3390/ijms161226148] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/16/2015] [Revised: 11/10/2015] [Accepted: 11/12/2015] [Indexed: 12/22/2022]
Abstract
Since the decoding of the Human Genome, techniques from bioinformatics, statistics, and machine learning have been instrumental in uncovering patterns in increasing amounts and types of different data produced by technical profiling technologies applied to clinical samples, animal models, and cellular systems. Yet, progress on unravelling biological mechanisms, causally driving diseases, has been limited, in part due to the inherent complexity of biological systems. Whereas we have witnessed progress in the areas of cancer, cardiovascular and metabolic diseases, the area of neurodegenerative diseases has proved to be very challenging. This is in part because the aetiology of neurodegenerative diseases such as Alzheimer's disease or Parkinson's disease is unknown, rendering it very difficult to discern early causal events. Here we describe a panel of bioinformatics and modeling approaches that have recently been developed to identify candidate mechanisms of neurodegenerative diseases based on publicly available data and knowledge. We identify two complementary strategies: data mining techniques that use genetic data as a starting point, to be further enriched using other data types, or alternatively encoding prior knowledge about disease mechanisms in a model-based framework supporting reasoning and enrichment analysis. Our review illustrates the challenges entailed in integrating heterogeneous, multiscale and multimodal information in the area of neurology in general and neurodegeneration in particular. We conclude that progress would be accelerated by increasing efforts on performing systematic collection of multiple data types over time from each individual suffering from neurodegenerative disease.
The work presented here has been driven by project AETIONOMY, a project funded in the course of the Innovative Medicines Initiative (IMI), which is a public-private partnership of the European Federation of Pharmaceutical Industry Associations (EFPIA) and the European Commission (EC).
Affiliation(s)
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
  - Rheinische Friedrich-Wilhelms-Universitaet Bonn, University of Bonn, Bonn 53113, Germany.
- Gordon Ball
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, and Unit of Clinical Epidemiology, Karolinska University Hospital, Stockholm SE-171 77, Sweden.
  - Science for Life Laboratories, Karolinska Institutet, Stockholm SE-171 77, Sweden.
- Stephan Gebel
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg.
- Shweta Bagewadi
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Bernard de Bono
  - Institute of Health Informatics, University College London, London NW1 2DA, UK.
  - Auckland Bioengineering Institute, University of Auckland, Symmonds Street, Auckland 1142, New Zealand.
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg.
- Matt Page
  - Translational Bioinformatics, UCB Pharma, 216 Bath Rd, Slough SL1 3WE, UK.
- Alpha Tom Kodamullil
  - Rheinische Friedrich-Wilhelms-Universitaet Bonn, University of Bonn, Bonn 53113, Germany.
- Erfan Younesi
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Christian Ebeling
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Institutszentrum Birlinghoven, Sankt Augustin D-53754, Germany.
- Jesper Tegnér
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, and Unit of Clinical Epidemiology, Karolinska University Hospital, Stockholm SE-171 77, Sweden.
  - Science for Life Laboratories, Karolinska Institutet, Stockholm SE-171 77, Sweden.
- Luc Canard
  - Translational Science Unit, SANOFI Recherche & Développement, 1 Avenue Pierre Brossolette, Chilly-Mazarin Cedex 91385, France.
104
105
César-Razquin A, Snijder B, Frappier-Brinton T, Isserlin R, Gyimesi G, Bai X, Reithmeier RA, Hepworth D, Hediger MA, Edwards AM, Superti-Furga G. A Call for Systematic Research on Solute Carriers. Cell 2015; 162:478-87. [PMID: 26232220 DOI: 10.1016/j.cell.2015.07.022] [Citation(s) in RCA: 392] [Impact Index Per Article: 43.6] [Received: 03/18/2015] [Indexed: 01/10/2023]
Abstract
Solute carrier (SLC) membrane transport proteins control essential physiological functions, including nutrient uptake, ion transport, and waste removal. SLCs interact with several important drugs, and a quarter of the more than 400 SLC genes are associated with human diseases. Yet, compared to other gene families of similar stature, SLCs are relatively understudied. The time is right for a systematic attack on SLC structure, specificity, and function, taking into account kinship and expression, as well as the dependencies that arise from the common metabolic space.
Affiliation(s)
- Adrián César-Razquin
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Berend Snijder
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ruth Isserlin
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
- Gergely Gyimesi
  - Institute of Biochemistry and Molecular Medicine and Swiss National Center of Competence in Research, NCCR TransCure, University of Bern, 3012 Bern, Switzerland
- Xiaoyun Bai
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8 Canada
- David Hepworth
  - Worldwide Medicinal Chemistry, Pfizer Worldwide Research and Development, Cambridge, MA 02139, USA
- Matthias A Hediger
  - Institute of Biochemistry and Molecular Medicine and Swiss National Center of Competence in Research, NCCR TransCure, University of Bern, 3012 Bern, Switzerland.
- Aled M Edwards
  - Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada.
- Giulio Superti-Furga
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Center for Physiology and Pharmacology, Medical University of Vienna, 1090 Vienna, Austria.
106
Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, Mélius J, Waagmeester A, Sinha SR, Miller R, Coort SL, Cirillo E, Smeets B, Evelo CT, Pico AR. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res 2015; 44:D488-94. [PMID: 26481357 PMCID: PMC4702772 DOI: 10.1093/nar/gkv1024] [Citation(s) in RCA: 298] [Impact Index Per Article: 33.1] [Received: 09/16/2015] [Accepted: 09/28/2015] [Indexed: 12/19/2022]
Abstract
WikiPathways (http://www.wikipathways.org) is an open, collaborative platform for capturing and disseminating models of biological pathways for data visualization and analysis. Since our last NAR update, 4 years ago, WikiPathways has experienced massive growth in content, which continues to be contributed by hundreds of individuals each year. New aspects of the diversity and depth of the collected pathways are described from the perspective of researchers interested in using pathway information in their studies. We provide updates on extensions and services to support pathway analysis and visualization via popular standalone tools, i.e. PathVisio and Cytoscape, web applications and common programming environments. We introduce the Quick Edit feature for pathway authors and curators, in addition to new means of publishing pathways and maintaining custom pathway collections to serve specific research topics and communities. In addition to the latest milestones in our pathway collection and curation effort, we also highlight the latest means to access the content as publishable figures, as standard data files, and as linked data, including bulk and programmatic access.
Affiliation(s)
- Martina Kutmon
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Anders Riutta
  - Gladstone Institutes, San Francisco, California, CA 94158, USA
- Nuno Nunes
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Anwesha Bohler
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Jonathan Mélius
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Andra Waagmeester
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
  - Micelio, Antwerp, 2180 Antwerp, Belgium
- Sravanthi R Sinha
  - Keshav Memorial Institute of Technology, Hyderabad, Telangana 500029, India
- Ryan Miller
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Susan L Coort
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Elisa Cirillo
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Bart Smeets
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
- Alexander R Pico
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER Maastricht, The Netherlands
107
Garcia-Serna R, Vidal D, Remez N, Mestres J. Large-Scale Predictive Drug Safety: From Structural Alerts to Biological Mechanisms. Chem Res Toxicol 2015; 28:1875-87. [PMID: 26360911 DOI: 10.1021/acs.chemrestox.5b00260] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Indexed: 12/11/2022]
Abstract
The recent explosion of data linking drugs, proteins, and pathways with safety events has promoted the development of integrative systems approaches to large-scale predictive drug safety. The added value of such approaches is that, beyond the traditional identification of potentially labile chemical fragments for selected toxicity end points, they have the potential to provide mechanistic insights for a much larger and diverse set of safety events in a statistically sound nonsupervised manner, based on the similarity to drug classes, the interaction with secondary targets, and the interference with biological pathways. The combined identification of chemical and biological hazards enhances our ability to assess the safety risk of bioactive small molecules with higher confidence than that using structural alerts only. We are still a very long way from reliably predicting drug safety, but advances toward gaining a better understanding of the mechanisms leading to adverse outcomes represent a step forward in this direction.
Affiliation(s)
- Ricard Garcia-Serna
  - Chemotargets SL, Parc Científic de Barcelona, Baldiri Reixac 4 (TI-05A7), 08028 Barcelona, Catalonia, Spain
- David Vidal
  - Chemotargets SL, Parc Científic de Barcelona, Baldiri Reixac 4 (TI-05A7), 08028 Barcelona, Catalonia, Spain
- Nikita Remez
  - Chemotargets SL, Parc Científic de Barcelona, Baldiri Reixac 4 (TI-05A7), 08028 Barcelona, Catalonia, Spain
  - Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Jordi Mestres
  - Chemotargets SL, Parc Científic de Barcelona, Baldiri Reixac 4 (TI-05A7), 08028 Barcelona, Catalonia, Spain
  - Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
108
Nicola G, Berthold MR, Hedrick MP, Gilson MK. Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME. Database: The Journal of Biological Databases and Curation 2015; 2015:bav087. [PMID: 26384374 PMCID: PMC4572361 DOI: 10.1093/database/bav087] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/09/2015] [Accepted: 08/17/2015] [Indexed: 12/24/2022]
Abstract
Today's large, public databases of protein-small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org.
Affiliation(s)
- George Nicola
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Michael R Berthold
  - Department of Computer and Information Science, Konstanz University, 78457 Konstanz, Germany
- Michael K Gilson
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
109
Abstract
The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.
110
Bolton E. Reporting biological assay screening results for maximum impact. Drug Discovery Today: Technologies 2015; 14:31-6. [PMID: 26194585 DOI: 10.1016/j.ddtec.2015.03.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/15/2015] [Revised: 03/18/2015] [Accepted: 03/29/2015] [Indexed: 11/19/2022]
Abstract
A very large corpus of biological assay screening results exists in the public domain. The ability to compare and analyze these data is hampered by missing details and the lack of a commonly used terminology to describe assay protocols and assay endpoints. Minimum reporting guidelines exist that, if followed, would greatly enhance the utility of biological assay screening data so that they may be independently reproduced, readily integrated, effectively compared, and rapidly analyzed.
Affiliation(s)
- Evan Bolton
  - National Center for Biotechnology Information, Bldg. 38A/8S810, National Library of Medicine, U.S. National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
111
Fu G, Batchelor C, Dumontier M, Hastings J, Willighagen E, Bolton E. PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform 2015; 7:34. [PMID: 26175801 PMCID: PMC4500850 DOI: 10.1186/s13321-015-0084-4] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 02/02/2015] [Accepted: 06/22/2015] [Indexed: 12/02/2022]
Abstract
Background: PubChem is an open repository for chemical structures, biological activities and biomedical annotations. Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. Exposing PubChem data to Semantic Web services may help enable automated data integration and management, as well as facilitate interoperable web applications.
Description: This work, one of a series covering the PubChemRDF project, describes an approach to translate PubChem Substance and Compound information into Resource Description Framework (RDF) format. Basic examples are provided to demonstrate its use. The aim of this effort is to provide two new primary benefits to researchers in a cost-effective manner. Firstly, we aim to remove the inherent limitations of using the web-based resource PubChem by allowing a researcher to use readily available semantic technologies (namely, RDF triple stores and their corresponding SPARQL query engines) to query and analyze PubChem data on local computing resources. Secondly, this work intends to help improve data sharing, analysis, and integration of PubChem data to resources external to NCBI and across scientific domains, by means of the association of PubChem data to existing ontological frameworks, including CHEMical INFormation ontology, Semanticscience Integrated Ontology, and others.
Conclusions: With the goal of semantically describing information available in the PubChem archive, pre-existing ontological frameworks were used, rather than creating new ones. Semantic relationships between compounds and substances, chemical descriptors associated with compounds and substances, interrelationships between chemicals, as well as provenance and attribute metadata of substances are described.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0084-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gang Fu
  - National Center for Biotechnology Information, National Library of Medicine, National Institute of Health, Bethesda, MD USA
- Colin Batchelor
  - Royal Society of Chemistry, Thomas Graham House, Cambridge, UK
- Michel Dumontier
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, USA
- Janna Hastings
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Egon Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Evan Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institute of Health, Bethesda, MD USA
112
Hu XL, Li D, Shao L, Dong X, He XP, Chen GR, Chen D. Triazole-Linked Glycolipids Enhance the Susceptibility of MRSA to β-Lactam Antibiotics. ACS Med Chem Lett 2015; 6:793-7. [PMID: 26191368 DOI: 10.1021/acsmedchemlett.5b00142] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/07/2015] [Accepted: 06/01/2015] [Indexed: 12/12/2022]
Abstract
We show here that a series of triazolyl glycolipid derivatives modularly synthesized by a "click" reaction have the ability to increase the susceptibility of a drug-resistant bacterium to β-lactam antibiotics. We determine that the glycolipids can suppress the minimal inhibitory concentration of a number of ineffective β-lactams, upward of 256-fold, for methicillin-resistant Staphylococcus aureus (MRSA). The mechanism of action has been preliminarily probed and discussed.
Affiliation(s)
- Xi-Le Hu
  - Key Laboratory for Advanced Materials & Institute of Fine Chemicals, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, PR China
- Dan Li
  - State Key Laboratory of New Drug and Pharmaceutical Process, Shanghai Institute of Pharmaceutical Industry, China State Institute of Pharmaceutical Industry, Shanghai 200040, PR China
- Lei Shao
  - State Key Laboratory of New Drug and Pharmaceutical Process, Shanghai Institute of Pharmaceutical Industry, China State Institute of Pharmaceutical Industry, Shanghai 200040, PR China
- Xiaojing Dong
  - State Key Laboratory of New Drug and Pharmaceutical Process, Shanghai Institute of Pharmaceutical Industry, China State Institute of Pharmaceutical Industry, Shanghai 200040, PR China
- Xiao-Peng He
  - Key Laboratory for Advanced Materials & Institute of Fine Chemicals, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, PR China
- Guo-Rong Chen
  - Key Laboratory for Advanced Materials & Institute of Fine Chemicals, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, PR China
- Daijie Chen
  - State Key Laboratory of New Drug and Pharmaceutical Process, Shanghai Institute of Pharmaceutical Industry, China State Institute of Pharmaceutical Industry, Shanghai 200040, PR China
113
Abstract
ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other fields, such as crop protection research: for example, identification of chemical scaffolds active against a particular target or endpoint, the de-convolution of the potential targets of a phenotypic assay, or the potential targets/pathways for safety liabilities. In order to broaden the applicability of the ChEMBL database and allow more widespread use in crop protection research, an extensive data set of bioactivity data of insecticidal, fungicidal and herbicidal compounds and assays was collated and added to the database.
114
Richter L, Ecker GF. Medicinal chemistry in the era of big data. Drug Discovery Today: Technologies 2015; 14:37-41. [PMID: 26194586 DOI: 10.1016/j.ddtec.2015.06.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/20/2015] [Revised: 06/02/2015] [Accepted: 06/02/2015] [Indexed: 06/04/2023]
Abstract
In the era of big data, medicinal chemists are exposed to an enormous amount of bioactivity data. Numerous public data sources allow for querying across medium to large data sets mostly compiled from literature. However, the data available are still quite incomplete and of mixed quality. This mini review will focus on how medicinal chemists might use such resources and how valuable the current data sources are for guiding drug discovery.
Affiliation(s)
- Lars Richter
  - University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Wien, Austria
- Gerhard F Ecker
  - University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Wien, Austria.
115
Hersey A, Chambers J, Bellis L, Patrícia Bento A, Gaulton A, Overington JP. Chemical databases: curation or integration by user-defined equivalence? Drug Discovery Today: Technologies 2015; 14:17-24. [PMID: 26194583 PMCID: PMC6294287 DOI: 10.1016/j.ddtec.2015.01.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 08/20/2014] [Revised: 01/15/2015] [Accepted: 01/16/2015] [Indexed: 11/30/2022]
Abstract
There is a wealth of valuable chemical information in publicly available databases for use by scientists undertaking drug discovery. However, finite curation resource, limitations of chemical structure software and differences in individual database applications mean that exact chemical structure equivalence between databases is unlikely to ever be a reality. The ability to identify compound equivalence has been made significantly easier by the use of the International Chemical Identifier (InChI), a non-proprietary line-notation for describing a chemical structure. More importantly, advances in methods to identify compounds that are the same at various levels of similarity, such as those containing the same parent component or having the same connectivity, are now enabling related compounds to be linked between databases where the structure matches are not exact.
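The idea of matching compounds "at various levels of similarity" can be sketched with plain string handling on InChI identifiers, whose layers are separated by `/`. The helper names below are hypothetical, the comparison (formula plus the `/c` and `/h` layers) is a simplification chosen here rather than the method used by any particular database, and the example strings are illustrative InChIs intended to represent L-alanine, D-alanine, ethanol and dimethyl ether.

```python
def inchi_layers(inchi):
    """Split an InChI string into its version, formula and prefixed layers.

    Simplified: only the first occurrence of each layer prefix is kept,
    which ignores repeated layers such as those after a fixed-H (/f) block.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return layers


def same_connectivity(inchi_a, inchi_b):
    """True if two InChIs share formula, connection (/c) and hydrogen (/h) layers."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    keys = ("formula", "c", "h")
    return all(a.get(k) == b.get(k) for k in keys)


if __name__ == "__main__":
    # Illustrative strings; stereo layers (/t, /m, /s) differ between the first two.
    l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    dimethyl_ether = "InChI=1S/C2H6O/c1-3-2/h1-2H3"

    print("exact match:", l_ala == d_ala)
    print("same connectivity:", same_connectivity(l_ala, d_ala))
    print("isomers with same formula:", same_connectivity(ethanol, dimethyl_ether))
```

Two records that fail an exact string comparison can still be linked when only their stereochemistry layers differ, while constitutional isomers with the same formula stay distinct because their connection layers differ.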
Collapse
Affiliation(s)
- Anne Hersey
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
| | - Jon Chambers
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Louisa Bellis
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - A Patrícia Bento
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Anna Gaulton
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - John P Overington
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
|
116
|
Finding the right approach to big data-driven medicinal chemistry. Future Med Chem 2015; 7:1213-6. [DOI: 10.4155/fmc.15.58] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023] Open
|
117
|
Lambrinidis G, Vallianatou T, Tsantili-Kakoulidou A. In vitro, in silico and integrated strategies for the estimation of plasma protein binding. A review. Adv Drug Deliv Rev 2015; 86:27-45. [PMID: 25819487 DOI: 10.1016/j.addr.2015.03.011] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 08/14/2014] [Revised: 02/11/2015] [Accepted: 03/20/2015] [Indexed: 12/28/2022]
Abstract
Plasma protein binding (PPB) strongly affects drug distribution and pharmacokinetic behavior, with consequences for overall pharmacological action. Extended plasma protein binding may be associated with drug safety issues and several adverse effects, such as low clearance, low brain penetration, drug-drug interactions, and loss of efficacy, while also influencing the fate of enantiomers and diastereoisomers by stereoselective binding within the body. Therefore, in holistic drug design approaches, where ADME(T) properties are considered in parallel with target affinity, considerable effort is focused on early estimation of PPB, mainly with regard to human serum albumin (HSA), which is the most abundant and most important plasma protein. The second critical serum protein, α1-acid glycoprotein (AGP), although often underscored, also plays an important and complicated role in clinical therapy and has therefore also been studied thoroughly in recent years. In the present review, after an overview of the principles of HSA and AGP binding as well as the structure topology of the proteins, the current trends and perspectives in the field of PPB predictions are presented and discussed considering both HSA and AGP binding. However, since systematic studies of the latter protein began only in recent years, the review focuses mainly on HSA. One part of the review highlights the challenge of developing rapid techniques for HSA and AGP binding simulation and their performance in the assessment of PPB. The second part focuses on in silico approaches to predict HSA and AGP binding, analyzing and evaluating structure-based and ligand-based methods, as well as combinations of both methods, with the aim of exploiting the different information and overcoming the limitations of each individual approach. Ligand-based methods use the Quantitative Structure-Activity Relationships (QSAR) methodology to establish quantitative models for the prediction of binding constants from molecular descriptors, but they provide only indirect information on the binding mechanism. Efforts for the establishment of global models, automated workflows and web-based platforms for PPB predictions are presented and discussed. Structure-based methods relying on the crystal structures of drug-protein complexes provide detailed information on the underlying mechanism but are usually restricted to specific compounds. They are useful for identifying the specific binding site and may be important in investigating drug-drug interactions related to PPB. Moreover, chemometrics or structure-based modeling may be supported by experimental data, a promising integrated alternative strategy for ADME(T) property optimization. In the case of PPB, molecular modeling combined with bioanalytical techniques is frequently used for the investigation of AGP binding.
|
118
|
Karapetyan K, Batchelor C, Sharpe D, Tkachenko V, Williams AJ. The Chemical Validation and Standardization Platform (CVSP): large-scale automated validation of chemical structure datasets. J Cheminform 2015; 7:30. [PMID: 26155308 PMCID: PMC4494041 DOI: 10.1186/s13321-015-0072-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/28/2014] [Accepted: 04/28/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND There are presently hundreds of online databases hosting millions of chemical compounds and associated data. As a result of the number of cheminformatics software tools that can be used to produce the data, subtle differences between the various cheminformatics platforms, as well as the naivety of the software users, there are a myriad of issues that can exist with chemical structure representations online. In order to help facilitate validation and standardization of chemical structure datasets from various sources we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets. RESULTS The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or potentially requiring manual review. Each identified issue is assigned one of three levels of severity - Information, Warning, and Error - in order to conveniently inform the user of the need to browse and review subsets of their data. The validation process includes validation of atoms and bonds (e.g., making aware of query atoms and bonds), valences, and stereo. The standard form of submission of collections of data, the SDF file, allows the user to map the data fields to predefined CVSP fields for the purpose of cross-validating associated SMILES and InChIs with the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project. 
In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and of the ChEMBL 17 data set. The CVSP web site is located at http://cvsp.chemspider.com/. CONCLUSION A platform for the validation and standardization of chemical structure representations of various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files to produce more homogeneous compound representations for exchange and interchange between online databases. While the CVSP platform is designed with flexibility inherent to the rules that can be used for processing the data, we have produced a recommended rule set based on our own experiences with large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.
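The rule-based validation described in this abstract (dictionary-based patterns, each issue assigned one of three severity levels) can be sketched roughly as follows. This is a toy illustration only: the rules are hypothetical regular expressions applied to SMILES text, not CVSP's actual rule set or matching engine, and the names `Rule`, `RULES`, `validate` and `worst_severity` are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # The three levels named in the abstract, ordered by seriousness.
    INFORMATION = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str        # hypothetical regex over the SMILES text
    severity: Severity


# Hypothetical user-defined rule dictionary (not the CVSP rules).
RULES = [
    Rule("empty structure", r"^\s*$", Severity.ERROR),
    Rule("query atom present", r"\*", Severity.WARNING),
    Rule("multiple fragments", r"\.", Severity.INFORMATION),
]


def validate(smiles: str):
    """Return (rule name, severity) for every rule whose pattern matches."""
    return [(r.name, r.severity) for r in RULES if re.search(r.pattern, smiles)]


def worst_severity(issues):
    """Return the most serious severity among the issues, or None if there are none."""
    return max((sev for _, sev in issues), key=lambda s: s.value, default=None)


for smi in ["CC(=O)O", "[Na+].[Cl-]", "C*.C", ""]:
    issues = validate(smi)
    print(repr(smi), [(name, sev.name) for name, sev in issues], worst_severity(issues))
```

Keeping the rules as data rather than code mirrors the abstract's point that users can supply their own pattern dictionaries; a real implementation would match substructures on a parsed molecule rather than on raw text.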
Affiliation(s)
- Karen Karapetyan
- Royal Society of Chemistry, US Office, 904 Tamaras Circle, Wake Forest, NC 27587 USA
| | - Colin Batchelor
- Thomas Graham House, Science Park, 290 Milton Road, Cambridge, UK
| | - David Sharpe
- Thomas Graham House, Science Park, 290 Milton Road, Cambridge, UK
| | - Valery Tkachenko
- Royal Society of Chemistry, US Office, 904 Tamaras Circle, Wake Forest, NC 27587 USA
| | - Antony J Williams
- Royal Society of Chemistry, US Office, 904 Tamaras Circle, Wake Forest, NC 27587 USA
- Environmental Protection Agency, Research Triangle Park, NC USA
| |
|
119
|
Warr WA. Many InChIs and quite some feat. J Comput Aided Mol Des 2015; 29:681-94. [PMID: 26081259 DOI: 10.1007/s10822-015-9854-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/28/2015] [Accepted: 06/10/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, Holmes Chapel, Crewe, Cheshire, CW4 7HZ, UK
| |
|
120
|
Alnazzawi N, Thompson P, Batista-Navarro R, Ananiadou S. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Med Inform Decis Mak 2015; 15 Suppl 2:S3. [PMID: 26099853 PMCID: PMC4474585 DOI: 10.1186/1472-6947-15-s2-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/21/2022] Open
Abstract
Background Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. Methods To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Results Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. Conclusions PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. 
Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.
|
121
|
Ernst P, Siu A, Weikum G. KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences. BMC Bioinformatics 2015; 16:157. [PMID: 25971816 PMCID: PMC4448285 DOI: 10.1186/s12859-015-0549-5] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 08/28/2014] [Accepted: 03/25/2015] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Biomedical knowledge bases (KBs) have become important assets in life sciences. Prior work on KB construction has three major limitations. First, most biomedical KBs are manually built and curated, and cannot keep up with the rate at which new findings are published. Second, for automatic information extraction (IE), the text genre of choice has been scientific publications, neglecting sources like health portals and online communities. Third, most prior work on IE has focused on the molecular level or chemogenomics only, like protein-protein interactions or gene-drug relationships, or solely addresses highly specific topics such as drug effects. RESULTS We address these three limitations by a versatile and scalable approach to automatic KB construction. Using a small number of seed facts for distant supervision of pattern-based extraction, we harvest a huge number of facts in an automated manner without requiring any explicit training. We extend previous techniques for pattern-based IE with confidence statistics, and we combine this recall-oriented stage with logical reasoning for consistency constraint checking to achieve high precision. To our knowledge, this is the first method that uses consistency checking for biomedical relations. Our approach can be easily extended to incorporate additional relations and constraints. We ran extensive experiments not only for scientific publications, but also for encyclopedic health portals and online communities, creating different KBs based on different configurations. We assess the size and quality of each KB, in terms of number of facts and precision. The best configured KB, KnowLife, contains more than 500,000 facts at a precision of 93% for 13 relations covering genes, organs, diseases, symptoms, treatments, as well as environmental and lifestyle risk factors. CONCLUSION KnowLife is a large knowledge base for health and life sciences, automatically constructed from different Web sources. As a unique feature, KnowLife is harvested from different text genres such as scientific publications, health portals, and online communities. Thus, it has the potential to serve as a one-stop portal for a wide range of relations and use cases. To showcase the breadth and usefulness, we make the KnowLife KB accessible through the health portal (http://knowlife.mpi-inf.mpg.de).
Affiliation(s)
- Patrick Ernst
- Max-Planck-Institute for Informatics, Campus E1 4, Saarbrücken, 66123, Germany.
| | - Amy Siu
- Max-Planck-Institute for Informatics, Campus E1 4, Saarbrücken, 66123, Germany.
| | - Gerhard Weikum
- Max-Planck-Institute for Informatics, Campus E1 4, Saarbrücken, 66123, Germany.
| |
|
122
|
Leroux H, Lefort L. Semantic enrichment of longitudinal clinical study data using the CDISC standards and the semantic statistics vocabularies. J Biomed Semantics 2015; 6:16. [PMID: 25973166 PMCID: PMC4429421 DOI: 10.1186/s13326-015-0012-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/08/2014] [Accepted: 03/05/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND There is an increasing recognition of the need for the data capture phase of clinical studies to be improved and for more effective sharing of clinical data. The Health Care and Life Sciences community has embraced semantic technologies to facilitate the integration of health data from electronic health records, clinical studies and pharmaceutical research. This paper explores the integration of clinical study data exchange standards and semantic statistics vocabularies to deliver clinical data as linked data in a format that is easier to enrich with links to complementary data sources and easier for a broad user base to consume. METHODS We propose a Linked Clinical Data Cube (LCDC), which combines the strengths of the RDF Data Cube and DDI-RDF vocabularies to enrich clinical data based on the CDISC standards. The CDISC standards provide the mechanisms for the data to be standardised and made more accessible and accountable, whereas the RDF Data Cube and DDI-RDF vocabularies provide novel approaches to managing large volumes of heterogeneous linked data resources. RESULTS We validate our approach using a large-scale longitudinal clinical study into neurodegenerative diseases. This dataset, comprising more than 1600 variables clustered in 25 different sub-domains, has been fully converted into RDF forming one main data cube and one specialised cube for each sub-domain. One sub-domain, the Medications specialised cube, has been linked to relevant external vocabularies, such as the Australian Medicines Terminology, the ATC DDD taxonomy and the DrugBank terminology. This provides new dimensions on which to query the data that promote the exploration of drug-drug and drug-disease interactions. CONCLUSIONS This implementation highlights the effectiveness of the association of the semantic statistics vocabularies for the publication of large heterogeneous data sets as linked data and the integration of the semantic statistics vocabularies with the CDISC standards. 
In particular, it demonstrates the potential of the two vocabularies in overcoming the monolithic nature of the underlying model and improving the navigation and querying of the data from multiple angles to support richer data analysis of clinical study data. The forecasted benefits are more efficient use of clinicians' time and the potential to facilitate cross-study analysis.
Affiliation(s)
- Hugo Leroux
- The Australian e-Health Research Centre, Digital Productivity Flagship, CSIRO, Level 5 - UQ Health Sciences Building 901/16, Brisbane, 4029 Queensland Australia
| | - Laurent Lefort
- Digital Economy Program, Digital Productivity Flagship, CSIRO, Canberra, 2601 ACT Australia
| |
|
123
|
Drug discovery FAQs: workflows for answering multidomain drug discovery questions. Drug Discov Today 2015; 20:399-405. [DOI: 10.1016/j.drudis.2014.11.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 09/23/2014] [Revised: 10/22/2014] [Accepted: 11/13/2014] [Indexed: 12/26/2022]
|
124
|
Clark AM, Williams AJ, Ekins S. Machines first, humans second: on the importance of algorithmic interpretation of open chemistry data. J Cheminform 2015; 7:9. [PMID: 25798198 PMCID: PMC4369291 DOI: 10.1186/s13321-015-0057-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/24/2014] [Accepted: 02/23/2015] [Indexed: 11/12/2022] Open
Abstract
The current rise in the use of open lab notebook techniques means that an increasing number of scientists make chemical information freely and openly available to the entire community as a series of micropublications that are released shortly after the conclusion of each experiment. We propose that this trend be accompanied by a thorough examination of data sharing priorities. We argue that the most significant immediate beneficiary of open data is in fact chemical algorithms, which are capable of absorbing vast quantities of data and using it to present concise insights to working chemists, on a scale that could not be achieved by traditional publication methods. Making this goal practically achievable will require a paradigm shift in the way individual scientists translate their data into digital form, since most contemporary methods of data entry are designed for presentation to humans rather than consumption by machine learning algorithms. We discuss some of the complex issues involved in fixing current methods, as well as some of the immediate benefits that can be gained when open data is published correctly using unambiguous machine-readable formats. Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms.
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, H3J 2S1, QC Canada
| | - Antony J Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587 USA
| | - Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526 USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010 USA
| |
|
125
|
Hastings J, Jeliazkova N, Owen G, Tsiliki G, Munteanu CR, Steinbeck C, Willighagen E. eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment. J Biomed Semantics 2015; 6:10. [PMID: 25815161 PMCID: PMC4374589 DOI: 10.1186/s13326-015-0005-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 11/03/2014] [Accepted: 02/27/2015] [Indexed: 11/18/2022] Open
Abstract
Engineered nanomaterials (ENMs) are being developed to meet specific application needs in diverse domains across the engineering and biomedical sciences (e.g. drug delivery). However, accompanying the exciting proliferation of novel nanomaterials is a challenging race to understand and predict their possibly detrimental effects on human health and the environment. The eNanoMapper project (www.enanomapper.net) is creating a pan-European computational infrastructure for toxicological data management for ENMs, based on semantic web standards and ontologies. Here, we describe the development of the eNanoMapper ontology based on adopting and extending existing ontologies of relevance for the nanosafety domain. The resulting eNanoMapper ontology is available at http://purl.enanomapper.net/onto/enanomapper.owl. We aim to make the re-use of external ontology content seamless and thus we have developed a library to automate the extraction of subsets of ontology content and the assembly of the subsets into an integrated whole. The library is available (open source) at http://github.com/enanomapper/slimmer/. Finally, we give a comprehensive survey of the domain content and identify gap areas. ENM safety is at the boundary between engineering and the life sciences, and at the boundary between molecular granularity and bulk granularity. This creates challenges for the definition of key entities in the domain, which we also discuss.
Affiliation(s)
- Janna Hastings
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
| | | | - Gareth Owen
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
| | - Georgia Tsiliki
- National Technical University of Athens (NTUA), Athens, Greece
| | - Cristian R Munteanu
- Computer Science Faculty, University of A Coruña, A Coruña, Spain; Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
| | - Christoph Steinbeck
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
| | - Egon Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
| |
|
126
|
Carrió P, López O, Sanz F, Pastor M. eTOXlab, an open source modeling framework for implementing predictive models in production environments. J Cheminform 2015; 7:8. [PMID: 25774224 PMCID: PMC4358905 DOI: 10.1186/s13321-015-0058-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/13/2014] [Accepted: 02/24/2015] [Indexed: 11/10/2022] Open
Abstract
Background Computational models based on Quantitative Structure-Activity Relationship (QSAR) methodologies are widely used tools for predicting the biological properties of new compounds. In many instances, such models are used routinely in industry (e.g. the food, cosmetic or pharmaceutical industries) for the early assessment of the biological properties of new compounds. However, most of the tools currently available for developing QSAR models are not well suited for supporting the whole QSAR model life cycle in production environments. Results We have developed eTOXlab, an open source modeling framework designed to be used at the core of a self-contained virtual machine that can be easily deployed in production environments, providing predictions as web services. eTOXlab consists of a collection of object-oriented Python modules with methods mapping common tasks of standard modeling workflows. This framework allows building and validating QSAR models as well as predicting the properties of new compounds using either a command line interface or a graphical user interface (GUI). Simple models can be easily generated by setting a few parameters, while more complex models can be implemented by overriding pieces of the original source code. eTOXlab benefits from the object-oriented capabilities of Python to provide high flexibility: any model implemented using eTOXlab inherits the features implemented in the parent model, like common tools and services or the automatic exposure of the models as prediction web services. The particular eTOXlab architecture as a self-contained, portable prediction engine allows building models with confidential information within corporate facilities, which can be safely exported and used for prediction without disclosing the structures of the training series. Conclusions The software presented here provides full support to the specific needs of users that want to develop, use and maintain predictive models in corporate environments. 
The technologies used by eTOXlab (web services, VM, object-oriented programming) provide an elegant solution to common practical issues; the system can be installed easily in heterogeneous environments and integrates well with other software. Moreover, the system provides a simple and safe solution for building models with confidential structures that can be shared without disclosing sensitive information. Electronic supplementary material The online version of this article (doi:10.1186/s13321-015-0058-6) contains supplementary material, which is available to authorized users.
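The inheritance pattern described in this abstract, where a parent model supplies the common workflow and a more complex model overrides only selected pieces, might look like the following minimal sketch. The class names, method names and toy descriptors are hypothetical and are not taken from the eTOXlab code base; the "model" is a one-variable least-squares fit through the origin, chosen only to keep the example self-contained.

```python
class ParentModel:
    """Hypothetical parent class holding the shared build/predict workflow."""

    def descriptors(self, smiles: str) -> float:
        # Toy descriptor: number of characters in the SMILES string.
        return float(len(smiles))

    def build(self, training):
        """Fit y = slope * x from (smiles, activity) pairs."""
        xs = [self.descriptors(s) for s, _ in training]
        ys = [y for _, y in training]
        self.slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, smiles: str) -> float:
        return self.slope * self.descriptors(smiles)


class CarbonCountModel(ParentModel):
    """Hypothetical child model: overrides only the descriptor step.

    build() and predict() are inherited unchanged from ParentModel.
    """

    def descriptors(self, smiles: str) -> float:
        # Toy descriptor: count of the character "C" in the SMILES string.
        return float(smiles.count("C"))


parent = ParentModel().build([("CC", 2.0), ("CCC", 3.0)])
child = CarbonCountModel().build([("CCO", 4.0), ("CO", 2.0)])
print(parent.predict("CCCC"))
print(child.predict("CCCO"))
```

Because the workflow lives in the parent, any service built around `build` and `predict` (for example, exposing predictions over the web, as the abstract mentions) would apply to every derived model without extra code.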
Affiliation(s)
- Pau Carrió
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
| | - Oriol López
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
| | - Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
| | - Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
| |
|
127
|
Nantasenamat C, Prachayasittikul V. Maximizing computational tools for successful drug discovery. Expert Opin Drug Discov 2015; 10:321-9. [PMID: 25693813 DOI: 10.1517/17460441.2015.1016497] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 01/20/2023]
Abstract
Drug discovery is an iterative cycle of identifying promising hits followed by lead optimization via bioisosteric replacements. In the search for compounds affording good bioactivity, equal importance should also be placed on achieving those with favorable pharmacokinetic properties. Thus, balancing and realizing both key properties is an intricate problem that requires great caution. In this editorial, the authors explore the available computational tools in the context of the extent of big data that has been borne out by the advent of the Omics revolution. As such, the selection of appropriate computational tools for analyzing the vast number of chemical libraries, target proteins and interactomes is the first step toward maximizing the chance of success. However, in order to realize this, it is also necessary to have a solid foundation in the big concepts of drug discovery, as well as to know which tools are available, in order to give drug discovery scientists the best opportunity.
Affiliation(s)
- Chanin Nantasenamat
- Mahidol University, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, 10700 Bangkok, Thailand
| | | |
|
128
|
Eijssen L, Evelo C, Kok R, Mons B, Hooft R. The Dutch Techcentre for Life Sciences: Enabling data-intensive life science research in the Netherlands. F1000Res 2015; 4:33. [PMID: 26913186 PMCID: PMC4743138 DOI: 10.12688/f1000research.6009.2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Accepted: 01/04/2016] [Indexed: 11/20/2022] Open
Abstract
We describe the Data programme of the Dutch Techcentre for Life Sciences (DTL, www.dtls.nl). DTL is a new national organisation in scientific research that supports life scientists with technologies and technological expertise in an era where new projects are often data-intensive, multi-disciplinary, and multi-site. It is run as a lean not-for-profit organisation with research organisations (both academic and industrial) as paying members. The small staff of the organisation undertakes a variety of tasks that are necessary to perform or support modern academic research, but that are not easily undertaken in a purely academic setting. DTL Data takes care of such tasks related to data stewardship, facilitating exchange of knowledge and expertise, and brokering access to e-infrastructure. DTL also represents the Netherlands in ELIXIR, the European infrastructure for life science data. The organisation is still being fine-tuned and this will continue over time, as it is crucial for this kind of organisation to adapt to a constantly changing environment. However, having been underway for several years, we have gained experience that can benefit researchers in other fields or other countries setting up similar initiatives.
Affiliation(s)
- Lars Eijssen
- Department of Bioinformatics - BiGCaT, Maastricht University, 6229 ER Maastricht, Netherlands
| | - Chris Evelo
- Department of Bioinformatics - BiGCaT, Maastricht University, 6229 ER Maastricht, Netherlands
| | - Ruben Kok
- Dutch Techcentre for Life Sciences (Foundation office), Catharijnesingel 54, 3511 GC Utrecht, Netherlands
| | - Barend Mons
- Dutch Techcentre for Life Sciences (Foundation office), Catharijnesingel 54, 3511 GC Utrecht, Netherlands; Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, Netherlands; Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, Netherlands
| | - Rob Hooft
- Dutch Techcentre for Life Sciences (Foundation office), Catharijnesingel 54, 3511 GC Utrecht, Netherlands; Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, Netherlands
| | | |
|
129
|
Application of text mining in the biomedical domain. Methods 2015; 74:97-106. [PMID: 25641519 DOI: 10.1016/j.ymeth.2015.01.015] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 05/08/2014] [Revised: 01/21/2015] [Accepted: 01/23/2015] [Indexed: 12/12/2022] Open
Abstract
In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for text mining and give an overview of the text mining tools that are currently being used and the types of problems to which they are typically applied.
|
130
|
Hoehndorf R, Slater L, Schofield PN, Gkoutos GV. Aber-OWL: a framework for ontology-based data access in biology. BMC Bioinformatics 2015; 16:26. [PMID: 25627673 PMCID: PMC4384359 DOI: 10.1186/s12859-015-0456-9] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 08/18/2014] [Accepted: 01/09/2015] [Indexed: 11/10/2022] Open
Abstract
Background Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning. Results We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net. Conclusions Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.
Affiliation(s)
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia.
- Luke Slater
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB, UK.
- Paul N Schofield
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK.
- Georgios V Gkoutos
  - Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB, UK.
|
131
|
Ontology-based data integration between clinical and research systems. PLoS One 2015; 10:e0116656. [PMID: 25588043 PMCID: PMC4294641 DOI: 10.1371/journal.pone.0116656] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 04/07/2014] [Accepted: 12/06/2014] [Indexed: 12/17/2022] Open
Abstract
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has gained in importance recently. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: With current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: Instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it.
|
132
|
Moghadam BT, Alvarsson J, Holm M, Eklund M, Carlsson L, Spjuth O. Scaling predictive modeling in drug development with cloud computing. J Chem Inf Model 2015; 55:19-25. [PMID: 25493610 DOI: 10.1021/ci500580y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Growing data sets with increased time for analysis are hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compared them with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
Affiliation(s)
- Behrooz Torabi Moghadam
  - Department of Pharmaceutical Biosciences, Department of Information Technology, and Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, SE-751 24 Uppsala, Sweden
|
133
|
Machado CM, Rebholz-Schuhmann D, Freitas AT, Couto FM. The semantic web in translational medicine: current applications and future directions. Brief Bioinform 2015; 16:89-103. [PMID: 24197933 PMCID: PMC4293377 DOI: 10.1093/bib/bbt079] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/22/2013] [Accepted: 10/08/2013] [Indexed: 11/14/2022] Open
Abstract
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice.
Affiliation(s)
- Catia M. Machado (corresponding author)
  - Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal and Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Universidade de Lisboa, Portugal.
|
134
|
Ratnam J, Zdrazil B, Digles D, Cuadrado-Rodriguez E, Neefs JM, Tipney H, Siebes R, Waagmeester A, Bradley G, Chau CH, Richter L, Brea J, Evelo CT, Jacoby E, Senger S, Loza MI, Ecker GF, Chichester C. The application of the open pharmacological concepts triple store (open PHACTS) to support drug discovery research. PLoS One 2014; 9:e115460. [PMID: 25522365 PMCID: PMC4270790 DOI: 10.1371/journal.pone.0115460] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 05/16/2014] [Accepted: 10/30/2014] [Indexed: 01/08/2023] Open
Abstract
Integration of open access, curated, high-quality information from multiple disciplines in the Life and Biomedical Sciences provides a holistic understanding of the domain. Additionally, the effective linking of diverse data sources can unearth hidden relationships and guide potential research strategies. However, given the lack of consistency between descriptors and identifiers used in different resources and the absence of a simple mechanism to link them, gathering and combining relevant, comprehensive information from diverse databases remains a challenge. The Open Pharmacological Concepts Triple Store (Open PHACTS) is an Innovative Medicines Initiative project that uses semantic web technology approaches to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems. The project draws together sources of publicly available pharmacological, physicochemical and biomolecular data, represents them in a stable infrastructure and provides well-defined information exploration and retrieval methods. Here, we highlight the utility of this platform in conjunction with workflow tools to solve pharmacological research questions that require interoperability between target, compound, and pathway data. Use cases presented herein cover (1) the comprehensive identification of chemical matter for a dopamine receptor drug discovery program; (2) the identification of compounds active against all targets in the Epidermal growth factor receptor (ErbB) signaling pathway that have a relevance to disease; and (3) the evaluation of established targets in the Vitamin D metabolism pathway to aid novel Vitamin D analogue design. The example workflows presented illustrate how the Open PHACTS Discovery Platform can be used to exploit existing knowledge and generate new hypotheses in the process of drug discovery.
Affiliation(s)
- Joseline Ratnam
  - Universidade de Santiago de Compostela, Grupo BioFarma-USEF, Departamento de Farmacología, Campus Universitario Sur s/n, 15782 Santiago de Compostela, Spain
- Barbara Zdrazil
  - University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Vienna, Austria
- Daniela Digles
  - University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Vienna, Austria
- Emiliano Cuadrado-Rodriguez
  - Universidade de Santiago de Compostela, Grupo BioFarma-USEF, Departamento de Farmacología, Campus Universitario Sur s/n, 15782 Santiago de Compostela, Spain
- Jean-Marc Neefs
  - Janssen Research & Development, Turnhoutseweg 30, Beerse, Belgium
- Hannah Tipney
  - GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom
- Ronald Siebes
  - Vrije Universiteit, Faculty of Sciences, division of Math. and Computer Science, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
- Andra Waagmeester
  - Department of Bioinformatics – BiGCaT, Maastricht University, Maastricht, The Netherlands
- Glyn Bradley
  - GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom
- Chau Han Chau
  - GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom
- Lars Richter
  - University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Vienna, Austria
- Jose Brea
  - Universidade de Santiago de Compostela, Grupo BioFarma-USEF, Departamento de Farmacología, Campus Universitario Sur s/n, 15782 Santiago de Compostela, Spain
- Chris T. Evelo
  - Department of Bioinformatics – BiGCaT, Maastricht University, Maastricht, The Netherlands
- Edgar Jacoby
  - Janssen Research & Development, Turnhoutseweg 30, Beerse, Belgium
- Stefan Senger
  - GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom
- Maria Isabel Loza
  - Universidade de Santiago de Compostela, Grupo BioFarma-USEF, Departamento de Farmacología, Campus Universitario Sur s/n, 15782 Santiago de Compostela, Spain
- Gerhard F. Ecker
  - University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Vienna, Austria
- Christine Chichester
  - Swiss Institute of Bioinformatics, CALIPHO Group, CMU – Rue Michel-Servet 1, 1211 Geneva 4, Switzerland
|
135
|
Rinaldi F, Clematide S, Marques H, Ellendorff T, Romacker M, Rodriguez-Esteban R. OntoGene web services for biomedical text mining. BMC Bioinformatics 2014; 15 Suppl 14:S6. [PMID: 25472638 PMCID: PMC4255746 DOI: 10.1186/1471-2105-15-s14-s6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 02/02/2023] Open
Abstract
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them.
|
136
|
Wassermann AM, Lounkine E, Davies JW, Glick M, Camargo LM. The opportunities of mining historical and collective data in drug discovery. Drug Discov Today 2014; 20:422-34. [PMID: 25463034 DOI: 10.1016/j.drudis.2014.11.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/19/2014] [Revised: 10/21/2014] [Accepted: 11/10/2014] [Indexed: 12/26/2022]
Abstract
Vast amounts of bioactivity data have been generated for small molecules across public and corporate domains. Biological signatures, either derived from systematic profiling efforts or from existing historical assay data, have been successfully employed for small molecule mechanism-of-action elucidation, drug repositioning, hit expansion and screening subset design. This article reviews different types of biological descriptors and applications, and we demonstrate how biological data can outlive the original purpose or project for which it was generated. By comparing 150 HTS campaigns run at Novartis over the past decade on the basis of their active and inactive chemical matter, we highlight the opportunities and challenges associated with cross-project learning in drug discovery.
Affiliation(s)
- Anne Mai Wassermann
  - In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Eugen Lounkine
  - In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- John W Davies
  - In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Meir Glick
  - In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- L Miguel Camargo
  - In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
|
137
|
Abstract
Within the last decade open data concepts have been gaining increasing interest in the area of drug discovery. With the launch of ChEMBL and PubChem, an enormous amount of bioactivity data was made easily accessible in the public domain. In addition, platforms that semantically integrate those data, such as the Open PHACTS Discovery Platform, permit querying across different domains of open life science data beyond the concept of ligand-target-pharmacology. However, most public databases are compiled from literature sources and are thus heterogeneous in their coverage. In addition, assay descriptions are not uniform and most often lack relevant information in the primary literature and, consequently, in databases. This raises the question of how useful large public data sources are for deriving computational models. In this perspective, we highlight selected open-source initiatives and outline the possibilities and also the limitations when exploiting this huge amount of bioactivity data.
|
138
|
The eTOX data-sharing project to advance in silico drug-induced toxicity prediction. Int J Mol Sci 2014; 15:21136-54. [PMID: 25405742 PMCID: PMC4264217 DOI: 10.3390/ijms151121136] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 09/25/2014] [Accepted: 10/20/2014] [Indexed: 11/16/2022] Open
Abstract
The high-quality in vivo preclinical safety data produced by the pharmaceutical industry during drug development, which follows numerous strict guidelines, are mostly not available in the public domain. These safety data are sometimes published as a condensed summary for the few compounds that reach the market, but the majority of studies are never made public and are often difficult to access in an automated way, sometimes even within the owning company itself. It is evident from many academic and industrial examples that useful data mining and model development require large and representative data sets and careful curation of the collected data. In 2010, under the auspices of the Innovative Medicines Initiative, the eTOX project started with the objective of extracting and sharing preclinical study data from paper or PDF archives of toxicology departments of the 13 participating pharmaceutical companies and using such data for establishing a detailed, well-curated database, which could then serve as a source for read-across approaches (early assessment of the potential toxicity of a drug candidate by comparison of similar structure and/or effects) and training of predictive models. The paper describes the efforts undertaken to allow effective data sharing (intellectual property (IP) protection and set up of adequate controlled vocabularies) and to establish the database (currently with over 4000 studies contributed by the pharma companies corresponding to more than 1400 compounds). In addition, the status of predictive model building and some specific features of the eTOX predictive system (eTOXsys) are presented as decision support knowledge-based tools for the drug development process at an early stage.
|
139
|
Hu Y, Bajorath J. Influence of search parameters and criteria on compound selection, promiscuity, and pan assay interference characteristics. J Chem Inf Model 2014; 54:3056-66. [PMID: 25329977 DOI: 10.1021/ci5005509] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
Compound activity data grow at unprecedented rates, and their complexity increases. This challenges compound data mining efforts and makes it difficult to draw reliable conclusions from data analysis. We have aimed to investigate the influence of individual parameters and data confidence levels on compound selection and property assessment. Therefore, alternative sets of bioactive compounds were systematically extracted from ChEMBL on the basis of iteratively expanding selection criteria with increasing stringency covering a variety of search parameters. The sequential application of criteria for the selection of high-confidence compound data was order-independent, as expected. Furthermore, the influence of separately applied selection criteria was analyzed. Criteria that largely influenced compound selection and compound promiscuity rates were identified. In the presence of stringent selection criteria and high data confidence, many compounds with likely assay artifacts or liabilities were eliminated from further consideration. Taken together, the findings of our analysis emphasize the need to carefully consider search parameters related to target organisms, confidence level of activity, and activity measurements and suggest reliable protocols for compound data mining.
Affiliation(s)
- Ye Hu
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
|
140
|
|
141
|
Vuorinen A, Schuster D. Methods for generating and applying pharmacophore models as virtual screening filters and for bioactivity profiling. Methods 2014; 71:113-34. [PMID: 25461773 DOI: 10.1016/j.ymeth.2014.10.013] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 06/01/2014] [Revised: 09/29/2014] [Accepted: 10/14/2014] [Indexed: 01/03/2023] Open
Abstract
Biological effects of small molecules in an organism result from favorable interactions between the molecules and their target proteins. These interactions depend on chemical functionalities, bonds, and their 3D-orientations towards each other. These 3D-arrangements of chemical functionalities that make a small molecule active towards its target can be described by pharmacophore models. In these models, chemical functionalities are represented as so-called features. Commonly, they are obtained either from a set of active compounds or directly from the observed protein-ligand interactions as present in X-ray crystal structures, NMR structures, or docking poses. In this review, we explain the basics of pharmacophore modeling including dataset generation, 3D-representations and conformational analysis of small molecules, pharmacophore model construction, model validation, and its benefits to virtual screening and other applications.
Affiliation(s)
- Anna Vuorinen
  - Institute of Pharmacy/Pharmaceutical Chemistry and Center for Molecular Biosciences Innsbruck - CMBI, University of Innsbruck, Innrain 80/82, 6020 Innsbruck, Austria
- Daniela Schuster
  - Institute of Pharmacy/Pharmaceutical Chemistry and Center for Molecular Biosciences Innsbruck - CMBI, University of Innsbruck, Innrain 80/82, 6020 Innsbruck, Austria
|
142
|
Ekins S, Clark AM, Swamidass SJ, Litterman N, Williams AJ. Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 2014; 28:997-1008. [PMID: 24943138 PMCID: PMC4198464 DOI: 10.1007/s10822-014-9762-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/29/2014] [Accepted: 06/09/2014] [Indexed: 12/31/2022]
Abstract
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools as either free websites or commercial software-as-a-service offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as they accumulate from high throughput screening, and how they enable the user to draw insights, make predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas.
Affiliation(s)
- Sean Ekins
  - Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
|
143
|
Butler WE, Atai N, Carter B, Hochberg F. Informatic system for a global tissue-fluid biorepository with a graph theory-oriented graphical user interface. J Extracell Vesicles 2014; 3:24247. [PMID: 25317275 PMCID: PMC4172698 DOI: 10.3402/jev.v3.24247] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/03/2014] [Revised: 06/13/2014] [Accepted: 06/15/2014] [Indexed: 12/12/2022] Open
Abstract
The Richard Floor Biorepository supports collaborative studies of extracellular vesicles (EVs) found in human fluids and tissue specimens. The current emphasis is on biomarkers for central nervous system neoplasms but its structure may serve as a template for collaborative EV translational studies in other fields. The informatic system provides specimen inventory tracking with bar codes assigned to specimens and containers and projects, is hosted on globalized cloud computing resources, and embeds a suite of shared documents, calendars, and video-conferencing features. Clinical data are recorded in relation to molecular EV attributes and may be tagged with terms drawn from a network of externally maintained ontologies thus offering expansion of the system as the field matures. We fashioned the graphical user interface (GUI) around a web-based data visualization package. This system is now in an early stage of deployment, mainly focused on specimen tracking and clinical, laboratory, and imaging data capture in support of studies to optimize detection and analysis of brain tumour-specific mutations. It currently includes 4,392 specimens drawn from 611 subjects, the majority with brain tumours. As EV science evolves, we plan biorepository changes which may reflect multi-institutional collaborations, proteomic interfaces, additional biofluids, changes in operating procedures and kits for specimen handling, novel procedures for detection of tumour-specific EVs, and for RNA extraction and changes in the taxonomy of EVs. We have used an ontology-driven data model and web-based architecture with a graph theory-driven GUI to accommodate and stimulate the semantic web of EV science.
Affiliation(s)
- William E. Butler
  - Neurosurgical Service, Massachusetts General Hospital, Boston, MA, USA
  - Massachusetts General Hospital, Boston, MA, USA
- Nadia Atai
  - Neurosurgical Service, Massachusetts General Hospital, Boston, MA, USA
  - Massachusetts General Hospital, Boston, MA, USA
  - Department of Cell Biology and Histology, University of Amsterdam, Amsterdam, The Netherlands
- Bob Carter
  - Department of Neurosurgery, University of San Diego Medical School, San Diego, CA, USA
|
144
|
Hettne KM, Dharuri H, Zhao J, Wolstencroft K, Belhajjame K, Soiland-Reyes S, Mina E, Thompson M, Cruickshank D, Verdes-Montenegro L, Garrido J, de Roure D, Corcho O, Klyne G, van Schouwen R, ‘t Hoen PAC, Bechhofer S, Goble C, Roos M. Structuring research methods and data with the research object model: genomics workflows as a case study. J Biomed Semantics 2014; 5:41. [PMID: 25276335 PMCID: PMC4177597 DOI: 10.1186/2041-1480-5-41] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/13/2013] [Accepted: 07/29/2014] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. RESULTS We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment, allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY The Research Object is available at http://www.myexperiment.org/packs/428 The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
Affiliation(s)
- Kristina M Hettne
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Harish Dharuri
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jun Zhao
  - Department of Zoology, University of Oxford, Oxford, UK
- Katherine Wolstencroft
  - School of Computer Science, University of Manchester, Manchester, UK
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Khalid Belhajjame
  - School of Computer Science, University of Manchester, Manchester, UK
- Eleni Mina
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Mark Thompson
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- David de Roure
  - Department of Zoology, University of Oxford, Oxford, UK
- Oscar Corcho
  - Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Graham Klyne
  - Department of Zoology, University of Oxford, Oxford, UK
- Reinout van Schouwen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter A C ’t Hoen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Sean Bechhofer
  - School of Computer Science, University of Manchester, Manchester, UK
- Carole Goble
  - School of Computer Science, University of Manchester, Manchester, UK
- Marco Roos
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
|
145
|
Chambers J, Davies M, Gaulton A, Papadatos G, Hersey A, Overington JP. UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers. J Cheminform 2014; 6:43. [PMID: 25221628 PMCID: PMC4158273 DOI: 10.1186/s13321-014-0043-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/17/2014] [Accepted: 09/01/2014] [Indexed: 11/10/2022] Open
Abstract
UniChem is a low-maintenance, fast and freely available compound identifier mapping service, recently made available on the Internet. Until now, the criterion of molecular equivalence within UniChem has been complete identity between Standard InChIs. However, a limitation of this approach is that stereoisomers, isotopes and salts of otherwise identical molecules are not considered as related. Here, we describe how we have exploited the layered structural representation of the Standard InChI to create new functionality within UniChem that integrates these related molecular forms. The service, called 'Connectivity Search', allows molecules to be first matched on the basis of complete identity between the connectivity layer of their corresponding Standard InChIs, and the remaining layers then compared to highlight stereochemical and isotopic differences. Parsing of Standard InChI sub-layers permits mixtures and salts to also be included in this integration process. Implementation of these enhancements required simple modifications to the schema, loader and web application, none of which changed the original UniChem functionality or services. The scope of queries may be varied using a variety of easily configurable options, and the output is annotated to assist the user to filter, sort and understand the difference between query and retrieved structures. A RESTful web service output may be easily processed programmatically to allow developers to present the data in whatever form they believe their users will require, or to define their own level of molecular equivalence for their resource, albeit within the constraint of identical connectivity.
Affiliation(s)
- Jon Chambers
  - ChEMBL, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Davies
  - ChEMBL, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anna Gaulton
  - ChEMBL, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- George Papadatos
  - ChEMBL, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anne Hersey
  - ChEMBL, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John P Overington
  - ChEMBL, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
146
|
Korb O, Finn PW, Jones G. The cloud and other new computational methods to improve molecular modelling. Expert Opin Drug Discov 2014; 9:1121-31. [DOI: 10.1517/17460441.2014.941800] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022]
|
147
|
Clark AM, Bunin BA, Litterman NK, Schürer SC, Visser U. Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation. PeerJ 2014; 2:e524. [PMID: 25165633 PMCID: PMC4137659 DOI: 10.7717/peerj.524] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/03/2014] [Accepted: 07/27/2014] [Indexed: 11/29/2022] Open
Abstract
Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
Affiliation(s)
- Alex M Clark
  - Collaborative Drug Discovery, Inc., Burlingame, CA, USA
- Barry A Bunin
  - Collaborative Drug Discovery, Inc., Burlingame, CA, USA
- Stephan C Schürer
  - Center for Computational Science, University of Miami, Miami, FL, USA
- Ubbo Visser
  - Center for Computational Science, University of Miami, Miami, FL, USA
|
148
|
The Royal Society of Chemistry and the delivery of chemistry data repositories for the community. J Comput Aided Mol Des 2014; 28:1023-30. [DOI: 10.1007/s10822-014-9784-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 04/16/2014] [Accepted: 07/25/2014] [Indexed: 10/24/2022]
|
149
|
Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. J Biomed Semantics 2014; 5:28. [PMID: 26261718 PMCID: PMC4530550 DOI: 10.1186/2041-1480-5-28] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 03/12/2013] [Accepted: 06/16/2014] [Indexed: 11/10/2022] Open
Abstract
Background Scientific publications are documentary representations of defeasible arguments, supported by data and repeatable methods. They are the essential mediating artifacts in the ecosystem of scientific communications. The institutional “goal” of science is publishing results. The linear document publication format, dating from 1665, has survived transition to the Web. Intractable publication volumes; the difficulty of verifying evidence; and observed problems in evidence and citation chains suggest a need for a web-friendly and machine-tractable model of scientific publications. This model should support: digital summarization, evidence examination, challenge, verification and remix, and incremental adoption. Such a model must be capable of expressing a broad spectrum of representational complexity, ranging from minimal to maximal forms. Results The micropublications semantic model of scientific argument and evidence provides these features. Micropublications support natural language statements; data; methods and materials specifications; discussion and commentary; challenge and disagreement; as well as allowing many kinds of statement formalization. The minimal form of a micropublication is a statement with its attribution. The maximal form is a statement with its complete supporting argument, consisting of all relevant evidence, interpretations, discussion and challenges brought forward in support of or opposition to it. Micropublications may be formalized and serialized in multiple ways, including in RDF. They may be added to publications as stand-off metadata. An OWL 2 vocabulary for micropublications is available at http://purl.org/mp. A discussion of this vocabulary along with RDF examples from the case studies, appears as OWL Vocabulary and RDF Examples in Additional file 1. Conclusion Micropublications, because they model evidence and allow qualified, nuanced assertions, can play essential roles in the scientific communications ecosystem in places where simpler, formalized and purely statement-based models, such as the nanopublications model, will not be sufficient. At the same time they will add significant value to, and are intentionally compatible with, statement-based formalizations. We suggest that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications.
|
150
|
Zdrazil B, Chichester C, Zander Balderud L, Engkvist O, Gaulton A, Overington JP. Transporter assays and assay ontologies: useful tools for drug discovery. Drug Discov Today Technol 2014; 12:e47-e54. [PMID: 25027375 DOI: 10.1016/j.ddtec.2014.03.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
Transport proteins represent an eminent class of drug targets and ADMET (absorption, distribution, metabolism, excretion, toxicity) associated genes. There exist a large number of distinct activity assays for transport proteins, owing not only to the measurement needed (e.g. transport activity, strength of ligand–protein interaction) but also to the heterogeneous assay setups used by different research groups. Efforts to systematically organize this (divergent) bioassay data have large potential impact in public-private partnerships and conventional commercial drug discovery. In this short review, we highlight some of the frequently used high-throughput assays for transport proteins, and we discuss emerging assay ontologies and their application to this field. Focusing on human P-glycoprotein (Multidrug resistance protein 1; gene name: ABCB1, MDR1), we exemplify how annotation of bioassay data per target class could improve and add to existing ontologies, and we propose to include an additional layer of metadata supporting data fusion across different bioassays.
Affiliation(s)
- Barbara Zdrazil
  - University of Vienna, Division of Drug Design and Medicinal Chemistry, Department of Pharmaceutical Chemistry, Pharmacoinformatics Research Group, Althanstrasse 14, A-1090 Vienna, Austria
- Christine Chichester
  - Swiss Institute of Bioinformatics, CALIPHO Group, CMU - Rue Michel-Servet 1, 1211 Geneva 4, Switzerland
- Ola Engkvist
  - Discovery Sciences, Chemistry Innovation Center, AstraZeneca R&D, Mölndal, Sweden
- Anna Gaulton
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- John P Overington
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
|