1
Högberg J, Järnberg J. Approaches for the setting of occupational exposure limits (OELs) for carcinogens. Crit Rev Toxicol 2023:1-37. [PMID: 37366107 DOI: 10.1080/10408444.2023.2218887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 05/12/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023]
Abstract
This article addresses issues of importance for occupational exposure limits (OELs) and chemical carcinogens with a focus on non-threshold carcinogens. It comprises scientific as well as regulatory issues. It is an overview, not a comprehensive review. A central topic is mechanistic research and insights, and its implications for cancer risk assessment. Alongside scientific advancements, the approaches of hazard identification and qualitative and quantitative risk assessment have developed over the years. The key steps in a quantitative risk assessment are outlined, with special attention given to the dose-response assessment and the derivation of an OEL using risk calculations or default assessment factors. The work procedures of several bodies performing cancer hazard identifications and quantitative risk assessments, as well as regulatory procedures to derive OELs for non-threshold carcinogens, are presented. Non-threshold carcinogens for which the European Union (EU) introduced binding OELs in 2017-2019 serve as illustrations together with some currently used strategies in the EU and elsewhere. Available knowledge supports the derivation of health-based OELs (Hb-OELs) for non-threshold carcinogens, and the use of a risk-based approach with low-dose linear extrapolation (linear non-threshold, LNT) as the default for non-threshold carcinogens. However, there is a need to develop methods that allow recent years' advances in cancer research to be used for improving risk estimates. It is recommended that defined risk levels (terminology and numerical values) are harmonised, and that both collective and individual risks are considered and clearly communicated. Socioeconomic aspects should be dealt with transparently and separated from the scientific health risk assessment.
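With a risk-based approach that uses low-dose linear (LNT) extrapolation, deriving an exposure level for a chosen risk level comes down to simple arithmetic once a point of departure is fixed. The Python sketch below is illustrative only. The point of departure (2.0 mg/m3 at 10% extra risk), the risk levels in the loop, and the function names are hypothetical. It also leaves out the exposure-scenario adjustments (working hours, breathing volume, working lifetime) that a real assessment would apply.

```python
"""Sketch: risk-based exposure level with linear low-dose (LNT) extrapolation.

All numbers are hypothetical placeholders, not values from any assessment.
"""


def unit_risk_from_pod(pod_conc: float, pod_extra_risk: float) -> float:
    """Slope of the straight line from the origin through the point of departure.

    pod_conc: exposure concentration at the point of departure (mg/m3).
    pod_extra_risk: extra lifetime cancer risk at that concentration (e.g. 0.10).
    """
    if pod_conc <= 0 or not 0 < pod_extra_risk < 1:
        raise ValueError("need pod_conc > 0 and 0 < pod_extra_risk < 1")
    return pod_extra_risk / pod_conc


def conc_at_risk_level(unit_risk: float, target_risk: float) -> float:
    """Invert risk = unit_risk * concentration for a chosen risk level."""
    return target_risk / unit_risk


if __name__ == "__main__":
    slope = unit_risk_from_pod(pod_conc=2.0, pod_extra_risk=0.10)  # 0.05 per mg/m3
    for risk_level in (1e-3, 1e-4, 1e-5):
        conc = conc_at_risk_level(slope, risk_level)
        print(f"extra risk {risk_level:.0e}: {conc:.2g} mg/m3")
```

Because the model is linear, each tenfold reduction in the accepted risk level lowers the derived concentration tenfold. This is one reason the choice and harmonisation of the risk level matter as much as the extrapolation itself.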
Affiliation(s)
- Johan Högberg
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
2
A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19148544. [PMID: 35886395 PMCID: PMC9316260 DOI: 10.3390/ijerph19148544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/07/2022] [Accepted: 07/11/2022] [Indexed: 02/05/2023]
Abstract
The evolution of the Exposome concept revolutionised research in exposure assessment and epidemiology by introducing the need for a more holistic approach to exploring the relationship between the environment and disease. At the same time, further and more dramatic changes have occurred in the working environment, adding to its already dynamic nature. Natural Language Processing (NLP) refers to a collection of methods for identifying, reading, extracting and ultimately transforming large collections of language. In this work, we aim to give an overview of how NLP has successfully been applied thus far in Exposome research. Methods: We conduct a literature search on PubMed, Scopus and Web of Science for scientific articles published between 2011 and 2021. We use both quantitative and qualitative methods to screen papers and provide insights into the inclusion and exclusion criteria. We outline our approach for article selection and provide an overview of our findings. This is followed by a more detailed insight into selected articles. Results: Overall, 6420 articles were screened for suitability for this review, of which 37 are reviewed in depth. Finally, we discuss future avenues of research and outline challenges in existing work. Conclusions: Our results show that (i) there has been an increase in articles published that focus on applying NLP to exposure and epidemiology research, (ii) most work uses existing NLP tools and (iii) traditional machine learning is the most popular approach.
3
Using semantics to scale up evidence-based chemical risk-assessments. PLoS One 2021; 16:e0260712. [PMID: 34910747 PMCID: PMC8673667 DOI: 10.1371/journal.pone.0260712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 11/15/2021] [Indexed: 11/19/2022]
Abstract
BACKGROUND The manual processes used for risk assessments are not scaling to the amount of data available. Although automated approaches appear promising, they must be transparent in a public policy setting. OBJECTIVE Our goal is to create an automated approach that moves beyond retrieval to the extraction step of the information synthesis process, where evidence is characterized as supporting, refuting, or neutral with respect to a given outcome. METHODS We combine knowledge resources and natural language processing to resolve coordinated ellipses and thus avoid surface-level differences between concepts in an ontology and outcomes in an abstract. As with a systematic review, the search criteria and the inclusion and exclusion criteria are explicit. RESULTS The system scales to 482K abstracts on 27 chemicals. Results for three endpoints that are critical for cancer risk assessments show that refuting evidence (where the outcome decreased) was higher for cell proliferation (45.9%) and general cell changes (37.7%) than for cell death (25.0%). Moreover, cell death was the only endpoint where supporting claims were the majority (61.3%). If the number of abstracts that measure an outcome were used as a proxy for association, there would be a stronger association with cell proliferation than with cell death (20/27 chemicals). However, if the amount of supporting evidence (where the outcome increased) were used, the conclusion would change for 21/27 chemicals (20 from proliferation to death and 1 from death to proliferation). CONCLUSIONS We provide decision makers with a visual representation of supporting, neutral, and refuting evidence whilst maintaining the reproducibility and transparency needed for public policy. Our findings show that results from the retrieval step, where the number of abstracts that measure an outcome is reported, can be misleading if not accompanied by results from the extraction step, where the directionality of the outcome is established.
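A toy example shows how the two proxies for association can disagree. One proxy counts abstracts per outcome (retrieval), and the other counts only supporting abstracts (extraction). The counts below are invented for one hypothetical chemical and are not data from the study. The functions are a sketch of the two ranking rules, not the authors' system.

```python
from typing import Dict, Tuple

# (supporting, neutral, refuting) abstract counts for one outcome.
# "Supporting" = outcome increased, "refuting" = outcome decreased.
Counts = Tuple[int, int, int]


def top_by_volume(outcomes: Dict[str, Counts]) -> str:
    """Retrieval-style proxy: outcome measured in the most abstracts."""
    return max(outcomes, key=lambda name: sum(outcomes[name]))


def top_by_support(outcomes: Dict[str, Counts]) -> str:
    """Extraction-style proxy: outcome with the most supporting abstracts."""
    return max(outcomes, key=lambda name: outcomes[name][0])


def refuting_share(counts: Counts) -> float:
    """Fraction of abstracts for an outcome that report a decrease."""
    total = sum(counts)
    return counts[2] / total if total else 0.0


# Hypothetical counts for a single chemical (not data from the study).
chemical = {
    "cell proliferation": (40, 10, 50),
    "cell death": (55, 20, 5),
}

print(top_by_volume(chemical))    # cell proliferation (100 vs 80 abstracts)
print(top_by_support(chemical))   # cell death (55 vs 40 supporting)
print(f"{refuting_share(chemical['cell proliferation']):.1%}")  # 50.0%
```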
4
Zengul AG, Zengul FD, Ozaydin B, Oner N, Fiveash JB. Identifying research themes and trends in the top 20 cancer journals through textual analysis. J Cancer Policy 2021; 30:100313. [DOI: 10.1016/j.jcpo.2021.100313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Revised: 09/28/2021] [Accepted: 10/27/2021] [Indexed: 11/28/2022]
5
Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021; 11:biom11101430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022]
Abstract
In the assembly of biological networks it is important to provide reliable interactions in an effort to have the most accurate possible representation of real-life systems. Commonly, the data used to build a network come from diverse high-throughput assays; however, most of the interaction data are available through the scientific literature. This has become a challenge with the notable increase in scientific literature being published, as it is hard for human curators to track all recent discoveries without efficient tools that help them identify these interactions automatically. This can be overcome by using text mining approaches, which are capable of extracting knowledge from scientific documents. One of the most important tasks in text mining for biological network building is relation extraction, which identifies relations between the entities of interest. Many interaction databases already use text mining systems, and the development of these tools will lead to more reliable networks, as well as the possibility to personalize the networks by selecting the desired relations. This review will focus on different approaches to automatic information extraction from biomedical text that can be used to enhance existing networks or create new ones, such as state-of-the-art deep learning approaches, focusing on cancer as a case study.
6
Ali I, Dreij K, Baker S, Högberg J, Korhonen A, Stenius U. Application of Text Mining in Risk Assessment of Chemical Mixtures: A Case Study of Polycyclic Aromatic Hydrocarbons (PAHs). ENVIRONMENTAL HEALTH PERSPECTIVES 2021; 129:67008. [PMID: 34165340 PMCID: PMC8318069 DOI: 10.1289/ehp6702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2019] [Revised: 05/07/2021] [Accepted: 05/10/2021] [Indexed: 05/08/2023]
Abstract
BACKGROUND Cancer risk assessment of complex exposures, such as exposure to mixtures of polycyclic aromatic hydrocarbons (PAHs), is challenging due to the diverse biological activities of these compounds. With the help of text mining (TM), we have developed two TM tools, the latest iteration of the Cancer Risk Assessment using Biomedical literature tool (CRAB3) and a Cancer Hallmarks Analytics Tool (CHAT), that could be useful for automatic literature analyses in cancer risk assessment and research. Whereas CRAB3 analyses are based on carcinogenic modes of action (MOAs) and cover almost all the key characteristics of carcinogens, CHAT evaluates literature according to the hallmarks of cancer, which refer to the alterations in cellular behavior that characterize the cancer cell. OBJECTIVES The objective was to evaluate the usefulness of these tools to support cancer risk assessment by performing a case study of 22 European Union and U.S. Environmental Protection Agency priority PAHs and diesel exhaust and a case study of PAH interactions with silica. METHODS We analyzed PubMed literature, comprising 57,498 references concerning priority PAHs and complex PAH mixtures, using CRAB3 and CHAT. RESULTS CRAB3 analyses correctly identified similarities and differences in genotoxic and nongenotoxic MOAs of the 22 priority PAHs and grouped them according to their known carcinogenic potential. CHAT had the same capacity and complemented the CRAB3 output when comparing, for example, benzo[a]pyrene and dibenzo[a,l]pyrene. Both CRAB3 and CHAT analyses highlighted potentially interacting mechanisms within and across complex PAH mixtures and mechanisms of possible importance for interactions with silica. CONCLUSION These data suggest that our TM approach can be useful in the hazard identification of PAHs and mixtures including PAHs. The tools can assist in grouping chemicals and identifying similarities and differences in carcinogenic MOAs and their interactions.
https://doi.org/10.1289/EHP6702.
Affiliation(s)
- Imran Ali
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Kristian Dreij
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Simon Baker
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Johan Högberg
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Korhonen
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
7
Ding Z, Liu R, Yuan H. A text mining-based thematic model for analyzing construction and demolition waste management studies. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2021; 28:30499-30527. [PMID: 33905057 DOI: 10.1007/s11356-021-13989-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 04/13/2021] [Indexed: 06/12/2023]
Abstract
Over the years, numerous studies have been conducted to investigate construction and demolition waste (CDW) management problems. However, the massive amount of literature brings challenges to scholars because it is difficult and time-consuming to identify research emphases in the literature manually. Therefore, a method that can systematically process a literature collection and automatically detect insights from it is worth exploring. This paper presents a comprehensive thematic model that combines Latent Dirichlet Allocation, word2vec, and a community detection algorithm, implemented in Python, to detect insights from the CDW management literature. Based on the Web of Science database, 641 articles published between 2000 and 2019 are retrieved and used as the sample for analysis. The comprehensive thematic results reveal a four-domain knowledge map in CDW management research, which covers (1) introducing the current situation of CDW management, (2) quantifying CDW generation, (3) assessing CDW and by-products, and (4) facilitating waste diversion. Future research directions in CDW management research are also discussed. The results demonstrate that the comprehensive thematic model is useful in mining insights from the CDW management literature.
Affiliation(s)
- Zhikun Ding
- Department of Construction Management and Real Estate, College of Civil and Transportation Engineering, Shenzhen University, Shenzhen, People's Republic of China
- Rongsheng Liu
- Department of Construction Management and Real Estate, College of Civil and Transportation Engineering, Shenzhen University, Shenzhen, People's Republic of China
- Hongping Yuan
- School of Management, Guangzhou University, Guangdong, 510006, People's Republic of China.
8
Ovalle A, Goldstein O, Kachuee M, Wu ESC, Hong C, Holloway IW, Sarrafzadeh M. Leveraging Social Media Activity and Machine Learning for HIV and Substance Abuse Risk Assessment: Development and Validation Study. J Med Internet Res 2021; 23:e22042. [PMID: 33900200 PMCID: PMC8111510 DOI: 10.2196/22042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/04/2020] [Revised: 11/25/2020] [Accepted: 01/31/2021] [Indexed: 01/08/2023]
Abstract
Background Social media networks provide an abundance of diverse information that can be leveraged for data-driven applications across various social and physical sciences. One opportunity to utilize such data exists in the public health domain, where data collection is often constrained by organizational funding and limited user adoption. Furthermore, the efficacy of health interventions is often based on self-reported data, which are not always reliable. Health-promotion strategies for communities facing multiple vulnerabilities, such as men who have sex with men, can benefit from an automated system that not only determines health behavior risk but also suggests appropriate intervention targets. Objective This study aims to determine the value of leveraging social media messages to identify health risk behavior for men who have sex with men. Methods The Gay Social Networking Analysis Program was created as a preliminary framework for intelligent web-based health-promotion intervention. The program consisted of a data collection system that automatically gathered social media data, health questionnaires, and clinical results for sexually transmitted diseases and drug tests across 51 participants over 3 months. Machine learning techniques were utilized to assess the relationship between social media messages and participants' offline sexual health and substance use biological outcomes. The F1 score, the harmonic mean of precision and recall, was used to evaluate each algorithm. Natural language processing techniques were employed to create health behavior risk scores from participant messages. Results Offline HIV status and amphetamine and methamphetamine use were correctly identified using only social media data, with machine learning models obtaining F1 scores of 82.6%, 85.9%, and 85.3%, respectively. Additionally, constructed risk scores were found to be reasonably comparable to risk scores adapted from the Centers for Disease Control and Prevention.
Conclusions To our knowledge, our study is the first empirical evaluation of a social media–based public health intervention framework for men who have sex with men. We found that social media data were correlated with offline sexual health and substance use, verified through biological testing. The proof of concept and initial results validate that public health interventions can indeed use social media–based systems to successfully determine offline health risk behaviors. The findings demonstrate the promise of deploying a social media–based just-in-time adaptive intervention to target substance use and HIV risk behavior.
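The F1 score combines precision and recall as their harmonic mean, which penalizes an imbalance between the two more heavily than an arithmetic mean would. Below is a minimal Python sketch that computes it from raw confusion-matrix counts. The function name and the counts in the example are hypothetical and not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined): define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(round(f1_score(8, 2, 4), 3))  # 0.727, i.e. 2*tp / (2*tp + fp + fn) = 16/22
```

For these counts, precision is 0.8 and recall is about 0.667. The resulting F1 of about 0.727 is slightly below their arithmetic mean of about 0.733.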
Affiliation(s)
- Anaelia Ovalle
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, United States
- Orpaz Goldstein
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, United States
- Mohammad Kachuee
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, United States
- Elizabeth S C Wu
- Department of Social Welfare, University of California Los Angeles, Los Angeles, CA, United States
- Chenglin Hong
- Department of Social Welfare, University of California Los Angeles, Los Angeles, CA, United States
- Ian W Holloway
- Department of Social Welfare, University of California Los Angeles, Los Angeles, CA, United States
- Majid Sarrafzadeh
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, United States
9
Acheson E, Purves RS. Extracting and modeling geographic information from scientific articles. PLoS One 2021; 16:e0244918. [PMID: 33406109 PMCID: PMC7787447 DOI: 10.1371/journal.pone.0244918] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/02/2019] [Accepted: 12/20/2020] [Indexed: 11/29/2022]
Abstract
Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.
Affiliation(s)
- Elise Acheson
- Department of Geography, University of Zurich, Zurich, Switzerland
- Ross S. Purves
- Department of Geography, University of Zurich, Zurich, Switzerland
10
A Thematic Network-Based Methodology for the Research Trend Identification in Building Energy Management. ENERGIES 2020. [DOI: 10.3390/en13184621] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
The rapid increase in the number of online resources and academic articles has made it difficult for researchers and practitioners to efficiently grasp the status quo of building energy-related research. Rather than relying on manual inspection, advanced data analytics (such as text mining) can be used to improve the efficiency and effectiveness of literature reviews. This article proposes a text mining-based approach for the automatic identification of major research trends in the field of building energy management. In total, 5712 articles (from 1972 to 2019) are analyzed. The word2vec model is used to optimize the latent Dirichlet allocation (LDA) results, and social networks are adopted to visualize the inter-topic relationships. The results are presented using the Gephi visualization platform. Based on inter-topic relevance and topic evolution, an in-depth analysis is conducted to reveal research trends and hot topics in the field of building energy management. The results indicate that heating, ventilation, and air conditioning (HVAC) is one of the most essential topics. The thermal environment, indoor illumination, and the behavior of residential building occupants are important factors affecting building energy consumption. In addition, building energy-saving renovations, green buildings, and intelligent buildings are research hotspots and potential future directions. The method developed in this article serves as an effective alternative for researchers and practitioners to extract useful insights from massive text data. It provides a prototype for the automatic identification of research trends based on text mining techniques.
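The abstract does not say how inter-topic relevance was measured. One common choice, assumed here purely for illustration, is cosine similarity between topic-word weight vectors, with an edge kept when the similarity reaches a threshold. The resulting edge list could then be loaded into a graph tool such as Gephi. The topics, weights and threshold below are hypothetical. This sketch is not the article's word2vec-optimized LDA pipeline.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

TopicWords = Dict[str, float]  # word -> weight within one topic


def cosine(u: TopicWords, v: TopicWords) -> float:
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_edges(topics: Dict[str, TopicWords],
                threshold: float) -> List[Tuple[str, str, float]]:
    """Undirected edges between topics whose similarity is at least threshold."""
    edges = []
    for a, b in combinations(sorted(topics), 2):
        sim = cosine(topics[a], topics[b])
        if sim >= threshold:
            edges.append((a, b, round(sim, 3)))
    return edges


# Hypothetical topic-word weights (not results from the article).
topics = {
    "HVAC": {"ventilation": 0.4, "air": 0.3, "thermal": 0.2, "control": 0.1},
    "thermal comfort": {"thermal": 0.5, "occupant": 0.3, "air": 0.2},
    "lighting": {"daylight": 0.6, "illumination": 0.4},
}

print(topic_edges(topics, threshold=0.3))  # [('HVAC', 'thermal comfort', 0.474)]
```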
11
Wang N, Yang Y, Pang M, Du C, Chen Y, Li S, Tian Z, Feng F, Wang Y, Chen Z, Liu B, Rong L. MicroRNA-135a-5p Promotes the Functional Recovery of Spinal Cord Injury by Targeting SP1 and ROCK. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 22:1063-1077. [PMID: 33294293 PMCID: PMC7691148 DOI: 10.1016/j.omtn.2020.08.035] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/12/2020] [Accepted: 08/28/2020] [Indexed: 01/18/2023]
Abstract
Emerging evidence indicates that microRNAs play a pivotal role in neural remodeling after spinal cord injury (SCI). This study aimed to investigate the mechanisms by which miR-135a-5p regulates functional recovery after SCI through its target genes and downstream signaling. The gene transfection assay and luciferase reporter assay confirmed the target relationship between miR-135a-5p and its target genes (specificity protein 1 [SP1] and Rho-associated kinase [ROCK]1/2). In an H2O2-induced injury model, miR-135a-5p transfection was found to inhibit the apoptosis of PC12 cells by downregulating the SP1 gene, which subsequently induced downregulation of pro-apoptotic proteins (Bax, cleaved caspase-3) and upregulation of the anti-apoptotic protein Bcl-2. By measuring the neurite lengths of PC12 cells, miR-135a-5p transfection was found to promote axon outgrowth by downregulating the ROCK1/2 genes, which subsequently caused upregulation of phosphorylated protein kinase B (AKT) and phosphorylated glycogen synthase kinase 3β (GSK3β). In rat SCI models, miR-135a-5p increased the Basso, Beattie, and Bresnahan (BBB) scores, indicating neurological functional recovery. In conclusion, the miR-135a-5p-SP1-Bax/Bcl-2/caspase-3 and miR-135a-5p-ROCK-AKT/GSK3β axes are involved in functional recovery after SCI by regulating neural apoptosis and axon regeneration, respectively, and may therefore be promising therapeutic targets in SCI.
Affiliation(s)
- Nanxiang Wang
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Yang Yang
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Mao Pang
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Cong Du
- Cell-Gene Therapy Translational Medicine Research Center, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Yuyong Chen
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Simin Li
- Department of Cariology, Endodontology and Periodontology, University Leipzig, Liebigstrasse 12, 04103 Leipzig, Germany
- Zhenming Tian
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Feng Feng
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Yang Wang
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Zhenxiang Chen
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Bin Liu
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
- Limin Rong
- Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
12
DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2020; 2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]
Abstract
Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction reaction) homeostasis. When oxidative damage affects RNA carrying protein-coding information, this may result in the synthesis of aberrant proteins as well as lower translation efficiency. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of oxidative RNA damage in mammals is increasing year by year owing to the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information relevant to RNA oxidation and its effects on human diseases, we developed DES-ROD, an online knowledgebase that contains processed information from 298,603 relevant documents consisting of PubMed abstracts and PubMed Central full-text articles. The system uses concepts/terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts, as well as enriched pairs of putatively associated concepts. In this way, one can explore mutual relationships between any combination of two concepts from the dictionaries used. The dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases. This makes it possible to gain insights into different facets of the effects of RNA oxidation and the control of this process. The usefulness of the DES-ROD system is demonstrated by case studies on some known information, as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first knowledgebase based on text and data mining that focuses on the exploration of RNA oxidation and human diseases.
13
Wittwehr C, Blomstedt P, Gosling JP, Peltola T, Raffael B, Richarz AN, Sienkiewicz M, Whaley P, Worth A, Whelan M. Artificial Intelligence for chemical risk assessment. Comput Toxicol 2020; 13:100114. [PMID: 32140631 PMCID: PMC7043333 DOI: 10.1016/j.comtox.2019.100114] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/23/2019] [Revised: 09/10/2019] [Accepted: 11/25/2019] [Indexed: 02/03/2023]
Abstract
As the basis for managing the risks of chemical exposure, the Chemical Risk Assessment (CRA) process can impact a substantial part of the economy, the health of hundreds of millions of people, and the condition of the environment. However, the number of properly assessed chemicals falls short of societal needs due to a lack of experts for evaluation, interference of third-party interests, and the sheer volume of potentially relevant information on the chemicals from disparate sources. To explore ways in which computational methods may help overcome this discrepancy between the number of chemical risk assessments required on the one hand and the number and adequacy of assessments actually being conducted on the other, the European Commission's Joint Research Centre organised a workshop on Artificial Intelligence for Chemical Risk Assessment (AI4CRA). The workshop identified a number of areas where Artificial Intelligence could potentially increase the number and quality of regulatory risk management decisions based on CRA, involving process simulation, supporting evaluation, identifying problems, facilitating collaboration, finding experts, evidence gathering, systematic review, knowledge discovery, and building cognitive models. Although these are interconnected, they are organised and discussed under two main themes: (1) the scientific-technical process and (2) social aspects and the decision-making process.
Affiliation(s)
| | | | | | | | - Barbara Raffael
- European Commission, Joint Research Centre (JRC), Ispra, Italy
| | | | | | - Paul Whaley
- Lancaster Environment Centre, University Lancaster, UK.,The Evidence-based Toxicology Collaboration at Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Andrew Worth
- European Commission, Joint Research Centre (JRC), Ispra, Italy
| | - Maurice Whelan
- European Commission, Joint Research Centre (JRC), Ispra, Italy
14
Barrón Cuenca J, Tirado N, Barral J, Ali I, Levi M, Stenius U, Berglund M, Dreij K. Increased levels of genotoxic damage in a Bolivian agricultural population exposed to mixtures of pesticides. Sci Total Environ 2019; 695:133942. [PMID: 31756860 DOI: 10.1016/j.scitotenv.2019.133942] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 05/25/2023]
Abstract
During the past decades, farmers in low- to middle-income countries have increased their use of pesticides, and thereby their risk of exposure to potentially genotoxic chemicals that can cause adverse health effects. Here, the aim was to investigate the correlation between exposure to pesticides and genotoxic damage in a Bolivian agricultural population. Genotoxic effects were assessed in peripheral blood samples by comet and micronucleus (MN) assays, and exposure levels by measurements of 10 urinary pesticide metabolites. Genetic susceptibility was assessed by determining the frequency of GSTM1 and GSTT1 null genotypes. The results showed higher MN frequency in women and in farmers active ≥8 years compared to their counterparts (P < 0.05). In addition, age, GST genotype, alcohol consumption, and type of water source influenced levels of genotoxic damage. Individuals with high exposure to tebuconazole, 2,4-D, or cyfluthrin displayed increased levels of genotoxic damage (P < 0.05-0.001). Logistic regression was conducted to evaluate associations between pesticide exposure and risk of genotoxic damage. After adjustment for confounders, a significantly increased risk of DNA strand breaks was found for high exposure to 2,4-D, odds ratio (OR) = 1.99 (P < 0.05). In contrast, high exposure to pyrethroids was associated with a reduced risk of DNA strand breaks, OR = 0.49 (P < 0.05). High exposure to certain mixtures of pesticides (containing mainly 2,4-D or cyfluthrin) was also significantly associated with increased level and risk of genotoxic damage (P < 0.05). In conclusion, our data show that high exposure to some pesticides is associated with an increased risk of genotoxic damage among Bolivian farmers, suggesting that their use should be better controlled or limited.
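The adjusted ORs above come from logistic regression on the study data, which are not reproduced here. As a minimal sketch of how an odds ratio is read, the hypothetical `odds_ratio` helper below computes a crude (unadjusted) OR with a Woolf log-scale 95% confidence interval from a 2x2 table; the counts are invented for illustration and are not from the study.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple:
    """Crude odds ratio with a Woolf (log-scale) 95% confidence interval.

    2x2 layout (hypothetical counts, not data from the study):
                       damage   no damage
      high exposure      a          b
      low exposure       c          d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(ratio) - 1.96 * se_log)
    upper = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, lower, upper


# Invented counts giving OR = 2.0; an OR below 1 (e.g. 0.5) would indicate reduced odds.
ratio, lower, upper = odds_ratio(40, 60, 25, 75)
print(f"OR = {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Adjusting for confounders such as age, GST genotype or alcohol consumption, as the study did, requires a multivariable logistic regression in which the exponentiated exposure coefficient is the adjusted OR; this crude version is only the unadjusted starting point.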
Affiliation(s)
- Jessika Barrón Cuenca
- Institute of Environmental Medicine, Unit of Biochemical Toxicology, Karolinska Institutet, Box 210, SE-171 77 Stockholm, Sweden; Genetic Institute, Medicine Faculty, Universidad Mayor de San Andrés, Saavedra Av. 2246 Miraflores, La Paz, Bolivia
- Noemí Tirado
- Genetic Institute, Medicine Faculty, Universidad Mayor de San Andrés, Saavedra Av. 2246 Miraflores, La Paz, Bolivia
- Josue Barral
- Genetic Institute, Medicine Faculty, Universidad Mayor de San Andrés, Saavedra Av. 2246 Miraflores, La Paz, Bolivia
- Imran Ali
- Institute of Environmental Medicine, Unit of Biochemical Toxicology, Karolinska Institutet, Box 210, SE-171 77 Stockholm, Sweden
- Michael Levi
- Institute of Environmental Medicine, Unit of Metals and Health, Karolinska Institutet, Box 210, SE-171 77 Stockholm, Sweden
- Ulla Stenius
- Institute of Environmental Medicine, Unit of Biochemical Toxicology, Karolinska Institutet, Box 210, SE-171 77 Stockholm, Sweden
- Marika Berglund
- Institute of Environmental Medicine, Unit of Biochemical Toxicology, Karolinska Institutet, Box 210, SE-171 77 Stockholm, Sweden
- Kristian Dreij
- Institute of Environmental Medicine, Unit of Biochemical Toxicology, Karolinska Institutet, Box 210, SE-171 77 Stockholm, Sweden
15
ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins. PLoS Comput Biol 2019; 15:e1007239. [PMID: 31437145 PMCID: PMC6705771 DOI: 10.1371/journal.pcbi.1007239] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/02/2018] [Accepted: 07/03/2019] [Indexed: 01/10/2023]
Abstract
Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.
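ProtFus's features and training details are not given in this abstract. The sketch below only illustrates the general technique it names, a Naïve Bayes text classifier, here as a multinomial model with add-one smoothing; the class name `NaiveBayes`, the labels and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs: list, labels: list) -> "NaiveBayes":
        self.class_counts = Counter(labels)       # documents per class (for priors)
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)    # log prior
            total = sum(self.word_counts[label].values())
            for tok in doc.lower().split():
                # Smoothed log likelihood, so unseen tokens do not zero out the class
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training sentences (invented), labelled by whether they report experimental support
docs = [
    "fusion protein binds partner confirmed by coimmunoprecipitation",
    "interaction validated experimentally in cancer cells",
    "predicted interaction no experimental evidence",
    "computational prediction only not validated",
]
labels = ["evidence", "evidence", "no_evidence", "no_evidence"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("interaction confirmed experimentally"))  # prints: evidence
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, which matters for full abstracts rather than short toy sentences.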
16
Literature-Based Enrichment Insights into Redox Control of Vascular Biology. Oxid Med Cell Longev 2019; 2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
In cellular physiology and signaling, reactive oxygen species (ROS) play one of the most critical roles. ROS overproduction leads to cellular oxidative stress. This may lead to an irrecoverable imbalance of redox (oxidation-reduction reaction) function that deregulates redox homeostasis, which itself could lead to several diseases including neurodegenerative disease, cardiovascular disease, and cancers. In this study, we focus on the redox effects related to vascular systems in mammals. To support research in this domain, we developed an online knowledge base, DES-RedoxVasc, which enables exploration of information contained in the biomedical scientific literature. The DES-RedoxVasc system analyzed 233399 documents consisting of PubMed abstracts and PubMed Central full-text articles related to different aspects of redox biology in vascular systems. It allows researchers to explore enriched concepts from 28 curated thematic dictionaries, as well as literature-derived potential associations of pairs of such enriched concepts, where associations themselves are statistically enriched. For example, the system allows exploration of associations of pathways, diseases, mutations, genes/proteins, miRNAs, long ncRNAs, toxins, drugs, biological processes, molecular functions, etc. that allow for insights about different aspects of redox effects and control of processes related to the vascular system. Moreover, we deliver case studies about some existing or possibly novel knowledge regarding redox of vascular biology demonstrating the usefulness of DES-RedoxVasc. DES-RedoxVasc is the first compiled knowledge base using text mining for the exploration of this topic.
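The abstract does not say which statistic DES-RedoxVasc uses to call a concept or concept pair "enriched". One common choice for document co-occurrence, assumed here purely for illustration, is a one-sided hypergeometric test: given N documents, K mentioning concept A and n mentioning concept B, how surprising is it to see at least k documents mentioning both? The function name and the numbers are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: documents in the corpus
    K: documents mentioning concept A
    n: documents mentioning concept B
    k: documents mentioning both A and B
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic until the final division


# Hypothetical counts: 1,000 documents, A in 50, B in 40, both in 10.
# Under independence roughly 50 * 40 / 1000 = 2 co-mentions would be expected.
p = cooccurrence_pvalue(10, 1000, 50, 40)
print(f"p = {p:.2e}")
```

With thousands of concept pairs tested, such p-values would also need a multiple-testing correction (for example Benjamini-Hochberg) before a pair is reported as enriched.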
17
Abstract
Accessing and processing textual information (i.e. storing and querying a set of strings) is important for many current applications (e.g. information retrieval and social networks), especially in the fields of Big Data and IoT, which require handling very large string dictionaries. Typical data structures for textual indexing are Hash Tables and some variants of Tries such as the Double Trie (DT). In this paper, we propose an extension of the DT that we have called MergedTrie. It improves the DT compression by merging both Tries into a single one and by segmenting the indexed term into two fixed-length parts in order to balance the new Trie. This yields greater overlap of both prefixes and suffixes. Moreover, we propose a new implementation of Tries that achieves better compression rates than the Double-Array representation usually chosen for implementing Tries. Our proposal also overcomes the limitation of static implementations, which do not allow insertions and updates in their compact representations. Finally, in our experiments the MergedTrie implementation improves on the efficiency of Hash Tables, DTs, the Double-Array, Crit-bit, Directed Acyclic Word Graphs (DAWG), and Acyclic Deterministic Finite Automata (ADFA), requiring less space than the original text to be indexed.
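The abstract describes MergedTrie only at a high level: the two tries of a Double Trie are merged into one, and each term is split into two parts so that prefixes and suffixes can share nodes. The toy class below sketches that idea under stated assumptions and is not the authors' compact implementation: terms are split at their midpoint, the first half is stored forwards and the second half reversed in the same dictionary-based trie, and a set of (prefix node, suffix node) pairs records which halves belong together. `MergedTrieSketch` and its methods are hypothetical names.

```python
from typing import Optional


class MergedTrieSketch:
    """Toy merged trie: both halves of every term share one trie (illustrative only)."""

    def __init__(self) -> None:
        self.children = [{}]  # node id -> {char: child node id}; node 0 is the root
        self.links = set()    # (end node of first half, end node of reversed second half)

    def _walk(self, s: str, create: bool) -> Optional[int]:
        node = 0
        for ch in s:
            nxt = self.children[node].get(ch)
            if nxt is None:
                if not create:
                    return None
                nxt = len(self.children)
                self.children.append({})
                self.children[node][ch] = nxt
            node = nxt
        return node

    @staticmethod
    def _split(term: str) -> tuple:
        # Assumption: split at the midpoint; the second half is reversed so that
        # common suffixes become common paths from the root.
        mid = (len(term) + 1) // 2
        return term[:mid], term[mid:][::-1]

    def insert(self, term: str) -> None:
        head, tail_rev = self._split(term)
        p = self._walk(head, create=True)
        s = self._walk(tail_rev, create=True)
        self.links.add((p, s))

    def __contains__(self, term: str) -> bool:
        head, tail_rev = self._split(term)
        p = self._walk(head, create=False)
        s = self._walk(tail_rev, create=False)
        return p is not None and s is not None and (p, s) in self.links

    def node_count(self) -> int:
        return len(self.children)


trie = MergedTrieSketch()
for word in ["reading", "seeding", "leading", "read"]:
    trie.insert(word)
print("read" in trie, "rea" in trie, trie.node_count())  # prints: True False 18
```

Here the shared reversed suffix "ing" (stored once as g-n-i) serves three words, so the sketch needs 18 nodes against 22 for a plain forward trie of the same four words. This only illustrates the prefix/suffix overlap effect, not the compression figures reported for MergedTrie.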
Affiliation(s)
- Antonio Ferrández
- GPLSI Research Group, Department of Software and Computing Systems, University of Alicante, Alicante, Spain
- Jesús Peral
- Lucentia Research Group, Department of Software and Computing Systems, University of Alicante, Alicante, Spain
18
Azam MF, Musa A, Dehmer M, Yli-Harja OP, Emmert-Streib F. Global Genetics Research in Prostate Cancer: A Text Mining and Computational Network Theory Approach. Front Genet 2019; 10:70. [PMID: 30838019 PMCID: PMC6383410 DOI: 10.3389/fgene.2019.00070] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/21/2018] [Accepted: 01/28/2019] [Indexed: 11/13/2022]
Abstract
Prostate cancer is the most common cancer type in men in Finland and the second most common worldwide. In this paper, we analyze almost 150,000 published papers about prostate cancer, authored by tens of thousands of scientists worldwide, with an integrated text mining and computational network theory approach. We demonstrate how to integrate text mining with network analysis to investigate the research contributions of countries and collaborations within and between countries. Furthermore, we study the time evolution of individually and collectively studied genes. Finally, we investigate a collaboration network of Finland and compare the genes studied there with globally studied genes in prostate cancer genetics. Overall, our results provide a global overview of prostate cancer research in genetics. In addition, we present a specific discussion for Finland. Our results shed light on trends within the last 30 years and are useful for translational researchers within the full range from genetics to public health management and health policy.
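The abstract does not detail how the country collaboration network was built. A common construction, assumed here, counts each paper once per country it involves (contributions) and once per unordered pair of co-authoring countries (edge weights). The function names and paper records are hypothetical, and mapping author affiliations to countries is assumed to have been done already.

```python
from collections import Counter
from itertools import combinations


def country_contributions(papers: list) -> Counter:
    """Number of papers with at least one author from each country."""
    return Counter(c for countries in papers for c in set(countries))


def collaboration_edges(papers: list) -> Counter:
    """Weighted edges: papers co-authored by each unordered pair of countries."""
    edges = Counter()
    for countries in papers:
        for a, b in combinations(sorted(set(countries)), 2):
            edges[(a, b)] += 1
    return edges


# Each record lists the author countries of one (invented) paper
papers = [
    ["Finland", "Sweden"],
    ["Finland", "USA", "Sweden"],
    ["USA"],
    ["Finland", "Finland"],  # collaboration within one country: no cross-country edge
]
print(country_contributions(papers))
print(collaboration_edges(papers))
```

Sorting each paper's country set before pairing guarantees that ("Finland", "Sweden") and ("Sweden", "Finland") are counted as the same edge.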
Affiliation(s)
- Md Facihul Azam
- Predictive Society and Data Analysis Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland; Institute of Biosciences and Medical Technology, Tampere, Finland
- Aliyu Musa
- Predictive Society and Data Analysis Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland; Institute of Biosciences and Medical Technology, Tampere, Finland
- Matthias Dehmer
- Faculty for Management, Institute for Intelligent Production, University of Applied Sciences Upper Austria, Steyr, Austria; Department of Mechatronics and Biomedical Computer Science, UMIT, Hall in Tyrol, Austria; College of Computer and Control Engineering, Nankai University, Tianjin, China
- Olli P Yli-Harja
- Institute of Biosciences and Medical Technology, Tampere, Finland; Computational Systems Biology, Faculty of Biomedical Engineering, Tampere University, Tampere, Finland; Institute for Systems Biology, Seattle, WA, United States
- Frank Emmert-Streib
- Predictive Society and Data Analysis Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland; Institute of Biosciences and Medical Technology, Tampere, Finland
19
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
20
Karystianis G, Thayer K, Wolfe M, Tsafnat G. Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews. J Biomed Inform 2017; 70:27-34. [PMID: 28455150 DOI: 10.1016/j.jbi.2017.04.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/12/2016] [Revised: 03/14/2017] [Accepted: 04/02/2017] [Indexed: 02/02/2023]
Abstract
INTRODUCTION Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source of human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries. METHOD We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) from abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactical patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted using our approach with hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews. RESULTS The returned F-score, precision, and recall ranged from 70% to 98%, 81% to 100%, and 54% to 97%, respectively. The highest precision was observed for exposure, outcome and population (100%), while recall was best for exposure and study design, with 97% and 89%, respectively. The lowest recall was observed for population (54%), which also had the lowest F-score (70%). CONCLUSION Our text-mining approach showed encouraging performance for the identification of targeted information from observational epidemiological study abstracts related to environmental exposures.
We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs by simply interchanging the dictionaries used to identify certain characteristics (i.e., outcomes, exposures). At the document level, the recognised information can assist in the selection and categorization of studies included in a systematic review.
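The reported figures for population are internally consistent if the "F-score" is the balanced F1, the harmonic mean of precision and recall (an assumption, since the abstract does not define it): with precision 100% and recall 54%, F1 = 2 × 1.00 × 0.54 / (1.00 + 0.54) ≈ 0.70, the reported 70%. A small helper makes the check explicit; `f_score` is a hypothetical name.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives F1, the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Population: precision 1.00, recall 0.54 (values reported in the abstract)
print(round(f_score(1.00, 0.54), 2))  # prints 0.7
```

Because the harmonic mean is pulled toward the smaller value, perfect precision cannot compensate for low recall, which is why population has the lowest F-score despite its 100% precision.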
Affiliation(s)
- George Karystianis
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Kristina Thayer
- National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
- Mary Wolfe
- National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
- Guy Tsafnat
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
21
Larsson K, Baker S, Silins I, Guo Y, Stenius U, Korhonen A, Berglund M. Text mining for improved exposure assessment. PLoS One 2017; 12:e0173132. [PMID: 28257498 PMCID: PMC5336247 DOI: 10.1371/journal.pone.0173132] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/16/2016] [Accepted: 02/15/2017] [Indexed: 01/24/2023]
Abstract
Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is undoubtedly a great asset to the scientific community. However, manual retrieval of relevant published information is an extremely time-consuming task, and obtaining an overview of the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy created specifically for exposure information. Natural Language Processing (NLP) techniques were used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can be used to assist researchers by facilitating information retrieval and classification, enabling recognition of data gaps, and providing an overview of the available scientific literature through chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system.
Affiliation(s)
- Kristin Larsson
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Simon Baker
- Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
- Ilona Silins
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Yufan Guo
- Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
- Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Anna Korhonen
- Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
- Language Technology Lab, DTAL, University of Cambridge, Cambridge, United Kingdom
- Marika Berglund
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
22
Guha N, Guyton KZ, Loomis D, Barupal DK. Prioritizing Chemicals for Risk Assessment Using Chemoinformatics: Examples from the IARC Monographs on Pesticides. Environ Health Perspect 2016; 124:1823-1829. [PMID: 27164621 PMCID: PMC5132635 DOI: 10.1289/ehp186] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/26/2015] [Revised: 03/08/2016] [Accepted: 04/28/2016] [Indexed: 05/27/2023]
Abstract
BACKGROUND Identifying cancer hazards is the first step towards cancer prevention. The International Agency for Research on Cancer (IARC) Monographs Programme, which has evaluated nearly 1,000 agents for their carcinogenic potential since 1971, typically selects agents for hazard identification on the basis of public nominations, expert advice, published data on carcinogenicity, and public health importance. OBJECTIVES Here, we present a novel and complementary strategy for identifying agents for hazard evaluation using chemoinformatics, database integration, and automated text mining. DISCUSSION To inform selection among a broad range of pesticides nominated for evaluation, we identified and screened nearly 6,000 relevant chemical structures, after which we systematically compiled information on 980 pesticides, creating network maps that allowed cluster visualization by chemical similarity, pesticide class, and publicly available information concerning cancer epidemiology, cancer bioassays, and carcinogenic mechanisms. For the IARC Monograph meetings that took place in March and June 2015, this approach supported high-priority evaluation of glyphosate, malathion, parathion, tetrachlorvinphos, diazinon, p,p'-dichlorodiphenyltrichloroethane (DDT), lindane, and 2,4-dichlorophenoxyacetic acid (2,4-D). CONCLUSIONS This systematic approach, accounting for chemical similarity and overlaying multiple data sources, can be used by risk assessors as well as by researchers to systematize, inform, and increase efficiency in selecting and prioritizing agents for hazard identification, risk assessment, regulation, or further investigation. This approach could be extended to an array of outcomes and agents, including occupational carcinogens, drugs, and foods.
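The abstract mentions network maps that cluster pesticides by chemical similarity but does not specify the metric or the clustering rule. A minimal sketch, assuming the widely used Tanimoto coefficient on binary structural fingerprints and single-linkage grouping at a similarity threshold: fingerprints are represented as sets of "on" bit positions, which in practice a cheminformatics toolkit would generate from structures. The compound names, bit sets and threshold are invented.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_clusters(fps: dict, threshold: float) -> list:
    """Single-linkage clusters: connected components of the graph joining
    every pair of compounds whose Tanimoto similarity is >= threshold."""
    parent = {name: name for name in fps}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in combinations(fps, 2):
        if tanimoto(fps[u], fps[v]) >= threshold:
            parent[find(u)] = find(v)  # union the two components

    groups = {}
    for name in fps:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Invented fingerprints for five hypothetical compounds
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {10, 11, 12}, "D": {10, 11, 13}, "E": {20}}
clusters = similarity_clusters(fps, threshold=0.5)
print(sorted(sorted(c) for c in clusters))  # prints [['A', 'B'], ['C', 'D'], ['E']]
```

A union-find structure is used because single-linkage clusters at a fixed threshold are exactly the connected components of the similarity graph; the threshold value itself is a tuning choice, not one taken from the paper.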
Affiliation(s)
- Neela Guha
- International Agency for Research on Cancer (IARC) Monographs Programme
- Kathryn Z. Guyton
- International Agency for Research on Cancer (IARC) Monographs Programme
- Dana Loomis
- International Agency for Research on Cancer (IARC) Monographs Programme
- Dinesh Kumar Barupal
- Section of Nutrition and Metabolism–Biomarkers Group, International Agency for Research on Cancer, Lyon, France
23
Ye Z, Tafti AP, He KY, Wang K, He MM. SparkText: Biomedical Text Mining on Big Data Framework. PLoS One 2016; 11:e0162721. [PMID: 27685652 PMCID: PMC5042555 DOI: 10.1371/journal.pone.0162721] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 03/30/2016] [Accepted: 08/26/2016] [Indexed: 11/18/2022]
Abstract
Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
Affiliation(s)
- Zhan Ye
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, 54449, United States of America
- Ahmad P Tafti
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, 54449, United States of America; Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, 53211, United States of America
- Karen Y He
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, 44106, United States of America
- Kai Wang
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, 90089, United States of America; Department of Psychiatry, University of Southern California, Los Angeles, CA, 90089, United States of America
- Max M He
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, 54449, United States of America; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, 54449, United States of America; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, 53706, United States of America
24
Papamokos G, Silins I. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action. Front Pharmacol 2016; 7:284. [PMID: 27625608 PMCID: PMC5003827 DOI: 10.3389/fphar.2016.00284] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/10/2016] [Accepted: 08/18/2016] [Indexed: 12/28/2022]
Abstract
There is an increasing need for new, reliable, non-animal methods to predict and test the toxicity of chemicals. Quantitative structure-activity relationship (QSAR) modeling, a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested an approach that combines QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature on these carcinogens; however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and their relation to chemical structures. The method can be particularly useful in increasing the understanding of structure-activity relationships for non-mutagens.
Affiliation(s)
- George Papamokos
- Department of Physics and School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA; Department of Physics, University of Ioannina, Ioannina, Greece; Biomedical Research Division, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Greece
- Ilona Silins
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
25
Ali I, Högberg J, Hsieh JH, Auerbach S, Korhonen A, Stenius U, Silins I. Gender differences in cancer susceptibility: role of oxidative stress. Carcinogenesis 2016; 37:985-992. [PMID: 27481070 DOI: 10.1093/carcin/bgw076] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 05/31/2016] [Accepted: 07/26/2016] [Indexed: 01/07/2023]
Abstract
Cancer is a leading cause of death worldwide, and environmental factors, including chemicals, have been suggested as major etiological factors. Cancer statistics indicate that men get more cancer than women. However, differences in known risk factors, including lifestyle or occupational exposure, offer only a partial explanation. Using a text mining tool, we investigated the scientific literature concerning male- and female-specific rat carcinogens that induced tumors in only one gender in the NTP 2-year cancer bioassay. Our evaluation shows that oxidative stress, although frequently reported for both male- and female-specific rat carcinogens, was mentioned significantly more often in the literature concerning male-specific rat carcinogens. Literature analysis of testosterone and estradiol showed the same pattern. Tox21 high-throughput assay results, although showing only a weak association of oxidative stress-related processes with male- and female-specific rat carcinogens, provide additional support. We also analyzed the literature concerning 26 established human carcinogens (IARC group 1). Oxidative stress was more frequently reported for the majority of these carcinogens, and the Tox21 data resembled those of male-specific rat carcinogens. Thus, our data, based on about 600,000 scientific abstracts and Tox21 screening assays, suggest a link between male-specific carcinogens, testosterone and oxidative stress. This implies that a different cellular response to oxidative stress in men and women may be a critical factor in explaining the greater cancer susceptibility observed in men. Although the IARC carcinogens are classified as human carcinogens, their classification is largely based on epidemiological evidence from male cohorts, which raises the question of whether carcinogen classifications should be gender specific.
Affiliation(s)
- Jui-Hua Hsieh
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina 27709, USA
- Scott Auerbach
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina 27709, USA
- Anna Korhonen
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge CB3 9DA, UK
26
Abbe A, Grouin C, Zweigenbaum P, Falissard B. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016; 25:86-100. [PMID: 26184780 PMCID: PMC6877250 DOI: 10.1002/mpr.1481] [Received: 03/05/2014] [Revised: 01/21/2015] [Accepted: 04/09/2015]
Abstract
The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) psychopathology (i.e. observational studies focusing on mental illnesses), (2) the patient perspective (i.e. patients' thoughts and opinions), (3) medical records (i.e. safety issues, quality of care and description of treatments), and (4) medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits and limits of this tool and its possible future applications. Copyright © 2015 John Wiley & Sons, Ltd.
Affiliation(s)
- Adeline Abbe
- Inserm, U669, Paris, France; University Paris-Sud and University Paris Descartes, UMR-S0669, Paris, France
- Bruno Falissard
- Inserm, U669, Paris, France; University Paris-Sud and University Paris Descartes, UMR-S0669, Paris, France
27
Ali I, Guo Y, Silins I, Högberg J, Stenius U, Korhonen A. Grouping chemicals for health risk assessment: A text mining-based case study of polychlorinated biphenyls (PCBs). Toxicol Lett 2016; 241:32-7. [PMID: 26562772 DOI: 10.1016/j.toxlet.2015.11.003] [Received: 09/04/2015] [Revised: 11/04/2015] [Accepted: 11/04/2015]
Abstract
As many chemicals act as carcinogens, chemical health risk assessment is critically important. Risk assessment is a notoriously time-consuming process that could be greatly supported by classifying chemicals with similar toxicological profiles so that they can be assessed in groups rather than individually. We have previously developed a text mining (TM)-based tool that can automatically identify the mode of action (MOA) of a carcinogen based on the scientific evidence in the literature, and it can measure the MOA similarity between chemicals on the basis of their literature profiles (Korhonen et al., 2009, 2012). A new version of the tool (2.0) was recently released, and here we apply this tool for the first time to investigate and identify meaningful groups of chemicals for risk assessment. We used published literature on polychlorinated biphenyls (PCBs), persistent and widely spread toxic organic compounds comprising 209 different congeners. Although chemically similar, these compounds are heterogeneous in terms of MOA. We show that our TM tool, when applied to 1648 PubMed abstracts, produces a MOA profile for a subgroup of dioxin-like PCBs (DL-PCBs) that differs clearly from that of the rest of the PCBs. This suggests that the tool could be used to effectively identify homogeneous groups of chemicals and, when integrated into real-life risk assessment, could significantly improve the efficiency of the process.
Affiliation(s)
- Imran Ali
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm SE-171 77, Sweden
- Yufan Guo
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge CB3 9DA, UK
- Ilona Silins
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm SE-171 77, Sweden
- Johan Högberg
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm SE-171 77, Sweden
- Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm SE-171 77, Sweden
- Anna Korhonen
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge CB3 9DA, UK
28
Badal VD, Kundrotas PJ, Vakser IA. Text Mining for Protein Docking. PLoS Comput Biol 2015; 11:e1004630. [PMID: 26650466 PMCID: PMC4674139 DOI: 10.1371/journal.pcbi.1004630] [Received: 06/08/2015] [Accepted: 10/29/2015]
Abstract
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
AUTHOR SUMMARY Protein interactions are central to many cellular processes. Physical characterization of these interactions is essential for understanding life processes and for applications in biology and medicine. Because of the inherent limitations of experimental techniques and the rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.
Affiliation(s)
- Varsha D. Badal
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Petras J. Kundrotas
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Ilya A. Vakser
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
29
Baker S, Silins I, Guo Y, Ali I, Högberg J, Stenius U, Korhonen A. Automatic semantic classification of scientific literature according to the hallmarks of cancer. Bioinformatics 2015; 32:432-40. [PMID: 26454282 DOI: 10.1093/bioinformatics/btv585] [Received: 05/29/2015] [Accepted: 09/28/2015]
Abstract
MOTIVATION The hallmarks of cancer have become highly influential in cancer research. They reduce the complexity of cancer to 10 principles (e.g. resisting cell death and sustaining proliferative signaling) that explain the biological capabilities acquired during the development of human tumors. Since new research depends crucially on existing knowledge, technology for semantic classification of scientific literature according to the hallmarks of cancer could greatly support literature review, knowledge discovery and applications in cancer research. RESULTS We present the first step toward the development of such technology. We introduce a corpus of 1499 PubMed abstracts annotated according to the scientific evidence they provide for the 10 currently known hallmarks of cancer. We use this corpus to train a system that classifies PubMed literature according to the hallmarks. The system uses supervised machine learning and rich features largely based on biomedical text mining. We report good performance in both intrinsic and extrinsic evaluations, demonstrating both the accuracy of the methodology and its potential in supporting practical cancer research. We discuss how this approach could be developed and applied further in the future. AVAILABILITY AND IMPLEMENTATION The corpus of hallmark-annotated PubMed abstracts and the software for classification are available at: http://www.cl.cam.ac.uk/~sb895/HoC.html. CONTACT simon.baker@cl.cam.ac.uk.
Affiliation(s)
- Simon Baker
- Computer Laboratory, University of Cambridge, Cambridge, UK
- Ilona Silins
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Yufan Guo
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Imran Ali
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Johan Högberg
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Korhonen
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
30
Abdelillah A, Houcine B, Halima D, Meriem CS, Imane Z, Eddine SD, Abdallah M, Daoudi CS. Evaluation of antifungal activity of free fatty acids methyl esters fraction isolated from Algerian Linum usitatissimum L. seeds against toxigenic Aspergillus. Asian Pac J Trop Biomed 2013; 3:443-8. [PMID: 23730556 DOI: 10.1016/s2221-1691(13)60094-5] [Received: 04/05/2013] [Accepted: 05/20/2013]
Abstract
OBJECTIVE The aim of this study was to evaluate the antifungal activity of the major fraction of fatty acid methyl esters (FAMEs) isolated from the seed oil of Linum usitatissimum L. collected in the Bechar department (Algeria). METHODS Antifungal activity was assessed in terms of the percentage of radial growth on solid medium (potato dextrose agar, PDA) and the inhibition of biomass growth on liquid medium (potato dextrose broth, PDB) against two fungi. RESULTS The FAMEs inhibited the radial mycelial growth of Aspergillus flavus more effectively than that of Aspergillus ochraceus at all tested concentrations. The highest antifungal index was 54.19% for A. flavus, compared with 40.48% for A. ochraceus. The inhibition of biomass on liquid medium by the FAMEs did not give conclusive results, but this does not exclude antifungal activity. CONCLUSIONS We can assume that the observed antifungal potency may be due to the abundance of linoleic and α-linolenic acids in linseed oil, which appears promising for treating fungal infections and for controlling storage fungi and food spoilage in the food industry.
Affiliation(s)
- Amrouche Abdelillah
- Laboratory of Natural Products Research (LAPRONA) University of Tlemcen, Algeria ; Laboratory of Plant Resource Development and Food Security in Semi Arid Areas, South West of Algeria, University of Bechar, Algeria
31
Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014; 83:605-23. [PMID: 25008281 DOI: 10.1016/j.ijmedinf.2014.06.009] [Received: 10/23/2013] [Revised: 06/12/2014] [Accepted: 06/14/2014]
Abstract
PURPOSE This paper reviews the research literature on text mining (TM) with the aim of finding out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and to suggest future directions in TM development to support cancer research. METHODS A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as the IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. RESULTS A range of studies have demonstrated the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage, such as non-standard abbreviations and a high rate of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning, following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.
32
Silins I, Korhonen A, Stenius U. Evaluation of carcinogenic modes of action for pesticides in fruit on the Swedish market using a text-mining tool. Front Pharmacol 2014; 5:145. [PMID: 25002848 PMCID: PMC4066588 DOI: 10.3389/fphar.2014.00145] [Received: 03/07/2014] [Accepted: 06/02/2014]
Abstract
Toxicity caused by chemical mixtures has emerged as a significant challenge for toxicologists and risk assessors. Information on individual chemicals' modes of action is an important part of the hazard identification step. In this study, an automatic text mining-based tool was employed to identify the carcinogenic modes of action of pesticides frequently found in fruit on the Swedish market. The currently available scientific literature on the 26 most common pesticides found in apples and oranges was evaluated. The literature was classified according to a taxonomy that specifies the main type of scientific evidence used for determining carcinogenic properties of chemicals. The publication profiles of many pesticides were similar, containing evidence for both genotoxic and non-genotoxic modes of action, including effects such as oxidative stress, chromosomal changes and cell proliferation. We also found that 18 of the 26 pesticides studied here had previously caused tumors in at least one animal species, findings which support the mode of action data. This study shows how a text-mining tool could be used to identify carcinogenic modes of action for a group of chemicals in large quantities of text. This strategy could support the risk assessment process of chemical mixtures.
Affiliation(s)
- Ilona Silins
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden; Computer Laboratory, University of Cambridge, Cambridge, UK
- Anna Korhonen
- Computer Laboratory, University of Cambridge, Cambridge, UK
- Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
33
Biomedical text mining and its applications in cancer research. J Biomed Inform 2013; 46:200-11. [DOI: 10.1016/j.jbi.2012.10.007] [Received: 03/28/2012] [Revised: 10/30/2012] [Accepted: 10/30/2012]
34
Tamaddoni-Nezhad A, Milani GA, Raybould A, Muggleton S, Bohan DA. Construction and Validation of Food Webs Using Logic-Based Machine Learning and Text Mining. Adv Ecol Res 2013. [DOI: 10.1016/b978-0-12-420002-9.00004-4]
35
Kadekar S, Silins I, Korhonen A, Dreij K, Al-Anati L, Högberg J, Stenius U. Exocrine pancreatic carcinogenesis and autotaxin expression. PLoS One 2012; 7:e43209. [PMID: 22952646 PMCID: PMC3430650 DOI: 10.1371/journal.pone.0043209] [Received: 12/20/2011] [Accepted: 07/18/2012]
Abstract
Exocrine pancreatic cancer is an aggressive disease with an exceptionally high mortality rate. Genetic analysis suggests a causative role for environmental factors, but consistent epidemiological support is scarce and no biomarkers for monitoring the effects of chemical pancreatic carcinogens are available. With the objective of identifying common traits of chemicals inducing pancreatic tumors, we studied the National Toxicology Program (NTP) bioassay database. We found that male rats were affected more often than female rats and identified eight chemicals that induced exocrine pancreatic tumors in males only. To generate hypotheses, we used a text mining tool to analyse the published literature for suggested modes of action (MOA). The resulting MOA analysis suggested inflammatory responses as a common feature. In cell studies, we found that all the chemicals increased protein levels of the inflammatory protein autotaxin (ATX) in Panc-1, MIA PaCa-2 or Capan-2 cells. Induction of MMP-9 and increased invasive migration were also frequent effects, consistent with ATX activation. Testosterone has previously been implicated in pancreatic carcinogenesis, and we found that it increased ATX levels. Our data show that ATX is a target for chemicals inducing pancreatic tumors in rats. Several lines of evidence implicate ATX and its product lysophosphatidic acid in human pancreatic cancer. Mechanisms of action may include stimulated invasive growth and metastasis. ATX may interact with hormones or with oncogenes or tumor suppressor genes often deregulated in exocrine pancreatic cancer. Our data suggest that ATX is a target for chemicals promoting pancreatic tumor development.
Affiliation(s)
- Sandeep Kadekar
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Ilona Silins
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Korhonen
- Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
- Kristian Dreij
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Lauy Al-Anati
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Johan Högberg
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden