Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Korhonen A, Séaghdha DO, Silins I, Sun L, Högberg J, Stenius U. Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS One 2012;7:e33427. [PMID: 22511921 PMCID: PMC3325219 DOI: 10.1371/journal.pone.0033427] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 11/25/2011] [Accepted: 02/08/2012] [Indexed: 12/14/2022]

Cited by Other Article(s)

Högberg J, Järnberg J. Approaches for the setting of occupational exposure limits (OELs) for carcinogens. Crit Rev Toxicol 2023:1-37. [PMID: 37366107 DOI: 10.1080/10408444.2023.2218887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 05/12/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023]

A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome. International Journal of Environmental Research and Public Health 2022;19:ijerph19148544. [PMID: 35886395 PMCID: PMC9316260 DOI: 10.3390/ijerph19148544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/07/2022] [Accepted: 07/11/2022] [Indexed: 02/05/2023]

Using semantics to scale up evidence-based chemical risk-assessments. PLoS One 2021;16:e0260712. [PMID: 34910747 PMCID: PMC8673667 DOI: 10.1371/journal.pone.0260712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 11/15/2021] [Indexed: 11/19/2022]

Abstract

BACKGROUND

The manual processes used for risk assessments are not scaling to the amount of data available. Although automated approaches appear promising, they must be transparent in a public policy setting.

OBJECTIVE

Our goal is to create an automated approach that moves beyond retrieval to the extraction step of the information synthesis process, where evidence is characterized as supporting, refuting, or neutral with respect to a given outcome.

METHODS

We combine knowledge resources and natural language processing to resolve coordinated ellipses and thus avoid surface-level differences between concepts in an ontology and outcomes in an abstract. As with a systematic review, the search criteria and the inclusion and exclusion criteria are explicit.

RESULTS

The system scales to 482K abstracts on 27 chemicals. Results for three endpoints that are critical for cancer risk assessments show that refuting evidence (where the outcome decreased) was higher for cell proliferation (45.9%) and general cell changes (37.7%) than for cell death (25.0%). Moreover, cell death was the only endpoint where supporting claims were the majority (61.3%). If the number of abstracts that measure an outcome were used as a proxy for association, there would be a stronger association with cell proliferation than with cell death (20/27 chemicals). However, if the amount of supporting evidence were used (where the outcome increased), the conclusion would change for 21/27 chemicals (20 from proliferation to death and 1 from death to proliferation).

CONCLUSIONS

We provide decision makers with a visual representation of supporting, neutral, and refuting evidence whilst maintaining the reproducibility and transparency needed for public policy. Our findings show that results from the retrieval step, where the number of abstracts that measure an outcome is reported, can be misleading if not accompanied by results from the extraction step, where the directionality of the outcome is established.

Zengul AG, Zengul FD, Ozaydin B, Oner N, Fiveash JB. Identifying research themes and trends in the top 20 cancer journals through textual analysis. J Cancer Policy 2021;30:100313. [DOI: 10.1016/j.jcpo.2021.100313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Revised: 09/28/2021] [Accepted: 10/27/2021] [Indexed: 11/28/2022]

Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021;11:biom11101430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022]

Ali I, Dreij K, Baker S, Högberg J, Korhonen A, Stenius U. Application of Text Mining in Risk Assessment of Chemical Mixtures: A Case Study of Polycyclic Aromatic Hydrocarbons (PAHs). Environmental Health Perspectives 2021;129:67008. [PMID: 34165340 PMCID: PMC8318069 DOI: 10.1289/ehp6702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2019] [Revised: 05/07/2021] [Accepted: 05/10/2021] [Indexed: 05/08/2023]

Abstract

BACKGROUND

Cancer risk assessment of complex exposures, such as exposure to mixtures of polycyclic aromatic hydrocarbons (PAHs), is challenging due to the diverse biological activities of these compounds. With the help of text mining (TM), we have developed two TM tools, the latest iteration of the Cancer Risk Assessment using Biomedical literature tool (CRAB3) and a Cancer Hallmarks Analytics Tool (CHAT), that could be useful for automatic literature analyses in cancer risk assessment and research. Whereas CRAB3 analyses are based on carcinogenic modes of action (MOAs) and cover almost all the key characteristics of carcinogens, CHAT evaluates literature according to the hallmarks of cancer, that is, the alterations in cellular behavior that characterize the cancer cell.

OBJECTIVES

The objective was to evaluate the usefulness of these tools to support cancer risk assessment by performing a case study of 22 European Union and U.S. Environmental Protection Agency priority PAHs and diesel exhaust and a case study of PAH interactions with silica.

METHODS

We analyzed PubMed literature, comprising 57,498 references concerning priority PAHs and complex PAH mixtures, using CRAB3 and CHAT.

RESULTS

CRAB3 analyses correctly identified similarities and differences in genotoxic and nongenotoxic MOAs of the 22 priority PAHs and grouped them according to their known carcinogenic potential. CHAT had the same capacity and complemented the CRAB output when comparing, for example, benzo[a]pyrene and dibenzo[a,l]pyrene. Both CRAB3 and CHAT analyses highlighted potentially interacting mechanisms within and across complex PAH mixtures and mechanisms of possible importance for interactions with silica.

CONCLUSION

These data suggest that our TM approach can be useful in the hazard identification of PAHs and mixtures including PAHs. The tools can assist in grouping chemicals and identifying similarities and differences in carcinogenic MOAs and their interactions. https://doi.org/10.1289/EHP6702.

Ding Z, Liu R, Yuan H. A text mining-based thematic model for analyzing construction and demolition waste management studies. Environmental Science and Pollution Research International 2021;28:30499-30527. [PMID: 33905057 DOI: 10.1007/s11356-021-13989-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 04/13/2021] [Indexed: 06/12/2023]

Ovalle A, Goldstein O, Kachuee M, Wu ESC, Hong C, Holloway IW, Sarrafzadeh M. Leveraging Social Media Activity and Machine Learning for HIV and Substance Abuse Risk Assessment: Development and Validation Study. J Med Internet Res 2021;23:e22042. [PMID: 33900200 PMCID: PMC8111510 DOI: 10.2196/22042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/04/2020] [Revised: 11/25/2020] [Accepted: 01/31/2021] [Indexed: 01/08/2023]

Abstract

Background

Social media networks provide an abundance of diverse information that can be leveraged for data-driven applications across various social and physical sciences. One opportunity to utilize such data exists in the public health domain, where data collection is often constrained by organizational funding and limited user adoption. Furthermore, the efficacy of health interventions is often based on self-reported data, which are not always reliable. Health-promotion strategies for communities facing multiple vulnerabilities, such as men who have sex with men, can benefit from an automated system that not only determines health behavior risk but also suggests appropriate intervention targets.

Objective

This study aims to determine the value of leveraging social media messages to identify health risk behavior for men who have sex with men.

Methods

The Gay Social Networking Analysis Program was created as a preliminary framework for intelligent web-based health-promotion intervention. The program consisted of a data collection system that automatically gathered social media data, health questionnaires, and clinical results for sexually transmitted diseases and drug tests across 51 participants over 3 months. Machine learning techniques were utilized to assess the relationship between social media messages and participants' offline sexual health and substance use biological outcomes. The F1 score, the harmonic mean of precision and recall, was used to evaluate each algorithm. Natural language processing techniques were employed to create health behavior risk scores from participant messages.

Results

Offline HIV status, amphetamine use, and methamphetamine use were correctly identified using only social media data, with machine learning models obtaining F1 scores of 82.6%, 85.9%, and 85.3%, respectively. Additionally, constructed risk scores were found to be reasonably comparable to risk scores adapted from the Centers for Disease Control and Prevention.

Conclusions

To our knowledge, our study is the first empirical evaluation of a social media–based public health intervention framework for men who have sex with men. We found that social media data were correlated with offline sexual health and substance use, verified through biological testing. The proof of concept and initial results validate that public health interventions can indeed use social media–based systems to successfully determine offline health risk behaviors. The findings demonstrate the promise of deploying a social media–based just-in-time adaptive intervention to target substance use and HIV risk behavior.

Acheson E, Purves RS. Extracting and modeling geographic information from scientific articles. PLoS One 2021;16:e0244918. [PMID: 33406109 PMCID: PMC7787447 DOI: 10.1371/journal.pone.0244918] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/02/2019] [Accepted: 12/20/2020] [Indexed: 11/29/2022]

A Thematic Network-Based Methodology for the Research Trend Identification in Building Energy Management. Energies 2020. [DOI: 10.3390/en13184621] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]

Wang N, Yang Y, Pang M, Du C, Chen Y, Li S, Tian Z, Feng F, Wang Y, Chen Z, Liu B, Rong L. MicroRNA-135a-5p Promotes the Functional Recovery of Spinal Cord Injury by Targeting SP1 and ROCK. Molecular Therapy. Nucleic Acids 2020;22:1063-1077. [PMID: 33294293 PMCID: PMC7691148 DOI: 10.1016/j.omtn.2020.08.035] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/12/2020] [Accepted: 08/28/2020] [Indexed: 01/18/2023]

Affiliation(s)

Nanxiang Wang, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Yang Yang, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Mao Pang, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Cong Du, Cell-Gene Therapy Translational Medicine Research Center, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Yuyong Chen, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Simin Li, Department of Cariology, Endodontology and Periodontology, University Leipzig, Liebigstrasse 12, 04103 Leipzig, Germany
Zhenming Tian, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Feng Feng, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Yang Wang, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Zhenxiang Chen, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Bin Liu, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China
Limin Rong, Department of Spine Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, No. 600 Tianhe Road, Tianhe District, Guangzhou, Guangdong Province, People's Republic of China

DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. Oxidative Medicine and Cellular Longevity 2020;2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]

Abstract

Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction reaction) homeostasis. When oxidation damage affects RNA carrying protein-coding information, this may result in the synthesis of aberrant proteins as well as a lower efficiency of translation. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of RNA oxidative damage in mammals is increasing year by year due to the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information relevant to RNA oxidation and effects on human diseases, we developed DES-ROD, an online knowledgebase that contains processed information from 298,603 relevant documents that consist of PubMed abstracts and PubMed Central full-text articles. The system utilizes concepts/terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts, as well as enriched pairs of putatively associated concepts. In this way, one can explore mutual relationships between any combinations of two concepts from the dictionaries used. Dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases. This makes insights into different facets of the effects of RNA oxidation and the control of this process possible. The usefulness of the DES-ROD system is demonstrated by case studies on some known information, as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first knowledgebase based on text and data mining that focuses on the exploration of RNA oxidation and human diseases.

Wittwehr C, Blomstedt P, Gosling JP, Peltola T, Raffael B, Richarz AN, Sienkiewicz M, Whaley P, Worth A, Whelan M. Artificial Intelligence for chemical risk assessment. Comput Toxicol 2020;13:100114. [PMID: 32140631 PMCID: PMC7043333 DOI: 10.1016/j.comtox.2019.100114] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/23/2019] [Revised: 09/10/2019] [Accepted: 11/25/2019] [Indexed: 02/03/2023]

Barrón Cuenca J, Tirado N, Barral J, Ali I, Levi M, Stenius U, Berglund M, Dreij K. Increased levels of genotoxic damage in a Bolivian agricultural population exposed to mixtures of pesticides. The Science of the Total Environment 2019;695:133942. [PMID: 31756860 DOI: 10.1016/j.scitotenv.2019.133942] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 05/25/2023]

Abstract

During the past decades, farmers in low- to middle-income countries have increased their use of pesticides, and thereby the risk of being exposed to potentially genotoxic chemicals that can cause adverse health effects. Here, the aim was to investigate the correlation between exposure to pesticides and genotoxic damage in a Bolivian agricultural population. Genotoxic effects were assessed in peripheral blood samples by comet and micronucleus (MN) assays, and exposure levels by measurements of 10 urinary pesticide metabolites. Genetic susceptibility was assessed by determination of null frequency of GSTM1 and GSTT1 genotypes. The results showed higher MN frequency in women and farmers active ≥8 years compared to their counterparts (P < 0.05). In addition, age, GST genotype, alcohol consumption, and type of water source influenced levels of genotoxic damage. Individuals with high exposure to tebuconazole, 2,4-D, or cyfluthrin displayed increased levels of genotoxic damage (P < 0.05-0.001). Logistic regression was conducted to evaluate associations between pesticide exposure and risk of genotoxic damage. After adjustment for confounders, a significantly increased risk of DNA strand breaks was found for high exposure to 2,4-D, odds ratio (OR) = 1.99 (P < 0.05). In contrast, high exposure to pyrethroids was associated with a reduced risk of DNA strand breaks, OR = 0.49 (P < 0.05). It was also found that high exposure to certain mixtures of pesticides (containing mainly 2,4-D or cyfluthrin) was significantly associated with increased level and risk of genotoxic damage (P < 0.05). In conclusion, our data show that high exposure levels to some pesticides are associated with an increased risk of genotoxic damage among Bolivian farmers, suggesting that their use should be better controlled or limited.

ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins. PLoS Comput Biol 2019;15:e1007239. [PMID: 31437145 PMCID: PMC6705771 DOI: 10.1371/journal.pcbi.1007239] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/02/2018] [Accepted: 07/03/2019] [Indexed: 01/10/2023]

Abstract

Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, used as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as a means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.

Literature-Based Enrichment Insights into Redox Control of Vascular Biology. Oxidative Medicine and Cellular Longevity 2019;2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]

Ferrández A, Peral J. MergedTrie: Efficient textual indexing. PLoS One 2019;14:e0215288. [PMID: 31013282 PMCID: PMC6478299 DOI: 10.1371/journal.pone.0215288] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/20/2018] [Accepted: 03/30/2019] [Indexed: 11/18/2022]

Azam MF, Musa A, Dehmer M, Yli-Harja OP, Emmert-Streib F. Global Genetics Research in Prostate Cancer: A Text Mining and Computational Network Theory Approach. Front Genet 2019;10:70. [PMID: 30838019 PMCID: PMC6383410 DOI: 10.3389/fgene.2019.00070] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/21/2018] [Accepted: 01/28/2019] [Indexed: 11/13/2022]

Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017;117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]

Karystianis G, Thayer K, Wolfe M, Tsafnat G. Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews. J Biomed Inform 2017;70:27-34. [PMID: 28455150 DOI: 10.1016/j.jbi.2017.04.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/12/2016] [Revised: 03/14/2017] [Accepted: 04/02/2017] [Indexed: 02/02/2023]

Abstract

INTRODUCTION

Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source for human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries.

METHOD

We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) from abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactical patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted using our approach versus hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews.

RESULTS

The returned F-score, precision, and recall ranged from 70% to 98%, 81% to 100%, and 54% to 97%, respectively. The highest precision was observed for exposure, outcome and population (100%) while recall was best for exposure and study design with 97% and 89%, respectively. The lowest recall was observed for the population (54%), which also had the lowest F-score (70%).

CONCLUSION

The performance of our text-mining approach was encouraging for the identification of targeted information from observational epidemiological study abstracts related to environmental exposures. We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs by simply interchanging the dictionaries that identify certain characteristics (e.g., outcomes, exposures). At the document level, the recognised information can assist in the selection and categorization of studies included in a systematic review.
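The relationship between the precision, recall, and F-score figures reported in the RESULTS above can be checked with a few lines of arithmetic. The sketch below assumes the abstract's F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative only and is not taken from the cited work.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the cited abstract uses this balanced form.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Population field as reported in the abstract: precision 100%, recall 54%.
population_f = f_score(1.00, 0.54)
print(f"{population_f:.1%}")  # prints 70.1%
```

Under that assumption, the computed value rounds to the 70% F-score the abstract reports for the population field, and it shows how a low recall pulls the score down even when precision is perfect.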

Larsson K, Baker S, Silins I, Guo Y, Stenius U, Korhonen A, Berglund M. Text mining for improved exposure assessment. PLoS One 2017;12:e0173132. [PMID: 28257498 PMCID: PMC5336247 DOI: 10.1371/journal.pone.0173132] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/16/2016] [Accepted: 02/15/2017] [Indexed: 01/24/2023]

Guha N, Guyton KZ, Loomis D, Barupal DK. Prioritizing Chemicals for Risk Assessment Using Chemoinformatics: Examples from the IARC Monographs on Pesticides. Environmental Health Perspectives 2016;124:1823-1829. [PMID: 27164621 PMCID: PMC5132635 DOI: 10.1289/ehp186] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/26/2015] [Revised: 03/08/2016] [Accepted: 04/28/2016] [Indexed: 05/27/2023]

Abstract

BACKGROUND

Identifying cancer hazards is the first step towards cancer prevention. The International Agency for Research on Cancer (IARC) Monographs Programme, which has evaluated nearly 1,000 agents for their carcinogenic potential since 1971, typically selects agents for hazard identification on the basis of public nominations, expert advice, published data on carcinogenicity, and public health importance.

OBJECTIVES

Here, we present a novel and complementary strategy for identifying agents for hazard evaluation using chemoinformatics, database integration, and automated text mining.

DISCUSSION

To inform selection among a broad range of pesticides nominated for evaluation, we identified and screened nearly 6,000 relevant chemical structures, after which we systematically compiled information on 980 pesticides, creating network maps that allowed cluster visualization by chemical similarity, pesticide class, and publicly available information concerning cancer epidemiology, cancer bioassays, and carcinogenic mechanisms. For the IARC Monograph meetings that took place in March and June 2015, this approach supported high-priority evaluation of glyphosate, malathion, parathion, tetrachlorvinphos, diazinon, p,p'-dichlorodiphenyltrichloroethane (DDT), lindane, and 2,4-dichlorophenoxyacetic acid (2,4-D).

CONCLUSIONS

This systematic approach, accounting for chemical similarity and overlaying multiple data sources, can be used by risk assessors as well as by researchers to systematize, inform, and increase efficiency in selecting and prioritizing agents for hazard identification, risk assessment, regulation, or further investigation. This approach could be extended to an array of outcomes and agents, including occupational carcinogens, drugs, and foods. Citation: Guha N, Guyton KZ, Loomis D, Barupal DK. 2016. Prioritizing chemicals for risk assessment using chemoinformatics: examples from the IARC Monographs on Pesticides. Environ Health Perspect 124:1823-1829; http://dx.doi.org/10.1289/EHP186.

Ye Z, Tafti AP, He KY, Wang K, He MM. SparkText: Biomedical Text Mining on Big Data Framework. PLoS One 2016;11:e0162721. [PMID: 27685652 PMCID: PMC5042555 DOI: 10.1371/journal.pone.0162721] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 03/30/2016] [Accepted: 08/26/2016] [Indexed: 11/18/2022]

Papamokos G, Silins I. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action. Front Pharmacol 2016;7:284. [PMID: 27625608 PMCID: PMC5003827 DOI: 10.3389/fphar.2016.00284] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/10/2016] [Accepted: 08/18/2016] [Indexed: 12/28/2022]

Ali I, Högberg J, Hsieh JH, Auerbach S, Korhonen A, Stenius U, Silins I. Gender differences in cancer susceptibility: role of oxidative stress. Carcinogenesis 2016;37:985-992. [PMID: 27481070 DOI: 10.1093/carcin/bgw076] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 05/31/2016] [Accepted: 07/26/2016] [Indexed: 01/07/2023]

Abstract

Cancer is a leading cause of death worldwide and environmental factors, including chemicals, have been suggested as major etiological incitements. Cancer statistics indicate that men get more cancer than women. However, differences in the known risk factors, including lifestyle or occupational exposure, offer only a partial explanation. Using a text mining tool, we have investigated the scientific literature concerning male- and female-specific rat carcinogens that induced tumors in only one gender in the NTP 2-year cancer bioassay. Our evaluation shows that oxidative stress, although frequently reported for both male- and female-specific rat carcinogens, was mentioned significantly more in literature concerning male-specific rat carcinogens. Literature analysis of testosterone and estradiol showed the same pattern. Tox21 high-throughput assay results, although showing only a weak association of oxidative stress-related processes for male- and female-specific rat carcinogens, provide additional support. We also analyzed the literature concerning 26 established human carcinogens (IARC group 1). Oxidative stress was more frequently reported for the majority of these carcinogens, and the Tox21 data resembled that of male-specific rat carcinogens. Thus, our data, based on about 600,000 scientific abstracts and Tox21 screening assays, suggest a link between male-specific carcinogens, testosterone and oxidative stress. This implies that a different cellular response to oxidative stress in men and women may be a critical factor in explaining the greater cancer susceptibility observed in men. Although the IARC carcinogens are classified as human carcinogens, their classification is largely based on epidemiological evidence from male cohorts, which raises the question of whether carcinogen classifications should be gender specific.

Abbe A, Grouin C, Zweigenbaum P, Falissard B. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016;25:86-100. [PMID: 26184780 PMCID: PMC6877250 DOI: 10.1002/mpr.1481] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 03/05/2014] [Revised: 01/21/2015] [Accepted: 04/09/2015] [Indexed: 11/08/2022] Open

Ali I, Guo Y, Silins I, Högberg J, Stenius U, Korhonen A. Grouping chemicals for health risk assessment: A text mining-based case study of polychlorinated biphenyls (PCBs). Toxicol Lett 2016;241:32-7. [PMID: 26562772 DOI: 10.1016/j.toxlet.2015.11.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/04/2015] [Revised: 11/04/2015] [Accepted: 11/04/2015] [Indexed: 12/21/2022]

Badal VD, Kundrotas PJ, Vakser IA. Text Mining for Protein Docking. PLoS Comput Biol 2015;11:e1004630. [PMID: 26650466 PMCID: PMC4674139 DOI: 10.1371/journal.pcbi.1004630] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/08/2015] [Accepted: 10/29/2015] [Indexed: 11/18/2022] Open

Abstract

The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
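
As a rough illustration of the bag-of-words (features) approach mentioned in the abstract above, the following minimal Python sketch turns short texts into word-count vectors. It is not the authors' pipeline: the function names (`tokenize`, `build_vocabulary`, `vectorize`), the tokenization rule, and the two sample sentences are all hypothetical, chosen only for this example. Vectors of this kind are the sort of input a classifier such as a Support Vector Machine could be trained on.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs):
    """Map every distinct token in the documents to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Return a count vector for the text; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


# Made-up sentences standing in for retrieved abstracts (not data from the study).
abstracts = [
    "Residue R45 binds the partner protein",
    "The protein is expressed in liver tissue",
]
vocab = build_vocabulary(abstracts)
vectors = [vectorize(a, vocab) for a in abstracts]
```

Word order is discarded by design; only the counts per vocabulary term are kept, which is what makes the representation a "bag" of words.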

Protein interactions are central for many cellular processes. Physical characterization of these interactions is essential for understanding of life processes and applications in biology and medicine. Because of the inherent limitations of experimental techniques and rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied the text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.

Baker S, Silins I, Guo Y, Ali I, Högberg J, Stenius U, Korhonen A. Automatic semantic classification of scientific literature according to the hallmarks of cancer. Bioinformatics 2015;32:432-40. [PMID: 26454282 DOI: 10.1093/bioinformatics/btv585] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 05/29/2015] [Accepted: 09/28/2015] [Indexed: 12/31/2022] Open

Abdelillah A, Houcine B, Halima D, Meriem CS, Imane Z, Eddine SD, Abdallah M, Daoudi CS. Evaluation of antifungal activity of free fatty acids methyl esters fraction isolated from Algerian Linum usitatissimum L. seeds against toxigenic Aspergillus. Asian Pac J Trop Biomed 2015;3:443-8. [PMID: 23730556 DOI: 10.1016/s2221-1691(13)60094-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/05/2013] [Accepted: 05/20/2013] [Indexed: 10/27/2022] Open

Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014;83:605-23. [PMID: 25008281 DOI: 10.1016/j.ijmedinf.2014.06.009] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Received: 10/23/2013] [Revised: 06/12/2014] [Accepted: 06/14/2014] [Indexed: 12/21/2022]

Abstract

PURPOSE

This paper reviews the research literature on text mining (TM) with the aim to find out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research.

METHODS

A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar.

RESULTS

A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.
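
For readers unfamiliar with the F-measure figures quoted above, the following is a minimal Python sketch of how the score is computed from counts of true positives, false positives and false negatives. The function name `f_measure` and the example counts are assumptions made for illustration; they are not taken from the reviewed studies.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta score from raw counts; returns 0.0 when precision or recall is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # share of predicted entities that are correct
    recall = true_positives / actual        # share of real entities that were found
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical NER run: 85 correct entities, 15 spurious, 15 missed.
score = f_measure(85, 15, 15)
```

With `beta=1.0` this is the harmonic mean of precision and recall, so a system must do well on both to reach a score in the 80% to 90% range.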

Silins I, Korhonen A, Stenius U. Evaluation of carcinogenic modes of action for pesticides in fruit on the Swedish market using a text-mining tool. Front Pharmacol 2014;5:145. [PMID: 25002848 PMCID: PMC4066588 DOI: 10.3389/fphar.2014.00145] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/07/2014] [Accepted: 06/02/2014] [Indexed: 12/16/2022] Open

Biomedical text mining and its applications in cancer research. J Biomed Inform 2013;46:200-11. [DOI: 10.1016/j.jbi.2012.10.007] [Citation(s) in RCA: 159] [Impact Index Per Article: 14.5] [Received: 03/28/2012] [Revised: 10/30/2012] [Accepted: 10/30/2012] [Indexed: 11/21/2022]

Tamaddoni-Nezhad A, Milani GA, Raybould A, Muggleton S, Bohan DA. Construction and Validation of Food Webs Using Logic-Based Machine Learning and Text Mining. Adv Ecol Res 2013. [DOI: 10.1016/b978-0-12-420002-9.00004-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 01/01/2023]

Kadekar S, Silins I, Korhonen A, Dreij K, Al-Anati L, Högberg J, Stenius U. Exocrine pancreatic carcinogenesis and autotaxin expression. PLoS One 2012;7:e43209. [PMID: 22952646 PMCID: PMC3430650 DOI: 10.1371/journal.pone.0043209] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 12/20/2011] [Accepted: 07/18/2012] [Indexed: 12/12/2022] Open