1
Stookey JD, Guendelman S, McCallister B, Whittemore P, Abu-Amara D, Elsasser MA, Dahir F, Armstrong A, Jackson R. Conceptual framework for preterm birth review in San Francisco. Front Public Health 2024; 12:1332972. [PMID: 38751590 PMCID: PMC11094341 DOI: 10.3389/fpubh.2024.1332972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 03/29/2024] [Indexed: 05/18/2024]
Abstract
Preterm birth persists as a leading cause of infant mortality and morbidity despite decades of intervention effort. Null intervention effects may reflect a failure to account for social determinants of health (SDH) or jointly acting risk factors. In some communities, persistent preterm birth trends and disparities have been consistently associated with SDH such as race/ethnicity, zip code, and housing conditions. Health authorities recommend conceptual frameworks for targeted action on SDH and precision public health approaches for preterm birth prevention. We document San Francisco, California's experience in identifying the need, rationale, methods, and pilot work for developing a conceptual framework for preterm birth review (PTBR) in San Francisco. The PTBR conceptual framework is intended to enable essential public health services in San Francisco that prevent a range of preterm birth phenotypes by guiding plans for data collection, hypothesis testing, analytical methods, reports, and intervention strategy. Key elements of the PTBR conceptual framework are described, including 10 domains of SDH; 9 domains at the whole-person level, such as lived experience and health behaviors; 8 domains at the within-person level, such as biomarkers and clinical measures; 18 preterm birth phenotypes; and the interconnections between domains. Assumptions for the PTBR conceptual framework were supported by a scoping review of literature on SDH effects on preterm birth, health authority consensus reports, and PTBR pilot data. Because researchers and health authorities have shown interest in each of the domains, the framework is designed to prompt systematic consideration of variables in every proposed domain. PTBR pilot data, illustrated in heatmaps, confirm the feasibility of data collection based on the framework, the prevalence of co-occurring risk factors, the potential for joint effects on specific preterm birth phenotypes, and the opportunity for intervention to block SDH effects on preterm birth.
The proposed PTBR conceptual framework has practical implications for specifying (1) population groups at risk, (2) grids or heatmap visualization of risk factors, (3) multi-level analyses, and (4) multi-component intervention design in terms of patterns of co-occurring risk factors. Lessons learned about PTBR data collection logistics, variable choice, and data management will be incorporated into future work to build PTBR infrastructure based on the PTBR conceptual framework.
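The heatmap grids of co-occurring risk factors mentioned in this abstract can be illustrated with a small sketch. It assumes each pilot record is reduced to a set of binary risk-factor flags; the factor names and records below are hypothetical placeholders, not PTBR variables or data, and this is not the authors' analysis code.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooccurrence_grid(records: List[Set[str]], factors: List[str]) -> Dict[Tuple[str, str], int]:
    """Count how many records carry each pair of risk factors.

    Diagonal cells (f, f) hold the number of records flagged with f;
    off-diagonal cells are filled symmetrically.
    """
    grid = {(a, b): 0 for a in factors for b in factors}
    for record in records:
        present = [f for f in factors if f in record]
        for f in present:
            grid[(f, f)] += 1
        for a, b in combinations(present, 2):
            grid[(a, b)] += 1
            grid[(b, a)] += 1  # mirror so the grid is symmetric for plotting
    return grid


# Hypothetical records: each set lists the risk factors flagged for one birth.
records = [
    {"unstable_housing", "hypertension"},
    {"unstable_housing"},
    {"unstable_housing", "hypertension", "smoking"},
]
factors = ["unstable_housing", "hypertension", "smoking"]
grid = cooccurrence_grid(records, factors)
for a in factors:
    print(f"{a:>16}", [grid[(a, b)] for b in factors])
```

With these toy records the rows are [3, 2, 1], [2, 2, 1] and [1, 1, 1]. Counts are used for simplicity; dividing by the number of records gives prevalences, and the matrix can be passed to any plotting library to draw the heatmap.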
Affiliation(s)
- Jodi D. Stookey
- Maternal, Child, and Adolescent Health Division, San Francisco Department of Public Health, San Francisco, CA, United States
- Sylvia Guendelman
- Center of Excellence in Maternal, Child, and Adolescent Health, School of Public Health, University of California, Berkeley, Berkeley, CA, United States
- Deena Abu-Amara
- School of Community Health Sciences, University of Nevada, Reno, NV, United States
- Maria A. Elsasser
- School of Nursing and Health Professions, University of San Francisco, San Francisco, CA, United States
- Fardowsa Dahir
- Maternal, Child, and Adolescent Health Division, San Francisco Department of Public Health, San Francisco, CA, United States
- Aline Armstrong
- Maternal, Child, and Adolescent Health Division, San Francisco Department of Public Health, San Francisco, CA, United States
- Rebecca Jackson
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
2
De Paoli F, Berardelli S, Limongelli I, Rizzo E, Zucca S. VarChat: the generative AI assistant for the interpretation of human genomic variations. Bioinformatics 2024; 40:btae183. [PMID: 38579245 PMCID: PMC11055464 DOI: 10.1093/bioinformatics/btae183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 03/05/2024] [Accepted: 04/04/2024] [Indexed: 04/07/2024]
Abstract
MOTIVATION In the modern era of genomic research, the scientific community is witnessing explosive growth in the volume of published findings. While this abundance of data offers invaluable insights, it also places a pressing responsibility on genetic professionals and researchers to stay informed about the latest findings and their clinical significance. Genomic variant interpretation currently faces the challenge of identifying the most up-to-date and relevant scientific papers while also extracting meaningful information to accelerate the process from clinical assessment to reporting. Computer-aided literature search and summarization can play a pivotal role in this context: by synthesizing complex genomic findings into concise, interpretable summaries, this approach facilitates the translation of extensive genomic datasets into clinically relevant insights. RESULTS To bridge this gap, we present VarChat (varchat.engenome.com), an innovative tool based on generative AI, developed to find the fragmented scientific literature associated with genomic variants and summarize it into brief yet informative texts. VarChat provides users with a concise description of specific genetic variants, detailing their impact on related proteins and possible effects on human health. In addition, VarChat offers direct links to related, trustworthy scientific sources, encouraging deeper research. AVAILABILITY AND IMPLEMENTATION varchat.engenome.com.
Affiliation(s)
- Silvia Berardelli
- enGenome srl, via Ferrata, 5, Pavia, 27100, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, via Ferrata, 5, Pavia, 27100, Italy
- Ettore Rizzo
- enGenome srl, via Ferrata, 5, Pavia, 27100, Italy
3
Yadav S, Ramesh S, Saha S, Ekbal A. Relation Extraction From Biomedical and Clinical Text: Unified Multitask Learning Framework. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1105-1116. [PMID: 32853152 DOI: 10.1109/tcbb.2020.3020016] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
MOTIVATION To reduce the ever-growing amount of time spent searching the biomedical literature, numerous approaches for automated knowledge extraction have been proposed. Relation extraction is one such task, in which semantic relations between entities are identified from free text. In the biomedical domain, extracting regulatory pathways, metabolic processes, adverse drug reactions or disease models requires knowledge of the individual relations, for example, physical or regulatory interactions between genes, proteins, drugs, chemicals, diseases or phenotypes. RESULTS In this paper, we study relation extraction in three major biomedical and clinical tasks, namely drug-drug interaction, protein-protein interaction, and medical concept relation extraction. We model the relation extraction problem in a multi-task learning (MTL) framework and introduce, for the first time, a structured self-attentive network complemented with an adversarial learning approach for predicting relationships from biomedical and clinical text. The fundamental notion of MTL is to learn multiple problems simultaneously by exploiting a shared representation. To compare against our proposed MTL models, we also build a highly efficient single-task model that exploits a shortest dependency path embedding learned over an attentive gated recurrent unit. The proposed framework significantly improves over all the baselines (deep learning techniques) and single-task models for predicting relationships, without compromising performance on any of the tasks.
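The abstract does not spell out the attention architecture. A widely used formulation of structured self-attention (Lin et al., 2017) computes an attention matrix A = softmax(W_s2 · tanh(W_s1 · H^T)) over encoder states H and pools them into M = A · H. The sketch below implements only that pooling step in plain Python, assuming this formulation; the weights and sizes are hypothetical toy values, and it is not the authors' full multi-task or adversarial model. In an MTL setting, H would typically come from the shared encoder, with a separate classifier per task on top of M.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def structured_self_attention(h: Matrix, w_s1: Matrix, w_s2: Matrix) -> Tuple[Matrix, Matrix]:
    """Return A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H.

    h:    n x d   token states (assumed to come from a shared encoder)
    w_s1: d_a x d first projection
    w_s2: r x d_a one row per attention "hop"
    A is r x n (each row sums to 1 over tokens); M is r x d.
    """
    hidden = [[math.tanh(x) for x in row] for row in matmul(w_s1, transpose(h))]
    a = softmax_rows(matmul(w_s2, hidden))
    return a, matmul(a, h)


# Hypothetical toy sizes: n = 3 tokens, d = 2, d_a = 2, r = 2 hops.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_s1 = [[0.5, -0.5], [0.3, 0.8]]
w_s2 = [[1.0, -1.0], [0.2, 0.4]]
a, m = structured_self_attention(h, w_s1, w_s2)
print([[round(x, 3) for x in row] for row in a])
```

Each of the r rows of A is a separate distribution over tokens, so M gives r pooled views of the sentence rather than a single vector.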
4
Kropiwnicki E, Lachmann A, Clarke DJB, Xie Z, Jagodnik KM, Ma’ayan A. DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules. BMC Bioinformatics 2022; 23:76. [PMID: 35183110 PMCID: PMC8858480 DOI: 10.1186/s12859-022-04590-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/04/2021] [Accepted: 01/28/2022] [Indexed: 11/29/2022]
Abstract
Background PubMed contains millions of abstracts that co-mention terms that describe drugs with other biomedical terms such as genes or diseases. Unique opportunities exist for leveraging these co-mentions by integrating them with other drug-drug similarity resources such as the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 signatures to develop novel hypotheses. Results DrugShot is a web-based server application and an Appyter that enables users to enter any biomedical search term into a simple input form to receive ranked lists of drugs and other small molecules based on their relevance to the search term. To produce ranked lists of small molecules, DrugShot cross-references the returned PubMed identifiers (PMIDs) with DrugRIF or AutoRIF, which are curated resources of drug-PMID associations, to produce an associated small molecule list in which each small molecule is ranked according to its total co-mentions with the search term across shared PubMed IDs. Additionally, DrugShot uses two types of drug-drug similarity matrices to predict further lists of small molecules associated with the search term. These predictions are based on literature co-mentions and on signature similarity from LINCS L1000 drug-induced gene expression profiles. Conclusions DrugShot prioritizes drugs and small molecules associated with biomedical search terms. In addition to listing known associations, DrugShot predicts additional drugs and small molecules related to any search term. Hence, DrugShot can be used to prioritize drugs and preclinical compounds for drug repurposing and to suggest indications and adverse events for preclinical compounds. DrugShot is freely and openly available at: https://maayanlab.cloud/drugshot and https://appyters.maayanlab.cloud/#/DrugShot. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04590-5.
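The ranking step described in this abstract can be illustrated with a small sketch. It assumes a toy dictionary in place of DrugRIF or AutoRIF and ranks drugs by the number of PMIDs they share with the search results. The similarity-based step uses mean similarity to the already ranked drugs, which is an illustrative choice and may not match DrugShot's actual scoring. All PMIDs, drug names and similarity values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_by_comentions(search_pmids: Set[int],
                       drug_to_pmids: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Rank drugs by the number of PMIDs they share with the search-term results."""
    counts = {drug: len(search_pmids & pmids) for drug, pmids in drug_to_pmids.items()}
    ranked = [(drug, n) for drug, n in counts.items() if n > 0]
    # Highest co-mention count first; ties broken alphabetically for reproducibility.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


def predict_by_similarity(known: List[str],
                          similarity: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score candidates not already in `known` by their mean similarity to the known drugs."""
    scores = {}
    for candidate, row in similarity.items():
        if candidate in known or not known:
            continue
        scores[candidate] = sum(row.get(k, 0.0) for k in known) / len(known)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical PMIDs, drug names, and similarity scores, for illustration only.
search_pmids = {101, 102, 103, 104}
drug_to_pmids = {"drugA": {101, 102, 103}, "drugB": {104, 900}, "drugC": {700, 800}}
ranked = rank_by_comentions(search_pmids, drug_to_pmids)
similarity = {"drugD": {"drugA": 0.9, "drugB": 0.5}, "drugE": {"drugA": 0.1, "drugB": 0.2}}
predicted = predict_by_similarity([d for d, _ in ranked], similarity)
print(ranked)
print(predicted)
```

Here drugA (3 shared PMIDs) ranks above drugB (1), drugC is dropped because it shares none, and drugD is predicted ahead of drugE because it is more similar to the drugs already ranked.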
5
Proliferation and Apoptosis Pathways and Factors in Oral Squamous Cell Carcinoma. Int J Mol Sci 2022; 23:ijms23031562. [PMID: 35163485 PMCID: PMC8836072 DOI: 10.3390/ijms23031562] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 12/25/2021] [Revised: 01/24/2022] [Accepted: 01/27/2022] [Indexed: 12/24/2022]
Abstract
Oral cancer is the most common form of head and neck squamous cell carcinoma (HNSCC) and most frequently presents as oral squamous cell carcinoma (OSCC), which is associated with an alarmingly high mortality rate. Extensive research is being conducted internationally to further our understanding of the molecular pathways involved in oral cancer. This research is of value for early diagnosis, prognosis, and the investigation of new drugs that can ameliorate the harmful effects of oral cancer and provide optimal patient outcomes with minimal long-term complications. OSCC progression depends on two pathways, proliferation and apoptosis, which overlap at many junctions. Herein, we review these pathways and the factors related to OSCC progression. The publicly available search engines PubMed and Google Scholar were searched with the following keywords to identify relevant literature: oral cancer, proliferation, proliferation factors, genes, mutations, and tumor suppressor. We anticipate that the information provided in this review will further translational cancer research in the field of oral cancer.
6
Jacobs P, Gigerenzer G. Using variation between countries to estimate demand for Cochrane reviews when access is free: a cost-benefit analysis. BMJ Open 2021; 11:e033310. [PMID: 34312188 PMCID: PMC8314729 DOI: 10.1136/bmjopen-2019-033310] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022]
Abstract
OBJECTIVES Cochrane reviews are currently of limited use as many healthcare professionals and patients have no access to them. Most member states of the Organisation for Economic Co-operation and Development (OECD) choose not to pay for nationwide access to the reviews, possibly uncertain whether there is enough demand to warrant the costs of a national subscription. This study estimates the demand for review downloads and summary views under free access across all OECD countries. DESIGN The study employs a retrospective design in analysing observational data of web traffic to Cochrane websites in 2014. Specifically, we model for each country downloads of Cochrane reviews and views of online summaries as a function of free access status and alternative sources of variation across countries. The model is then used to estimate demand if a country with restricted access were to purchase free access. We use these estimates to perform a cost-benefit analysis. RESULTS For one group of eight OECD countries, the additional downloads under free access are estimated to cost between US$4 and more than US$20 each. Three countries are expected to save money under free access, as existing institutional subscriptions would no longer be needed. For the largest group of 17 member states, free access is estimated to cost US$0.05-US$2 per additional review download. On average, the increase in review downloads does not appear to be associated with a decrease in the number of summary views. Instead, translations of plain-language summaries into national languages can serve as an additional strategy for dissemination. CONCLUSIONS We estimate that free access would cost less than US$2 per additional download for 20 of the 28 OECD countries without national subscriptions, including Canada, Germany and Israel. These countries may be encouraged by our findings to provide free access to their citizens.
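The results in this abstract are expressed as a cost per additional download. A simplified version of that ratio is sketched below, assuming the net cost is the national subscription price minus spending on institutional subscriptions that free access would make unnecessary, and that the extra downloads come from a fitted demand model. The study's actual cost model may include further terms, and the figures used here are hypothetical.

```python
def cost_per_additional_download(national_price: float,
                                 institutional_savings: float,
                                 downloads_free: float,
                                 downloads_restricted: float) -> float:
    """Net cost of national free access divided by the extra downloads it generates.

    A negative result means free access would save money overall.
    """
    extra = downloads_free - downloads_restricted
    if extra <= 0:
        raise ValueError("the ratio is only meaningful if free access adds downloads")
    return (national_price - institutional_savings) / extra


# Hypothetical figures (US$ and downloads per year), not taken from the study.
print(cost_per_additional_download(100_000, 20_000, 150_000, 50_000))
print(cost_per_additional_download(100_000, 120_000, 150_000, 50_000))
```

The first call gives 0.8 (US$0.80 per extra download); the second gives -0.2, a net saving, which mirrors the situation of the three countries expected to save money because existing institutional subscriptions would no longer be needed.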
Affiliation(s)
- Perke Jacobs
- Harding Center for Risk Literacy, Max Planck Institute for Human Development, Berlin, Germany
- Gerd Gigerenzer
- Harding Center for Risk Literacy, Max Planck Institute for Human Development, Berlin, Germany
7
Frainay C, Pitarch Y, Filippi S, Evangelou M, Custovic A. Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining. Clin Exp Allergy 2021; 51:1185-1194. [PMID: 34213816 DOI: 10.1111/cea.13981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Accepted: 06/30/2021] [Indexed: 11/26/2022]
Abstract
BACKGROUND Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. OBJECTIVE To investigate the consequences of the ambiguity between the terms "Eczema" and "Atopic Dermatitis" (AD) from an information retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. METHODS Articles were retrieved by querying PubMed with the MeSH terms "eczema" (D004485) and "dermatitis, atopic" (D003876). We used machine learning to investigate the differences between the contexts in which each term is used: we trained a decision tree model to predict whether an article would be indexed with the eczema tag or the AD tag. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated discrepancies in the retrieval of key findings according to the terminology used. RESULTS The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, and cellular and molecular biology, whereas the eczema query was linked to public health, infectious disease and the respiratory system. The Medical Subject Headings (MeSH) terms associated with "AD" and "Eczema" differed, with 52% agreement between the two top-40 lists. Terms related to cellular mechanisms, especially allergy and inflammation, characterized the AD literature. The metabolites mentioned more frequently than expected in articles with the AD tag differed from those in articles indexed with eczema. Fewer enriched genes were retrieved with the eczema query than with the AD query. CONCLUSIONS AND CLINICAL RELEVANCE There is a considerable discrepancy when text mining is used to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should use both terms jointly.
We propose to use decision tree learning as a tool to spot and characterize ambiguity, and provide the source code for disambiguation at https://github.com/cfrainay/ResearchCodeBase.
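The abstract reports 52% agreement between the top-40 MeSH lists for the two queries but does not define the measure. Two common candidates are sketched below, assuming each list is a ranked sequence of terms; which one the study used is not stated in the abstract, and the toy lists are hypothetical.

```python
from typing import Sequence


def top_k_overlap(list_a: Sequence[str], list_b: Sequence[str], k: int = 40) -> float:
    """Share of the k top slots whose terms appear in both top-k lists."""
    shared = set(list_a[:k]) & set(list_b[:k])
    return len(shared) / k


def top_k_jaccard(list_a: Sequence[str], list_b: Sequence[str], k: int = 40) -> float:
    """Jaccard index: shared terms divided by all distinct terms in either top-k list."""
    top_a, top_b = set(list_a[:k]), set(list_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical ranked term lists for two queries, shortened to k = 4.
ad_terms = ["T1", "T2", "T3", "T4"]
eczema_terms = ["T2", "T4", "T5", "T6"]
print(top_k_overlap(ad_terms, eczema_terms, k=4))  # 2 shared terms over 4 slots
print(top_k_jaccard(ad_terms, eczema_terms, k=4))  # 2 shared terms over 6 distinct
```

The two measures can differ noticeably for the same lists (0.5 versus about 0.33 here), so a reported agreement figure is only comparable across studies when its definition is known.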
Affiliation(s)
- Clément Frainay
- Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, UK
- Toxalim (Research Center in Food Toxicology), INRAE, ENVT, INP-PURPAN, UPS, Université de Toulouse, Toulouse, France
- Yoann Pitarch
- UMR5505, IRIT, Université de Toulouse, Toulouse, France
- Sarah Filippi
- Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, UK
- Marina Evangelou
- Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, UK
- Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, UK
- Adnan Custovic
- National Heart and Lung Institute, Imperial College London, London, UK
8
Allot A, Lee K, Chen Q, Luo L, Lu Z. LitSuggest: a web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res 2021; 49:W352-W358. [PMID: 33950204 DOI: 10.1093/nar/gkab326] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/01/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 01/02/2023]
Abstract
Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to use. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest uses advanced machine learning techniques to suggest relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated, personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.
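The abstract does not describe LitSuggest's models in detail. As a minimal, self-contained illustration of the general idea of learning relevance from a user's labeled examples and scoring new articles, the sketch below trains a two-class multinomial naive Bayes classifier on hypothetical abstracts. Naive Bayes is a stand-in technique chosen for brevity, not LitSuggest's implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesRecommender:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing.

    Both classes need at least one training document before scoring.
    """

    def __init__(self) -> None:
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.doc_counts: Dict[bool, int] = {True: 0, False: 0}
        self.vocab: set = set()

    def fit(self, docs: List[str], labels: List[bool]) -> "NaiveBayesRecommender":
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
            self.doc_counts[label] += 1
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, doc: str, label: bool) -> float:
        """Log prior plus smoothed log likelihood of the in-vocabulary tokens."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def is_relevant(self, doc: str) -> bool:
        return self.log_score(doc, True) > self.log_score(doc, False)


# Hypothetical training abstracts: True = relevant to the user's project.
docs = ["gene variant pathogenic missense", "variant interpretation clinical genetics",
        "hospital staffing survey nurses", "survey of patient satisfaction"]
labels = [True, True, False, False]
model = NaiveBayesRecommender().fit(docs, labels)
print(model.is_relevant("missense variant in gene"))
print(model.is_relevant("patient satisfaction survey"))
```

Because `fit` accumulates counts, refining the model with newly labeled articles amounts to calling `fit` again on just those articles, which loosely parallels fine-tuning by updating the training corpus.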
Affiliation(s)
- Alexis Allot
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kyubum Lee
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Qingyu Chen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Ling Luo
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
9
Islamaj R, Wei CH, Cissel D, Miliaras N, Printseva O, Rodionov O, Sekiya K, Ward J, Lu Z. NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition. J Biomed Inform 2021; 118:103779. [PMID: 33839304 PMCID: PMC11037554 DOI: 10.1016/j.jbi.2021.103779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Revised: 03/14/2021] [Accepted: 04/05/2021] [Indexed: 10/21/2022]
Abstract
The automatic recognition of gene names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. Although current methods for tagging gene entities have been developed for biomedical literature, their performance on species other than human is substantially lower because of a lack of annotated data. We therefore present the NLM-Gene corpus, a high-quality, manually annotated gene corpus developed at the US National Library of Medicine (NLM). It covers ambiguous gene names, contains an average of 29 gene mentions (10 unique identifiers) per document, and represents a broader range of species (including Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, etc.) than previous gene annotation corpora. NLM-Gene consists of 550 PubMed abstracts from 156 biomedical journals, doubly annotated by six experienced NLM indexers who were randomly paired for each document to control for bias. The annotators worked in three annotation rounds until they reached complete agreement. This gold-standard corpus can serve as a benchmark for developing and testing new gene text-mining algorithms. Using this new resource, we developed a new deep-learning-based gene-finding algorithm that improved both precision and recall over existing tools. The NLM-Gene annotated corpus is freely available at ftp://ftp.ncbi.nlm.nih.gov/pub/lu/NLMGene. We have also applied this tool to the entire PubMed/PMC collection and made the results freely accessible through our web-based tool PubTator (www.ncbi.nlm.nih.gov/research/pubtator).
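Benchmarking a gene tagger against a gold standard such as NLM-Gene usually comes down to precision, recall and F1. The sketch below assumes a document-level comparison of (document, gene identifier) pairs, which may differ from the corpus's official evaluation protocol. The documents and predictions are hypothetical; the NCBI Gene IDs shown (7157 human TP53, 672 human BRCA1, 22059 mouse Trp53) illustrate why species matters for identifier assignment.

```python
from typing import Set, Tuple

Annotation = Tuple[str, str]  # (document id, gene identifier)


def precision_recall_f1(predicted: Set[Annotation],
                        gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over (document, gene ID) pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold standard: doc2 mentions the mouse gene, not the human one.
gold = {("doc1", "7157"), ("doc1", "672"), ("doc2", "22059")}
# Hypothetical tagger output: misses BRCA1 and maps the mouse mention to the human ID.
predicted = {("doc1", "7157"), ("doc2", "7157")}
print(precision_recall_f1(predicted, gold))
```

Assigning the human TP53 identifier to a mouse mention counts as both a false positive and a false negative, giving precision 0.5, recall 1/3 and F1 0.4.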
Affiliation(s)
- Rezarta Islamaj
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Chih-Hsuan Wei
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David Cissel
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Nicholas Miliaras
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Olga Printseva
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Oleg Rodionov
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Keiko Sekiya
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Janice Ward
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Zhiyong Lu
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
10
Gomes JDA, Olstad EW, Kowalski TW, Gervin K, Vianna FSL, Schüler-Faccini L, Nordeng HME. Genetic Susceptibility to Drug Teratogenicity: A Systematic Literature Review. Front Genet 2021; 12:645555. [PMID: 33981330 PMCID: PMC8107476 DOI: 10.3389/fgene.2021.645555] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/23/2020] [Accepted: 03/19/2021] [Indexed: 12/19/2022]
Abstract
Since the 1960s, drugs have been known to cause teratogenic effects in humans. Such teratogenicity has been postulated to be influenced by genetics. The aim of this review was to provide an overview of the current knowledge on genetic susceptibility to drug teratogenicity in humans and reflect on future directions within the field of genetic teratology. We focused on 12 drugs and drug classes with evidence of teratogenic action, as well as 29 drugs and drug classes with conflicting evidence of fetal safety in humans. An extensive literature search was performed in the PubMed and EMBASE databases using terms related to the drugs of interest, congenital anomalies and fetal development abnormalities, and genetic variation and susceptibility. A total of 29 studies were included in the final data extraction. The eligible studies were published between 1999 and 2020 in 10 different countries, and comprised 28 candidate gene and 1 whole-exome sequencing studies. The sample sizes ranged from 20 to 9,774 individuals. Several drugs were investigated, including antidepressants (nine studies), thalidomide (seven studies), antiepileptic drugs (five studies), glucocorticoids (four studies), acetaminophen (two studies), and sex hormones (estrogens, one study; 17-alpha hydroxyprogesterone caproate, one study). The main neonatal phenotypic outcomes included perinatal complications, cardiovascular congenital anomalies, and neurodevelopmental outcomes. The review demonstrated that studies on genetic teratology are generally small, heterogeneous, and exhibit inconsistent results. The most convincing findings were genetic variants in SLC6A4, MTHFR, and NR3C1, which were associated with drug teratogenicity by antidepressants, antiepileptics, and glucocorticoids, respectively. Notably, this review demonstrated the large knowledge gap regarding genetic susceptibility to drug teratogenicity, emphasizing the need for further efforts in the field. 
Future studies may be improved by increasing sample sizes and applying genome-wide approaches to aid the interpretation of results. Such studies could support the clinical implementation of genetic screening to enable safer drug use in pregnant women who require medication.
Affiliation(s)
- Julia do Amaral Gomes
- Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Sistema Nacional de Informação sobre Agentes Teratogênicos (SIAT), Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Laboratório de Medicina Genômica, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Emilie Willoch Olstad
- Pharmacoepidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- Thayne Woycinck Kowalski
- Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Laboratório de Medicina Genômica, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Complexo de Ensino Superior de Cachoeirinha (CESUCA), Cachoeirinha, Brazil
- Kristina Gervin
- Pharmacoepidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- Division of Clinical Neuroscience, Department of Research and Innovation, Oslo University Hospital, Oslo, Norway
- Fernanda Sales Luiz Vianna
- Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Sistema Nacional de Informação sobre Agentes Teratogênicos (SIAT), Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Laboratório de Medicina Genômica, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Lavínia Schüler-Faccini
- Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Sistema Nacional de Informação sobre Agentes Teratogênicos (SIAT), Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Hedvig Marie Egeland Nordeng
- Pharmacoepidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- Department of Child Health and Development, Norwegian Institute of Public Health, Oslo, Norway
11
Abstract
Scientific communication has evolved over time, and the formats of scientific writing, including its stylistic elements, have changed accordingly. Research articles from the past belong to a research world that had not yet been transformed by the internet, electronic searches, new media and today's mass production of science; they reflect a reality in which scientific publications were designed to be read and appreciated by actual readers. It is therefore useful to look back at what science looked like in the past and to examine the biomedical literature in older archives, because several features of those publications may hold vital insights for today's communication. Maintaining a clear awareness of how scientific language and modes of communication have evolved may support steadier progress and improve academic writing in the years to come. With this goal in mind, the present commentary reviews a 1948 scientific report by I.L. Bennett Jr, entitled “A study on the relationship between the fevers caused by bacterial pyrogens and by the intravenous injection of the sterile exudates of acute inflammation”, which appeared in the Journal of Experimental Medicine in September 1948.
12
Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res 2019; 46:W530-W536. [PMID: 29762787 PMCID: PMC6030971 DOI: 10.1093/nar/gky355] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 01/31/2018] [Accepted: 05/08/2018] [Indexed: 01/10/2023]
Abstract
The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and the high cost of expert curation, curated variant information in existing databases is often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. ‘A146T’ versus ‘c.436G>A’ versus ‘rs121913527’). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. LitVar also uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.
Affiliation(s)
- Alexis Allot
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Yifan Peng
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kyubum Lee
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Lon Phan
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
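The LitVar abstract above turns on the observation that one variant can appear under several names (‘A146T’, ‘c.436G>A’, ‘rs121913527’). As an illustration only, the sketch below recognizes which of those three naming styles a mention uses. It is not LitVar's method: the function name `classify_variant_mention` and its simplified regular expressions are hypothetical, and mapping the recognized mentions to one canonical variant would additionally require reference-sequence or dbSNP data, which this sketch does not attempt.

```python
import re

# Hypothetical, deliberately simplified patterns for the three naming styles
# quoted in the LitVar abstract. Real HGVS and protein-level notation is far
# richer (insertions, deletions, three-letter amino acid codes, "p." prefixes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VARIANT_PATTERNS = [
    ("dbSNP rsID", re.compile(r"rs\d+")),
    ("HGVS coding DNA substitution", re.compile(r"c\.\d+[ACGT]>[ACGT]")),
    ("protein substitution", re.compile(rf"[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]")),
]


def classify_variant_mention(mention: str) -> str:
    """Return the naming style of a variant mention, or 'unrecognized'."""
    text = mention.strip()
    for label, pattern in VARIANT_PATTERNS:
        # fullmatch requires the whole mention to have this shape.
        if pattern.fullmatch(text):
            return label
    return "unrecognized"


if __name__ == "__main__":
    for name in ["A146T", "c.436G>A", "rs121913527", "KRAS"]:
        print(name, "->", classify_variant_mention(name))
```

In a search tool, a recognition step like this would plausibly come before normalization, since each naming style needs a different lookup to reach a shared identifier.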
13
Al-Aamri A, Taha K, Al-Hammadi Y, Maalouf M, Homouz D. Analyzing a co-occurrence gene-interaction network to identify disease-gene association. BMC Bioinformatics 2019; 20:70. [PMID: 30736752 PMCID: PMC6368766 DOI: 10.1186/s12859-019-2634-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/23/2018] [Accepted: 01/17/2019] [Indexed: 12/03/2022]
Abstract
Background: Understanding genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed network of genes by using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. Results: We evaluated the top 15 ranked genes for different cancer types (i.e., prostate, breast, and lung cancer). The average precisions for identifying breast, prostate, and lung cancer genes range from 80% to 100%. In a prostate cancer case study, the system predicted an average of 80% prostate-related genes. Conclusions: The results show that our system has the potential to improve the prediction accuracy of identifying gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study by using the threshold property in logistic regression, and we compare our approach with some of the state-of-the-art methods. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2634-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Amira Al-Aamri
- Department of Electrical and Computer Engineering, Abu Dhabi, United Arab Emirates
- Kamal Taha
- Department of Electrical and Computer Engineering, Abu Dhabi, United Arab Emirates
- Yousof Al-Hammadi
- Department of Electrical and Computer Engineering, Abu Dhabi, United Arab Emirates
- Maher Maalouf
- Department of Industrial and Systems Engineering, Abu Dhabi, United Arab Emirates
- Dirar Homouz
- Department of Physics, Khalifa University of Science and Technology, Abu Dhabi, P.O. Box 127788, United Arab Emirates
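The Al-Aamri et al. abstract above describes building a gene network from literature co-occurrence and ranking genes by centrality. The sketch below shows the general shape of such a pipeline using degree centrality only, under stated assumptions: co-occurrence is counted per document, a fixed `min_count` threshold stands in for the paper's rare-event classification models, the gene lists are toy input rather than data from the study, and all helper names are hypothetical. Betweenness, closeness, and eigenvector centrality, which the authors also apply, are not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, min_count=2):
    """Return gene pairs that co-occur in at least `min_count` documents.

    Each document is an iterable of gene symbols already extracted from text.
    The fixed threshold is a simplification, not the paper's classifier.
    """
    counts = Counter()
    for genes in documents:
        # Sort the de-duplicated genes so each unordered pair has one key.
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def degree_centrality(edges):
    """Normalized degree centrality: number of neighbors / (nodes - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {gene: 0.0 for gene in neighbors}
    return {gene: len(adj) / (n - 1) for gene, adj in neighbors.items()}


def rank_genes(documents, min_count=2, top_k=15):
    """Rank genes by degree centrality, breaking ties alphabetically."""
    scores = degree_centrality(cooccurrence_edges(documents, min_count))
    return sorted(scores, key=lambda gene: (-scores[gene], gene))[:top_k]


if __name__ == "__main__":
    # Toy input: gene mentions per document (illustrative, not study data).
    docs = [
        ["BRCA1", "TP53"],
        ["BRCA1", "TP53", "EGFR"],
        ["TP53", "EGFR"],
        ["BRCA1", "EGFR"],
        ["TP53", "MYC"],
    ]
    print(rank_genes(docs, min_count=1))
```

With `min_count=1`, the toy input ranks TP53 first because it co-occurs with every other gene; raising the threshold to 2 drops the single TP53-MYC co-mention, leaving a three-gene triangle. The `top_k=15` default only mirrors the "top 15 ranked genes" evaluated in the abstract.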
14
Yadav S, Ekbal A, Saha S, Kumar A, Bhattacharyya P. Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein–protein interaction. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.11.020] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 10/27/2022]
15
Dubin JA, Greenberg DR, Iglinski-Benjamin KC, Abrams GD. Effect of micro-RNA on tenocytes and tendon-related gene expression: A systematic review. J Orthop Res 2018; 36:2823-2829. [PMID: 29873411 DOI: 10.1002/jor.24064] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/24/2018] [Accepted: 05/21/2018] [Indexed: 02/04/2023]
Abstract
The purpose of the review was to synthesize the current literature regarding the effect of miRNA on biological processes known to be involved in tendon and tenocyte development and homeostasis. Using multiple databases, a systematic review was performed with a customized search term crafted to identify any study examining micro-RNA in relation to tendon and/or tenocytes. Results were classified into the following categories: gene expression, tenocyte development and differentiation, tendon tissue repair, and tenocyte senescence. A total of 3,112 potentially relevant studies were reviewed, and after exclusion criteria were applied, 15 investigations were included in the final analysis. Fourteen specific miRNAs were included in this review, with 11 studies reporting on tendon-related gene expression, five reporting on tendon development and/or tenocyte differentiation, six reporting on tendon tissue repair, and five reporting on tenocyte senescence. The miR-29 family was the most commonly reported micro-RNA in the investigation. We also report on a number of micro-RNAs that are associated with both positive and negative effects on tendon homeostasis. © 2018 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 36:2823-2829, 2018.
Affiliation(s)
- Jeremy A Dubin
- Veterans Administration-Palo Alto, Palo Alto, California
- Daniel R Greenberg
- Department of Orthopedic Surgery, Stanford University School of Medicine, 341 Galvez Street Mail Code 6175, Stanford, California
- Kag C Iglinski-Benjamin
- Veterans Administration-Palo Alto, Palo Alto, California; Department of Orthopedic Surgery, Stanford University School of Medicine, 341 Galvez Street Mail Code 6175, Stanford, California
- Geoffrey D Abrams
- Veterans Administration-Palo Alto, Palo Alto, California; Department of Orthopedic Surgery, Stanford University School of Medicine, 341 Galvez Street Mail Code 6175, Stanford, California
16
Rindflesch TC, Blake CL, Fiszman M, Kilicoglu H, Rosemblat G, Schneider J, Zeiss CJ. Informatics Support for Basic Research in Biomedicine. ILAR J 2017; 58:80-89. [PMID: 28838071 DOI: 10.1093/ilar/ilx004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/15/2016] [Accepted: 01/13/2017] [Indexed: 11/13/2022]
Abstract
Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.g., genes, substances, and diseases) and relations between them in text. We then discuss Semantic MEDLINE, an application that integrates PubMed document retrieval, concept and relation identification, and visualization, thus enabling a user to explore concepts and relations from within a set of retrieved citations. Semantic MEDLINE provides a roadmap through content and helps users discern patterns in large numbers of retrieved citations. We illustrate its use with an informatics method we call "discovery browsing," which provides a principled way of navigating through selected aspects of a biomedical research area. The method supports an iterative process that accommodates learning and hypothesis formation, in which a user is provided with high-level connections before delving into details. As a use case, we examine current developments in basic research on mechanisms of Alzheimer's disease. Out of the nearly 90,000 citations returned by the PubMed query "Alzheimer's disease," discovery browsing led us to 73 citations on sortilin and that disorder. We provide a synopsis of the basic research reported in 15 of these. There is widespread consensus among researchers working with a range of animal models and human cells that increased sortilin expression and decreased receptor expression are associated with amyloid beta and/or amyloid precursor protein.
Affiliation(s)
- Thomas C Rindflesch
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
- Catherine L Blake
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
- Marcelo Fiszman
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
- Graciela Rosemblat
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
- Jodi Schneider
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
- Caroline J Zeiss
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland. School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois; Center for Informatics in Science and Scholarship. Yale University School of Medicine, New Haven, Connecticut
17
Abstract
The Abstract Sifter is a Microsoft Excel-based application that enhances the existing search capabilities of PubMed. The Abstract Sifter helps researchers search effectively, triage results, and keep track of articles of interest. The tool implements an innovative “sifter” functionality for relevance ranking, giving the researcher a way to find articles of interest quickly. The tool also gives researchers a view of the literature landscape for a set of entities such as chemicals or genes. The Abstract Sifter is available as a Microsoft Excel macro-enabled workbook application.
Affiliation(s)
- Thomas Knudsen
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
- Antony Williams
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
18
Wijewickrema M, Petras V. Journal selection criteria in an open access environment: A comparison between the medicine and social sciences. Learned Publishing 2017. [DOI: 10.1002/leap.1113] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 11/10/2022]
Affiliation(s)
- Manjula Wijewickrema
- Berlin School of Library and Information Science, Humboldt University of Berlin, Berlin, Germany
- Vivien Petras
- Berlin School of Library and Information Science, Humboldt University of Berlin, Berlin, Germany
19
Sernadela P, Oliveira JL. A semantic-based workflow for biomedical literature annotation. Database (Oxford) 2017; 2017:4635750. [PMID: 29220478 PMCID: PMC5691355 DOI: 10.1093/database/bax088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/22/2017] [Revised: 10/02/2017] [Accepted: 10/30/2017] [Indexed: 11/12/2022]
Abstract
Computational annotation of textual information has taken on an important role in knowledge extraction from the biomedical literature, since most of the relevant information from scientific findings is still maintained in text format. In this endeavour, annotation tools can assist in the identification of biomedical concepts and their relationships, providing faster reading and curation processes, with reduced costs. However, the separate usage of distinct annotation systems results in highly heterogeneous data, as it is difficult to efficiently combine and exchange this valuable asset. Moreover, despite the existence of several annotation formats, there is no unified way to integrate miscellaneous annotation outcomes into a reusable, sharable and searchable structure. Taking up this challenge, we present a modular architecture for textual information integration using semantic web features and services. The solution described allows the migration of curation data into a common model, providing a suitable transition process in which multiple annotation data can be integrated and enriched, with the possibility of being shared, compared and reused across semantic knowledge bases.
Affiliation(s)
- Pedro Sernadela
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- José Luís Oliveira
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
20
Teixeira da Silva JA, Al-Khatib A. The Macro and Micro Scale of Open Access Predation. Publishing Research Quarterly 2016. [DOI: 10.1007/s12109-016-9495-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
21
Döring K, Grüning BA, Telukunta KK, Thomas P, Günther S. PubMedPortable: A Framework for Supporting the Development of Text Mining Applications. PLoS One 2016; 11:e0163794. [PMID: 27706202 PMCID: PMC5051953 DOI: 10.1371/journal.pone.0163794] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/22/2016] [Accepted: 09/14/2016] [Indexed: 11/18/2022]
Abstract
Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
Affiliation(s)
- Kersten Döring
- Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs University, 79104 Freiburg, Germany
- Björn A. Grüning
- Bioinformatics, Institute of Computer Science, Albert-Ludwigs University, 79110 Freiburg, Germany
- Kiran K. Telukunta
- Bioinformatics, Institute of Computer Science, Albert-Ludwigs University, 79110 Freiburg, Germany
- Philippe Thomas
- Language Technology Lab, German Research Center for Artificial Intelligence, DFKI GmbH, 10559 Berlin, Germany
- Stefan Günther
- Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs University, 79104 Freiburg, Germany
22
Huang CC, Lu Z. Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform 2015; 17:132-44. [PMID: 25935162 DOI: 10.1093/bib/bbv024] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 01/29/2015] [Indexed: 11/13/2022]
Abstract
One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.
23
Khare R, Good BM, Leaman R, Su AI, Lu Z. Crowdsourcing in biomedicine: challenges and opportunities. Brief Bioinform 2015; 17:23-32. [PMID: 25888696 DOI: 10.1093/bib/bbv021] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 01/06/2023]
Abstract
The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.