1
Song Y, Luximon A, Luximon Y. Facial Anthropomorphic Trustworthiness Scale for Social Robots: A Hybrid Approach. Biomimetics (Basel) 2023; 8:335. [PMID: 37622940 PMCID: PMC10452404 DOI: 10.3390/biomimetics8040335] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/23/2023] [Revised: 07/14/2023] [Accepted: 07/24/2023] [Indexed: 08/26/2023] Open
Abstract
Social robots serve as autonomous systems for performing social behaviors and assuming social roles. However, there is a lack of research focusing on the specific measurement of facial trustworthiness toward anthropomorphic robots, particularly during initial interactions. To address this research gap, a hybrid deep convolution approach was employed in this study, involving a crowdsourcing platform for data collection and deep convolution and factor analysis for data processing. The goal was to develop a scale, called Facial Anthropomorphic Trustworthiness towards Social Robots (FATSR-17), to measure the trustworthiness of a robot's facial appearance. The final measurement scale comprised four dimensions, "ethics concern", "capability", "positive affect", and "anthropomorphism", consisting of 17 items. An iterative examination and a refinement process were conducted to ensure the scale's reliability and validity. The study contributes to the field of robot design by providing designers with a structured toolkit to create robots that appear trustworthy to users.
Affiliation(s)
- Yao Song
  - Digital Convergence Laboratory of Chinese Cultural Inheritance and Global Communication, Sichuan University, Chengdu 610065, China
  - College of Literature and Journalism, Sichuan University, Chengdu 610065, China
  - School of Design, The Hong Kong Polytechnic University, Hung Hom, Hong Kong 999077, China
- Ameersing Luximon
  - Georgia Tech Shenzhen Institute, Tianjin University, Shenzhen 518071, China
- Yan Luximon
  - School of Design, The Hong Kong Polytechnic University, Hung Hom, Hong Kong 999077, China
2
Poleksic A. Overcoming Sparseness of Biomedical Networks to Identify Drug Repositioning Candidates. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2377-2384. [PMID: 33591920 DOI: 10.1109/tcbb.2021.3059807] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Modeling complex biological systems is necessary to understand the biochemical interactions behind the pharmacological effects of drugs. Successful in silico drug repurposing relies on exploration of diverse biochemical concepts and their relationships, including adverse drug reactions, drug targets, disease symptoms, as well as disease-associated genes and their pathways, to name a few. We present a computational method for inferring drug-disease associations from complex but incomplete and biased biological networks. Our method employs matrix completion to overcome the sparseness of biomedical data and to enrich the set of relationships between different biomedical entities. We present a strategy for identifying network paths supportive of drug efficacy as well as a computational procedure capable of combining different network patterns to better distinguish treatments from non-treatments. The algorithm is available at http://bioinfo.cs.uni.edu/AEONET.html.
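The idea of using matrix completion to fill in a sparse association matrix can be illustrated with a toy example. The sketch below is not the AEONET algorithm; it assumes a rank-1 model fitted by alternating least squares, and the function name `complete_rank1` and the `None` convention for unobserved entries are hypothetical choices made for illustration only.

```python
def complete_rank1(matrix, n_iters=100):
    """Fill None entries of `matrix` with a rank-1 fit u[i] * v[j].

    Assumption: a rank-1 alternating-least-squares model, chosen only to
    illustrate matrix completion; observed entries are returned unchanged.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows  # row factors (e.g. one per drug)
    v = [1.0] * n_cols  # column factors (e.g. one per disease)
    for _ in range(n_iters):
        # Update each row factor using only the observed entries in that row.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            denom = sum(v[j] ** 2 for j in obs)
            if denom > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / denom
        # Update each column factor using only the observed entries in that column.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            denom = sum(u[i] ** 2 for i in obs)
            if denom > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / denom
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy association scores with two unobserved entries (None).
scores = [
    [1.0, 2.0, None],
    [2.0, 4.0, 6.0],
    [3.0, None, 9.0],
]
completed = complete_rank1(scores)
```

Real biomedical matrices are far larger and noisier, so a practical method would typically use a higher rank and some form of regularization.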
3
Relation extraction from DailyMed structured product labels by optimally combining crowd, experts and machines. J Biomed Inform 2021; 122:103902. [PMID: 34481057 DOI: 10.1016/j.jbi.2021.103902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 08/25/2021] [Accepted: 08/26/2021] [Indexed: 11/22/2022]
Abstract
The effectiveness of machine learning models to provide accurate and consistent results in drug discovery and clinical decision support is strongly dependent on the quality of the data used. However, substantive amounts of open data that drive drug discovery suffer from a number of issues including inconsistent representation, inaccurate reporting, and incomplete context. For example, databases of FDA-approved drug indications used in computational drug repositioning studies do not distinguish between treatments that simply offer symptomatic relief and those that target the underlying pathology. Moreover, drug indication sources often lack proper provenance and have little overlap. Consequently, new predictions can be of poor quality as they offer little in the way of new insights. Hence, work remains to be done to establish higher quality databases of drug indications that are suitable for use in drug discovery and repositioning studies. Here, we report on the combination of weak supervision (i.e., programmatic labeling and crowdsourcing) and deep learning methods for relation extraction from DailyMed text to create a higher quality drug-disease relation dataset. The generated drug-disease relation data shows a high overlap with DrugCentral, a manually curated dataset. Using this dataset, we constructed a machine learning model to classify relations between drugs and diseases from text into four categories: treatment, symptomatic relief, contradiction, and effect, exhibiting an improvement of 15.5% with Bi-LSTM (F1 score of 71.8%) over the best performing discrete method. Access to high quality data is crucial to building accurate and reliable drug repurposing prediction models. Our work suggests how the combination of crowds, experts, and machine learning methods can go hand-in-hand to improve datasets and predictive models.
4
Bhatt A, Roberts R, Chen X, Li T, Connor S, Hatim Q, Mikailov M, Tong W, Liu Z. DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction. Front Artif Intell 2021; 4:711467. [PMID: 34409286 PMCID: PMC8366025 DOI: 10.3389/frai.2021.711467] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
Drug labeling contains an ‘INDICATIONS AND USAGE’ section that provides vital information to support clinical decision making and regulatory management. Effective extraction of drug indication information from free-text based resources could facilitate drug repositioning projects and help collect real-world evidence in support of secondary use of approved medicines. To enable AI-powered language models for the extraction of drug indication information, we used manual reading and curation to develop a Drug Indication Classification and Encyclopedia (DICE) based on FDA approved human prescription drug labeling. A DICE scheme with 7,231 sentences categorized into five classes (indications, contradictions, side effects, usage instructions, and clinical observations) was developed. To further elucidate the utility of the DICE, we developed nine different AI-based classifiers for the prediction of indications based on the developed DICE to comprehensively assess their performance. We found that the transformer-based language models yielded an average MCC of 0.887, outperforming the word embedding-based bidirectional long short-term memory (BiLSTM) models (0.862) with a 2.82% improvement on the test set. The best classifiers were also used to extract drug indication information in DrugBank and achieved a high enrichment rate (>0.930) for this task. We found that domain-specific training could provide more explainable models without performance sacrifices and better generalization for external validation datasets. Altogether, the proposed DICE could be a standard resource for the development and evaluation of task-specific AI-powered, natural language processing (NLP) models.
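For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, the following minimal sketch computes its standard binary form from confusion-matrix counts. It assumes a binary indication versus non-indication setting; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only.
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it less sensitive to class imbalance than plain accuracy.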
Affiliation(s)
- Arjun Bhatt
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, United States
  - Dartmouth College, Hanover, NH, United States
  - Brody School of Medicine, East Carolina University School of Medicine, Greenville, NC, United States
- Ruth Roberts
  - ApconiX Ltd, Alderley Edge, United Kingdom
  - Department of Biosciences, University of Birmingham, Birmingham, United Kingdom
- Xi Chen
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, United States
- Ting Li
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, United States
- Skylar Connor
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, United States
- Qais Hatim
  - Office of Translational Sciences, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, United States
- Mike Mikailov
  - Office of Science and Engineering Labs, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, United States
- Weida Tong
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, United States
- Zhichao Liu
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, United States
5
Moodley K, Rieswijk L, Oprea TI, Dumontier M. InContext: curation of medical context for drug indications. J Biomed Semantics 2021; 12:2. [PMID: 33579375 PMCID: PMC7881657 DOI: 10.1186/s13326-021-00234-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/05/2018] [Accepted: 01/21/2021] [Indexed: 11/10/2022] Open
Abstract
Accurate and precise information about the therapeutic uses (indications) of a drug is essential for applications in drug repurposing and precision medicine. Leading online drug resources such as DrugCentral and DrugBank provide rich information about various properties of drugs, including their indications. However, because indications in such databases are often partly automatically mined, some may prove to be inaccurate or imprecise. Particularly challenging for text mining methods is the task of distinguishing between general disease mentions in drug product labels and actual indications for the drug. For this, the qualifying medical context of the disease mentions in the text should be studied. Some examples include contraindications, co-prescribed drugs and target patient qualifications. No existing indication curation efforts attempt to capture such information in a precise way. Here we fill this gap by presenting a novel curation protocol for extracting indications and machine processable annotations of contextual information about the therapeutic use of a drug. We implemented the protocol on a reference set of FDA-approved drug product labels on the DailyMed website to curate indications for 150 anti-cancer and cardiovascular drugs. The resulting corpus - InContext - focuses on anti-cancer and cardiovascular drugs because of the heightened societal interest in cancer and heart disease. In order to understand how InContext relates to existing reputable drug indication databases, we analysed its overlap with a state-of-the-art indications database - LabeledIn - as well as a reputable online drug compendium - DrugCentral. We found that 40% of indications sampled from DrugCentral and 23% of those sampled from LabeledIn could not be accounted for in InContext. This raises questions about the veracity of indications not appearing in InContext.
The additional contextual information curated by InContext about disease mentions in drug SPLs provides a foundation for more precise, structured and formal representations of knowledge related to drug therapeutic use, in order to increase accuracy and agreement of drug indication extraction methods for in silico drug repurposing.
Affiliation(s)
- Kody Moodley
  - Institute of Data Science, Maastricht University, Paul-Henri Spaaklaan 1, 6229 GT, Maastricht, The Netherlands
- Linda Rieswijk
  - Institute of Data Science, Maastricht University, Paul-Henri Spaaklaan 1, 6229 GT, Maastricht, The Netherlands
- Tudor I Oprea
  - Translational Informatics Division, Department of Internal Medicine, MSC09-5025, One University of New Mexico, Albuquerque, New Mexico, 87131, USA
  - UNM Comprehensive Cancer Center, 1201 Camino de Salud, Albuquerque, New Mexico, 87102, USA
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Paul-Henri Spaaklaan 1, 6229 GT, Maastricht, The Netherlands
6
Gartland A, Bate A, Painter JL, Casperson TA, Powell GE. Developing Crowdsourced Training Data Sets for Pharmacovigilance Intelligent Automation. Drug Saf 2020; 44:373-382. [PMID: 33354751 DOI: 10.1007/s40264-020-01028-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 11/23/2020] [Indexed: 11/29/2022]
Abstract
INTRODUCTION Machine learning offers an alluring solution to developing automated approaches to the increasing individual case safety report burden being placed upon pharmacovigilance. Leveraging crowdsourcing to annotate unstructured data may provide accurate, efficient, and contemporaneous training data sets in support of machine learning. OBJECTIVE The objective of this study was to evaluate whether crowdsourcing can be used to accurately and efficiently develop training data sets in support of pharmacovigilance automation. MATERIALS AND METHODS Pharmacovigilance experts created a reference dataset by reviewing 15,490 de-identified social media posts of narratives pertaining to 15 drugs and 22 medically relevant topics. A random sampling of posts from the reference dataset was published on Amazon Turk and its users (Turkers) were asked a series of questions about those same medical concepts. Accuracy, price elasticity, and time efficiency were evaluated. RESULTS Accuracy of crowdsourced curation exceeded 90% when compared to the reference dataset and was completed in about 5% of the time. There was an increase in time efficiency with higher pay, but there was no significant difference in accuracy. Additionally, having a social media post reviewed by more than one Turker (using a voting system) did not offer significant improvements in terms of accuracy. CONCLUSIONS Crowdsourcing is an accurate and efficient method that can be used to develop training data sets in support of pharmacovigilance automation. More research is needed to better understand the breadth and depth of possible uses as well as strengths, limitations, and generalizability of results.
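The voting system and the accuracy comparison against an expert reference set, as described in the abstract above, can be sketched as follows. This is an illustrative assumption about how such an aggregation might look, not the study's implementation; the function names, the tie-handling rule, and the post identifiers are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None for an empty list or a tie."""
    ranked = Counter(labels).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority among the workers
    return ranked[0][0]


def accuracy(predicted, reference):
    """Fraction of reference items whose predicted label matches."""
    matches = sum(
        1 for post_id, label in reference.items()
        if predicted.get(post_id) == label
    )
    return matches / len(reference)


# Hypothetical worker answers per post and a hypothetical expert reference.
worker_answers = {
    "post-1": ["yes", "yes", "no"],
    "post-2": ["no", "no", "no"],
    "post-3": ["yes", "no"],
}
expert_reference = {"post-1": "yes", "post-2": "no", "post-3": "yes"}

aggregated = {pid: majority_vote(ans) for pid, ans in worker_answers.items()}
agreement = accuracy(aggregated, expert_reference)
```

Treating ties as unresolved (`None`) is one possible design choice; an alternative would be to route tied posts to an expert reviewer.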
Affiliation(s)
- Alex Gartland
  - College of Medicine, University of Central Florida, Orlando, FL, USA
- Andrew Bate
  - Safety and Medical Governance, GlaxoSmithKline, London, UK
- Tim A Casperson
  - North American Medical Affairs, GlaxoSmithKline, Research Triangle Park, NC, USA
- Gregory Eugene Powell
  - Pharma Safety, GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC, 27709, USA
7
Sousa D, Lamurias A, Couto FM. A hybrid approach toward biomedical relation extraction training corpora: combining distant supervision with crowdsourcing. Database (Oxford) 2020; 2020:baaa104. [PMID: 33258966 PMCID: PMC7706181 DOI: 10.1093/database/baaa104] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/17/2020] [Revised: 09/02/2020] [Accepted: 11/12/2020] [Indexed: 12/14/2022]
Abstract
Biomedical relation extraction (RE) datasets are vital in the construction of knowledge bases and to potentiate the discovery of new interactions. There are several ways to create biomedical RE datasets, some more reliable than others, such as resorting to domain expert annotations. However, the emerging use of crowdsourcing platforms, such as Amazon Mechanical Turk (MTurk), can potentially reduce the cost of RE dataset construction, even if the same level of quality cannot be guaranteed. There is a lack of power of the researcher to control who, how and in what context workers engage in crowdsourcing platforms. Hence, allying distant supervision with crowdsourcing can be a more reliable alternative. The crowdsourcing workers would be asked only to rectify or discard already existing annotations, which would make the process less dependent on their ability to interpret complex biomedical sentences. In this work, we use a previously created distantly supervised human phenotype-gene relations (PGR) dataset to perform crowdsourcing validation. We divided the original dataset into two annotation tasks: Task 1, 70% of the dataset annotated by one worker, and Task 2, 30% of the dataset annotated by seven workers. Also, for Task 2, we added an extra rater on-site and a domain expert to further assess the crowdsourcing validation quality. Here, we describe a detailed pipeline for RE crowdsourcing validation, creating a new release of the PGR dataset with partial domain expert revision, and assess the quality of the MTurk platform. We applied the new dataset to two state-of-the-art deep learning systems (BiOnt and BioBERT) and compared its performance with the original PGR dataset, as well as combinations between the two, achieving a 0.3494 increase in average F-measure. The code supporting our work and the new release of the PGR dataset is available at https://github.com/lasigeBioTM/PGR-crowd.
Affiliation(s)
- Diana Sousa
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Andre Lamurias
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Francisco M Couto
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
8
Information on adverse drug reactions—Proof of principle for a structured database that allows customization of drug information. Int J Med Inform 2020; 133:103970. [DOI: 10.1016/j.ijmedinf.2019.103970] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/24/2019] [Revised: 08/07/2019] [Accepted: 09/14/2019] [Indexed: 11/17/2022]
9
Tsueng G, Nanis M, Fouquier JT, Mayers M, Good BM, Su AI. Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts. Bioinformatics 2019; 36:1226-1233. [PMID: 31504205 PMCID: PMC8104067 DOI: 10.1093/bioinformatics/btz678] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/17/2019] [Revised: 08/05/2019] [Accepted: 08/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and capable of performing named entity recognition of disease mentions in biomedical abstracts, but did not know if this was true with relationship extraction (RE). RESULTS In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition on user performance of RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing. AVAILABILITY AND IMPLEMENTATION Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Max Nanis
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Jennifer T Fouquier
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Michael Mayers
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Benjamin M Good
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew I Su
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
10
Créquit P, Mansouri G, Benchoufi M, Vivot A, Ravaud P. Mapping of Crowdsourcing in Health: Systematic Review. J Med Internet Res 2018; 20:e187. [PMID: 29764795 PMCID: PMC5974463 DOI: 10.2196/jmir.9330] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/02/2017] [Revised: 02/10/2018] [Accepted: 03/14/2018] [Indexed: 11/22/2022] Open
Abstract
Background Crowdsourcing involves obtaining ideas, needed services, or content by soliciting Web-based contributions from a crowd. The 4 types of crowdsourced tasks (problem solving, data processing, surveillance or monitoring, and surveying) can be applied in the 3 categories of health (promotion, research, and care). Objective This study aimed to map the different applications of crowdsourcing in health to assess the fields of health that are using crowdsourcing and the crowdsourced tasks used. We also describe the logistics of crowdsourcing and the characteristics of crowd workers. Methods MEDLINE, EMBASE, and ClinicalTrials.gov were searched for available reports from inception to March 30, 2016, with no restriction on language or publication status. Results We identified 202 relevant studies that used crowdsourcing, including 9 randomized controlled trials, of which only one had posted results at ClinicalTrials.gov. Crowdsourcing was used in health promotion (91/202, 45.0%), research (73/202, 36.1%), and care (38/202, 18.8%). The 4 most frequent areas of application were public health (67/202, 33.2%), psychiatry (32/202, 15.8%), surgery (22/202, 10.9%), and oncology (14/202, 6.9%). Half of the reports (99/202, 49.0%) referred to data processing, 34.6% (70/202) referred to surveying, 10.4% (21/202) referred to surveillance or monitoring, and 5.9% (12/202) referred to problem-solving. Labor market platforms (eg, Amazon Mechanical Turk) were used in most studies (190/202, 94%). The crowd workers’ characteristics were poorly reported, and crowdsourcing logistics were missing from two-thirds of the reports. When reported, the median size of the crowd was 424 (first and third quartiles: 167-802); crowd workers’ median age was 34 years (32-36). Crowd workers were mainly recruited nationally, particularly in the United States. 
For many studies (58.9%, 119/202), previous experience in crowdsourcing was required, and passing a qualification test or training was seldom needed (11.9% of studies; 24/202). For half of the studies, monetary incentives were mentioned, with mainly less than US $1 to perform the task. The time needed to perform the task was mostly less than 10 min (58.9% of studies; 119/202). Data quality validation was used in 54/202 studies (26.7%), mainly by attention check questions or by replicating the task with several crowd workers. Conclusions The use of crowdsourcing, which allows access to a large pool of participants as well as saving time in data collection, lowering costs, and speeding up innovations, is increasing in health promotion, research, and care. However, the description of crowdsourcing logistics and crowd workers’ characteristics is frequently missing in study reports and needs to be precisely reported to better interpret the study findings and replicate them.
Affiliation(s)
- Perrine Créquit
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
  - Cochrane France, Paris, France
- Ghizlène Mansouri
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
- Mehdi Benchoufi
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Alexandre Vivot
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Philippe Ravaud
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
  - Cochrane France, Paris, France
  - Department of Epidemiology, Columbia University, Mailman School of Public Health, New York, NY, United States
11
Abstract
Background Crowdsourcing is a nascent phenomenon that has grown exponentially since it was coined in 2006. It involves a large group of people solving a problem or completing a task for an individual or, more commonly, for an organisation. While the field of crowdsourcing has developed more quickly in information technology, it has great promise in health applications. This review examines uses of crowdsourcing in global health and health, broadly. Methods Semantic searches were run in Google Scholar for “crowdsourcing,” “crowdsourcing and health,” and similar terms. 996 articles were retrieved and all abstracts were scanned. 285 articles related to health. This review provides a narrative overview of the articles identified. Results Eight areas where crowdsourcing has been used in health were identified: diagnosis; surveillance; nutrition; public health and environment; education; genetics; psychology; and, general medicine/other. Many studies reported crowdsourcing being used in a diagnostic or surveillance capacity. Crowdsourcing has been widely used across medical disciplines; however, it is important for future work using crowdsourcing to consider the appropriateness of the crowd being used to ensure the crowd is capable and has the adequate knowledge for the task at hand. Gamification of tasks seems to improve accuracy; other innovative methods of analysis including introducing thresholds and measures of trustworthiness should be considered. Conclusion Crowdsourcing is a new field that has been widely used and is innovative and adaptable. With the exception of surveillance applications that are used in emergency and disaster situations, most uses of crowdsourcing have only been used as pilots. These exceptions demonstrate that it is possible to take crowdsourcing applications to scale. Crowdsourcing has the potential to provide more accessible health care to more communities and individuals rapidly and to lower costs of care.
Affiliation(s)
- Kerri Wazny
  - Centre for Global Health Research, Usher Institute of Informatics and Population Sciences, University of Edinburgh, Edinburgh, UK
12
Comparing Amazon's Mechanical Turk Platform to Conventional Data Collection Methods in the Health and Medical Research Literature. J Gen Intern Med 2018; 33:533-538. [PMID: 29302882 PMCID: PMC5880761 DOI: 10.1007/s11606-017-4246-0] [Citation(s) in RCA: 240] [Impact Index Per Article: 40.0] [Received: 08/03/2017] [Revised: 09/29/2017] [Accepted: 11/21/2017] [Indexed: 10/18/2022]
Abstract
BACKGROUND The goal of this article is to assess the peer-reviewed primary literature whose study objectives include analyzing Amazon.com's Mechanical Turk (MTurk) as a research tool in a health services research and medical context. METHODS Searches of Google Scholar and PubMed databases were conducted in February 2017. We screened article titles and abstracts to identify relevant articles that compare data from MTurk samples in a health and medical context to another sample, expert opinion, or other gold standard. Full-text manuscript reviews were conducted for the 35 articles that met the study criteria. RESULTS The vast majority of the studies supported the use of MTurk for a variety of academic purposes. DISCUSSION The literature overwhelmingly concludes that MTurk is an efficient, reliable, cost-effective tool for generating sample responses that are largely comparable to those collected via more conventional means. One caveat is that survey responses may not be generalizable to the US population.
13
Demner-Fushman D, Shooshan SE, Rodriguez L, Aronson AR, Lang F, Rogers W, Roberts K, Tonning J. A dataset of 200 structured product labels annotated for adverse drug reactions. Sci Data 2018; 5:180001. [PMID: 29381145 PMCID: PMC5789866 DOI: 10.1038/sdata.2018.1] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/25/2017] [Accepted: 12/13/2017] [Indexed: 11/09/2022] Open
Abstract
Adverse drug reactions (ADRs), unintended and sometimes dangerous effects that a drug may have, are one of the leading causes of morbidity and mortality during medical care. To date, there is no structured machine-readable authoritative source of known ADRs. The United States Food and Drug Administration (FDA) partnered with the National Library of Medicine to create a pilot dataset containing standardised information about known adverse reactions for 200 FDA-approved drugs. The Structured Product Labels (SPLs), the documents FDA uses to exchange information about drugs and other products, were manually annotated for adverse reactions at the mention level to facilitate development and evaluation of text mining tools for extraction of ADRs from all SPLs. The ADRs were then normalised to the Unified Medical Language System (UMLS) and to the Medical Dictionary for Regulatory Activities (MedDRA). We present the curation process and the structure of the publicly available database SPL-ADR-200db containing 5,098 distinct ADRs. The database is available at https://bionlp.nlm.nih.gov/tac2017adversereactions/; the code for preparing and validating the data is available at https://github.com/lhncbc/fda-ars.
Affiliation(s)
- Dina Demner-Fushman
- U.S. National Library of Medicine, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sonya E Shooshan
- U.S. National Library of Medicine, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Laritza Rodriguez
- U.S. National Library of Medicine, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Alan R Aronson
- U.S. National Library of Medicine, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Francois Lang
- U.S. National Library of Medicine, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Willie Rogers
- U.S. National Library of Medicine, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kirk Roberts
- UT Health School of Biomedical Informatics, 7000 Fannin St., Houston, TX 77030, USA
| | - Joseph Tonning
- Office of New Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10001 New Hampshire Ave, Silver Spring, MD 20903, USA
|
14
|
Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, Green A, Khankhanian P, Baranzini SE. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife 2017; 6:26726. [PMID: 28936969 PMCID: PMC5640425 DOI: 10.7554/elife.26726] [Citation(s) in RCA: 233] [Impact Index Per Article: 33.3] [Received: 03/11/2017] [Accepted: 09/11/2017] [Indexed: 12/16/2022] Open
Abstract
The ability to computationally predict whether a compound treats a disease would improve the economy and success rate of drug approval. This study describes Project Rephetio to systematically model drug efficacy based on 755 existing treatments. First, we constructed Hetionet (neo4j.het.io), an integrative network encoding knowledge from millions of biomedical studies. Hetionet v1.0 consists of 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. Data were integrated from 29 public resources to connect compounds, diseases, genes, anatomies, pathways, biological processes, molecular functions, cellular components, pharmacologic classes, side effects, and symptoms. Next, we identified network patterns that distinguish treatments from non-treatments. Then, we predicted the probability of treatment for 209,168 compound-disease pairs (het.io/repurpose). Our predictions validated on two external sets of treatment and provided pharmacological insights on epilepsy, suggesting they will help prioritize drug repurposing candidates. This study was entirely open and received realtime feedback from 40 community members.
Affiliation(s)
- Daniel Scott Himmelstein
- Biological and Medical Informatics Program, University of California, San Francisco, San Francisco, United States.,Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, United States
| | - Antoine Lizee
- Department of Neurology, University of California, San Francisco, San Francisco, United States.,ITUN-CRTI-UMR 1064 Inserm, University of Nantes, Nantes, France
| | - Christine Hessler
- Department of Neurology, University of California, San Francisco, San Francisco, United States
| | - Leo Brueggeman
- Department of Neurology, University of California, San Francisco, San Francisco, United States.,University of Iowa, Iowa City, United States
| | - Sabrina L Chen
- Department of Neurology, University of California, San Francisco, San Francisco, United States.,Johns Hopkins University, Baltimore, United States
| | - Dexter Hadley
- Department of Pediatrics, University of California, San Francisco, San Francisco, United States.,Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, United States
| | - Ari Green
- Department of Neurology, University of California, San Francisco, San Francisco, United States
| | - Pouya Khankhanian
- Department of Neurology, University of California, San Francisco, San Francisco, United States.,Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, United States
| | - Sergio E Baranzini
- Biological and Medical Informatics Program, University of California, San Francisco, San Francisco, United States.,Department of Neurology, University of California, San Francisco, San Francisco, United States
|
15
|
Jagodnik KM, Koplev S, Jenkins SL, Ohno-Machado L, Paten B, Schurer SC, Dumontier M, Verborgh R, Bui A, Ping P, McKenna NJ, Madduri R, Pillai A, Ma'ayan A. Developing a framework for digital objects in the Big Data to Knowledge (BD2K) commons: Report from the Commons Framework Pilots workshop. J Biomed Inform 2017; 71:49-57. [PMID: 28501646 DOI: 10.1016/j.jbi.2017.05.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/15/2016] [Revised: 05/01/2017] [Accepted: 05/08/2017] [Indexed: 12/11/2022]
Abstract
The volume and diversity of data in biomedical research have been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges including: the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.
Affiliation(s)
- Kathleen M Jagodnik
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
| | - Simon Koplev
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
| | - Sherry L Jenkins
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
| | - Lucila Ohno-Machado
- Health System Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92083, USA; Health Services Research, San Diego Veterans Administration Health System, San Diego, CA 92083, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95060, USA
| | - Stephan C Schurer
- Department of Molecular and Cellular Pharmacology, University of Miami, 331461120 NW 14th Street, CRB 650 (M-857), Miami, FL 33136, USA
| | - Michel Dumontier
- Institute for Data Science, Universiteit Maastricht, Minderbroedersberg 4-6, 6211 LK Maastricht, Netherlands
| | - Ruben Verborgh
- Ghent University - iMinds Research Foundation Flanders, St. Pietersnieuwstraat 33, 9000 Gent, Belgium
| | - Alex Bui
- Department of Radiological Sciences, UCLA School of Medicine, Los Angeles, CA 90095, USA; Department of Bioengineering, UCLA Henri Samueli School of Engineering, Los Angeles, CA 90095, USA
| | - Peipei Ping
- Departments of Physiology, Medicine, and Bioinformatics, UCLA School of Medicine, Los Angeles, CA 90095, USA
| | - Neil J McKenna
- Department of Molecular and Cellular Biology, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
| | - Ravi Madduri
- Department of Mathematics and Computer Science, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
| | - Ajay Pillai
- Division of Genome Sciences, National Human Genome Research Institute, National Institutes of Health, 31 Center Drive, MSC 2152, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Avi Ma'ayan
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA.
|
16
|
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
| | - Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
| | - Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain.,Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain.,CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
| | - Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
| | - Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain.,Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
|
17
|
Cocos A, Qian T, Callison-Burch C, Masino AJ. Crowd control: Effectively utilizing unscreened crowd workers for biomedical data annotation. J Biomed Inform 2017; 69:86-92. [DOI: 10.1016/j.jbi.2017.04.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/03/2016] [Revised: 03/27/2017] [Accepted: 04/02/2017] [Indexed: 01/17/2023]
|
18
|
Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, Jenkins SL, Feldmann AS, Hu KS, McDermott MG, Duan Q, Clark NR, Jones MR, Kou Y, Goff T, Woodland H, Amaral FMR, Szeto GL, Fuchs O, Schüssler-Fiorenza Rose SM, Sharma S, Schwartz U, Bausela XB, Szymkiewicz M, Maroulis V, Salykin A, Barra CM, Kruth CD, Bongio NJ, Mathur V, Todoric RD, Rubin UE, Malatras A, Fulp CT, Galindo JA, Motiejunaite R, Jüschke C, Dishuck PC, Lahl K, Jafari M, Aibar S, Zaravinos A, Steenhuizen LH, Allison LR, Gamallo P, de Andres Segura F, Dae Devlin T, Pérez-García V, Ma'ayan A. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nat Commun 2016; 7:12846. [PMID: 27667448 PMCID: PMC5052684 DOI: 10.1038/ncomms12846] [Citation(s) in RCA: 156] [Impact Index Per Article: 19.5] [Received: 12/07/2015] [Accepted: 08/05/2016] [Indexed: 12/14/2022] Open
Abstract
Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
Affiliation(s)
- Zichen Wang
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Caroline D. Monteiro
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Kathleen M. Jagodnik
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
- Fluid Physics and Transport Processes Branch, NASA Glenn Research Center, 21000 Brookpark Rd, Cleveland, Ohio 44135, USA
- Center for Space Medicine, Baylor College of Medicine, 1 Baylor Plaza, Houston, Texas 77030, USA
| | - Nicolas F. Fernandez
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Gregory W. Gundersen
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Andrew D. Rouillard
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Sherry L. Jenkins
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Axel S. Feldmann
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Kevin S. Hu
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Michael G. McDermott
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Qiaonan Duan
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Neil R. Clark
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Matthew R. Jones
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Yan Kou
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | - Troy Goff
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
| | | | - Fabio M R. Amaral
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Sutton Bonington, Leicestershire LE12 5RD, UK
| | - Gregory L. Szeto
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Materials Science & Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- The Ragon Institute of MGH, MIT, and Harvard, 400 Technology Square, Cambridge, Massachusetts 02139, USA
| | - Oliver Fuchs
- Paediatric Allergology and Pulmonology, Dr von Hauner University Children's Hospital, Ludwig-Maximilians-University of Munich, Member of the German Centre for Lung Research (DZL), Lindwurmstrasse 4, Munich 80337, Germany
| | - Sophia M. Schüssler-Fiorenza Rose
- Spinal Cord Injury Service, Veteran Affairs Palo Alto Health Care System, Palo Alto, California 94304, USA
- Department of Neurosurgery, Stanford School of Medicine, Stanford, California 94304, USA
| | - Shvetank Sharma
- Department of Research, Institute of Liver & Biliary Sciences, D1, Vasant Kunj, New Delhi 110070, India
| | - Uwe Schwartz
- Department of Biochemistry III, University of Regensburg, Universitätsstrasse 31, Regensburg 93053, Germany
| | - Xabier Bengoetxea Bausela
- Department of Pharmacology and Toxicology, University of Navarra, Pamplona, Irunlarrea 1, Pamplona 31008, Spain
| | - Maciej Szymkiewicz
- Warsaw School of Information Technology under the auspices of the Polish Academy of Sciences, 6 Newelska St, Warsaw 01–447, Poland
| | | | - Anton Salykin
- Department of Biology, Faculty of Medicine, Masaryk University, Brno 625 00, Czech Republic
| | - Carolina M. Barra
- IMIM-Hospital Del Mar, PRBB Barcelona, Dr Aiguader, Barcelona 88.08003, Spain
| | | | - Nicholas J. Bongio
- Department of Biology, Shenandoah University, 1460 University Dr Winchester, Winchester, Virginia 22601, USA
| | | | | | - Udi E. Rubin
- Department of Biological Sciences, 600 Fairchild Center, Mail Code 2402, Columbia University, New York, New York 10032, USA
| | - Apostolos Malatras
- Center for Research in Myology, Sorbonne Universités, UPMC Univ Paris 06, INSERM UMRS975, CNRS FRE3617, 47 Boulevard de l'hôpital, Paris 75013, France
| | - Carl T. Fulp
- 13-1, Higashi 4-chome Shibuya-ku, Tokyo 150-0011, Japan
| | - John A. Galindo
- Department of Biology and Institute of Genetics, Universidad Nacional de Colombia, Bogota, Cr. 30 # 45-08, Colombia
| | - Ruta Motiejunaite
- Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women's Hospital, 3 Blackfan Circle, Boston, Massachusetts 02115, USA
| | - Christoph Jüschke
- Department of Human Genetics, Faculty of Medicine and Health Sciences, University of Oldenburg, Ammerländer Heerstrasse 114-118, Oldenburg 26129, Germany
| | | | - Katharina Lahl
- Technical University of Denmark, National Veterinary Institute, Bülowsvej 27 Building 2-3, Frederiksberg C 1870, Denmark
| | - Mohieddin Jafari
- Protein Chemistry and Proteomics Unit, Biotechnology Research Center, Pasteur Institute of Iran, No. 358, 12th Farwardin Ave, Jomhhoori St, Tehran 13164, Iran
- School of Biological Sciences, Institute for Researches in Fundamental Sciences, Niavaran Square, P.O.Box, Tehran 19395-5746, Iran
| | - Sara Aibar
- University of Salamanca, Salamanca, Madrid 37008, Spain
| | - Apostolos Zaravinos
- Division of Clinical Immunology, Department of Laboratory Medicine, Karolinska Institute, Alfred Nobels Allé 8, level 7, Stockholm SE141 86, Sweden
- Department of Life Sciences, School of Sciences, European University Cyprus, 6 Diogenes Str. Engomi, P.O.Box 22006, Nicosia 1516, Cyprus
| | | | | | | | - Fernando de Andres Segura
- CICAB, Clinical Research Centre, Extremadura University Hospital, Elvas Av., s/n. 06006 Badajoz 06006, Spain
| | | | - Vicente Pérez-García
- Consejo Superior de Investigaciones Científicas, Centro Nacional de Biotecnología, Department of Immunology and Oncology, c/Darwin, 3 Madrid 28049, Spain
| | - Avi Ma'ayan
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Illuminating the Druggable Genome Knowledge Management Center, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, New York 10029, USA
|
19
|
Wang Q, S Abdul S, Almeida L, Ananiadou S, Balderas-Martínez YI, Batista-Navarro R, Campos D, Chilton L, Chou HJ, Contreras G, Cooper L, Dai HJ, Ferrell B, Fluck J, Gama-Castro S, George N, Gkoutos G, Irin AK, Jensen LJ, Jimenez S, Jue TR, Keseler I, Madan S, Matos S, McQuilton P, Milacic M, Mort M, Natarajan J, Pafilis E, Pereira E, Rao S, Rinaldi F, Rothfels K, Salgado D, Silva RM, Singh O, Stefancsik R, Su CH, Subramani S, Tadepally HD, Tsaprouni L, Vasilevsky N, Wang X, Chatr-Aryamontri A, Laulederkind SJF, Matis-Mitchell S, McEntyre J, Orchard S, Pundir S, Rodriguez-Esteban R, Van Auken K, Lu Z, Schaeffer M, Wu CH, Hirschman L, Arighi CN. Overview of the interactive task in BioCreative V. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw119. [PMID: 27589961 PMCID: PMC5009325 DOI: 10.1093/database/baw119] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 04/04/2016] [Accepted: 07/28/2016] [Indexed: 11/14/2022]
Abstract
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested. Database URL: http://www.biocreative.org
Affiliation(s)
- Qinghua Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
| | - Shabbir S Abdul
- International Centre of Health Information Technology, Taipei Medical University, Taipei, Taiwan
| | - Lara Almeida
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
| | - Sophia Ananiadou
- National Centre for Text Mining, University of Manchester, Manchester, UK
| | | | | | | | - Lucy Chilton
- Northern Institute for Cancer Research, Newcastle University, Newcastle, UK
| | - Hui-Jou Chou
- Rutgers University-Camden, Camden, NJ 08102, USA
| | - Gabriela Contreras
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
| | - Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University Corvallis, OR 97331, USA
| | - Hong-Jie Dai
- Department of Computer Science and Information Engineering, National Taitung University, Taitung, Taiwan
| | - Barbra Ferrell
- College of Agriculture and Natural Resources, University of Delaware, Newark, DE 19711, USA
| | - Juliane Fluck
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53754 St. Augustin, Germany
| | - Socorro Gama-Castro
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
| | | | - Georgios Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham B15 2TT, UK
| | - Afroza K Irin
- Life Science Informatics, University of Bonn, Bonn, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Silvia Jimenez
- Blue Brain Project, École Polytechnique Fédérale de Lausanne (EPFL) Biotech Campus, Geneva, Switzerland
| | - Toni R Jue
- Prince of Wales Clinical School, University of New South Wales NSW, Sydney, New South Wales, Australia
| | | | - Sumit Madan
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53754 St. Augustin, Germany
| | - Sérgio Matos
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
| | | | - Marija Milacic
- Department of Informatics and Bio-Computing, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Matthew Mort
- HGMD, Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, UK
| | - Jeyakumar Natarajan
- Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Emiliano Pereira
- Microbial Genomics and Bioinformatics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
| | - Shruti Rao
- Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC 20007, USA
| | - Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
| | - Karen Rothfels
- Department of Informatics and Bio-Computing, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - David Salgado
- GMGF, Aix-Marseille Universite, 13385 Marseille, France; Inserm, UMR_S 910, 13385 Marseille, France
| | - Raquel M Silva
- Department of Medical Sciences, iBiMED & IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
| | - Onkar Singh
- Taipei Medical University Graduate Institute of Biomedical informatics, Taipei, Taiwan
| | | | - Chu-Hsien Su
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Suresh Subramani
- Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
| | | | - Loukia Tsaprouni
- Institute of Sport and Physical Activity Research (ISPAR), University of Bedfordshire, Bedford, UK
| | - Nicole Vasilevsky
- Ontology Development Group, Oregon Health & Science University, Portland, OR 97239, USA
| | - Xiaodong Wang
- WormBase Consortium, Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | | | | | | | | | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Sangya Pundir
- European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | | | - Kimberly Van Auken
- WormBase Consortium, Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Institutes of Health, Bethesda, MD 20894, USA
| | - Mary Schaeffer
- MaizeGDB USDA ARS and University of Missouri, Columbia, MO 65211, USA
| | - Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
| | | | - Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA
|
20
|
Hirschman L, Fort K, Boué S, Kyrpides N, Islamaj Doğan R, Cohen KB. Crowdsourcing and curation: perspectives from biology and natural language processing. Database: The Journal of Biological Databases and Curation 2016; 2016:baw115. [PMID: 27504010 PMCID: PMC4976298 DOI: 10.1093/database/baw115] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/19/2016] [Accepted: 07/11/2016] [Indexed: 12/27/2022]
Abstract
Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging 'the crowd'; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9-11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing. Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives.
Affiliation(s)
| | - Karën Fort
- University of Paris-Sorbonne/STIH Team, Paris, France
| | - Stéphanie Boué
- Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
| | | | - Rezarta Islamaj Doğan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
21
|
Carrell DS, Cronkite DJ, Malin BA, Aberdeen JS, Hirschman L. Is the Juice Worth the Squeeze? Costs and Benefits of Multiple Human Annotators for Clinical Text De-identification. Methods Inf Med 2016; 55:356-64. [PMID: 27405787 DOI: 10.3414/me15-01-0122] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/10/2015] [Accepted: 04/18/2016] [Indexed: 11/09/2022]
Abstract
BACKGROUND Clinical text contains valuable information but must be de-identified before it can be used for secondary purposes. Accurate annotation of personally identifiable information (PII) is essential to the development of automated de-identification systems and to manual redaction of PII. Yet the accuracy of annotations may vary considerably across individual annotators, and annotation is costly. As such, the marginal benefit of incorporating additional annotators has not been well characterized. OBJECTIVES This study models the costs and benefits of incorporating increasing numbers of independent human annotators to identify the instances of PII in a corpus. We used a corpus with gold standard annotations to evaluate the performance of teams of annotators of increasing size. METHODS Four annotators independently identified PII in a 100-document corpus consisting of randomly selected clinical notes from Family Practice clinics in a large integrated health care system. These annotations were pooled and validated to generate a gold standard corpus for evaluation. RESULTS Recall rates for all PII types ranged from 0.90 to 0.98 for individual annotators to 0.998 to 1.0 for teams of three, when measured against the gold standard. Median cost per PII instance discovered during corpus annotation ranged from $0.71 for an individual annotator to $377 for annotations discovered only by a fourth annotator. CONCLUSIONS Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.
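The team-size comparison in this abstract can be illustrated with a minimal Python sketch. It assumes that a team's annotations are simply the union of its members' annotations and that recall is averaged over all teams of a given size; the abstract does not state the exact pooling or averaging procedure, so both are assumptions. The annotator names, the toy data and the functions `team_recall` and `mean_recall_by_team_size` are hypothetical.

```python
from itertools import combinations


def team_recall(found_by_annotator, gold, team):
    """Recall of a team whose annotations are pooled by set union (assumption)."""
    pooled = set().union(*(found_by_annotator[name] for name in team))
    return len(pooled & gold) / len(gold)


def mean_recall_by_team_size(found_by_annotator, gold, size):
    """Average recall over every possible team of the given size."""
    teams = list(combinations(sorted(found_by_annotator), size))
    return sum(team_recall(found_by_annotator, gold, t) for t in teams) / len(teams)


# Hypothetical toy data: 10 gold-standard PII instances, 3 annotators,
# each of whom misses a different single instance.
gold = set(range(10))
found = {
    "A": set(range(0, 9)),        # misses instance 9
    "B": set(range(0, 8)) | {9},  # misses instance 8
    "C": set(range(1, 10)),       # misses instance 0
}

for size in (1, 2, 3):
    print(size, mean_recall_by_team_size(found, gold, size))
```

In this toy setting a second annotator already recovers everything a single annotator missed, which mirrors the qualitative pattern of diminishing returns reported above without reproducing the study's figures.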
Affiliation(s)
- David S Carrell
- Group Health Research Institute, 1730 Minor Ave, Suite 1600, Seattle, WA 98101, USA
|
22
|
Li TS, Bravo À, Furlong LI, Good BM, Su AI. A crowdsourcing workflow for extracting chemical-induced disease relations from free text. Database: The Journal of Biological Databases and Curation 2016; 2016:baw051. [PMID: 27087308 PMCID: PMC4834205 DOI: 10.1093/database/baw051] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/05/2015] [Accepted: 03/17/2016] [Indexed: 01/05/2023]
Abstract
Relations between chemicals and diseases are among the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex. Database URL: https://github.com/SuLab/crowd_cid_relex
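The voting rule and the reported metrics can be checked with a short Python sketch. This is an illustration only, not the authors' code (which is in the repository linked above): the function names `aggregate_votes` and `f_score`, and the toy judgments, are hypothetical. The threshold of four votes out of five, the precision and recall values, and the cost figures come from the abstract.

```python
def aggregate_votes(judgments, threshold=4):
    """Return the relations that received at least `threshold` positive votes.

    `judgments` maps a candidate relation to the list of binary worker votes.
    """
    return {rel for rel, votes in judgments.items() if sum(votes) >= threshold}


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy judgments from five workers per candidate relation.
judgments = {
    ("chemical_1", "disease_1"): [True, True, True, True, False],   # 4 votes: accepted
    ("chemical_2", "disease_2"): [True, True, False, False, True],  # 3 votes: rejected
}
predicted = aggregate_votes(judgments)
print(predicted)

# Consistency check against the figures reported in the abstract.
print(round(f_score(0.475, 0.540), 3))  # F-score from reported precision and recall
print(round(1290.67 / 500, 2))          # cost per abstract in dollars
```

A threshold of four out of five trades recall for precision; lowering it to a simple majority of three would accept the second toy relation as well.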
Affiliation(s)
- Tong Shu Li
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
| | - Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
| | - Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
| | - Benjamin M Good
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
| | - Andrew I Su
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
|
23
|
Hochheiser H, Ning Y, Hernandez A, Horn JR, Jacobson R, Boyce RD. Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study. JMIR Res Protoc 2016; 5:e40. [PMID: 27066806 PMCID: PMC4844909 DOI: 10.2196/resprot.5028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/17/2015] [Revised: 11/25/2015] [Accepted: 12/19/2015] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product labels may be a means of lessening the burden of manual curation. OBJECTIVE Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. METHODS Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. RESULTS One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts).
CONCLUSIONS The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test if these issues can be addressed by the use of better performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow.
Affiliation(s)
- Harry Hochheiser
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States.
|
24
|
Rodriguez-Esteban R. Biocuration with insufficient resources and fixed timelines. Database: The Journal of Biological Databases and Curation 2015; 2015:bav116. [PMID: 26708987 PMCID: PMC4691339 DOI: 10.1093/database/bav116] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/11/2015] [Accepted: 11/17/2015] [Indexed: 11/14/2022]
Abstract
Biological curation, or biocuration, is often studied from the perspective of creating and maintaining databases that have the goal of mapping and tracking certain areas of biology. However, much biocuration is, in fact, dedicated to finite and time-limited projects in which insufficient resources demand trade-offs. This typically more ephemeral type of curation is nonetheless of importance in biomedical research. Here, I propose a framework to understand such restricted curation projects from the point of view of return on curation (ROC), value, efficiency and productivity. Moreover, I suggest general strategies to optimize these curation efforts, such as the ‘multiple strategies’ approach, as well as a metric called overhead that can be used in the context of managing curation resources.
Affiliation(s)
- Raul Rodriguez-Esteban
- Roche Pharmaceutical Research and Early Development, pRED Informatics, Roche Innovation Center Basel, Basel 4070, Switzerland
|
25
|
Huang CC, Lu Z. Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform 2015; 17:132-44. [PMID: 25935162 DOI: 10.1093/bib/bbv024] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 01/29/2015] [Indexed: 11/13/2022] Open
Abstract
One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.
|
26
|
Khare R, Good BM, Leaman R, Su AI, Lu Z. Crowdsourcing in biomedicine: challenges and opportunities. Brief Bioinform 2015; 17:23-32. [PMID: 25888696 DOI: 10.1093/bib/bbv021] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 01/06/2023] Open
Abstract
The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.
|
27
|
Li J, Zheng S, Chen B, Butte AJ, Swamidass SJ, Lu Z. A survey of current trends in computational drug repositioning. Brief Bioinform 2015; 17:2-12. [PMID: 25832646 DOI: 10.1093/bib/bbv020] [Citation(s) in RCA: 338] [Impact Index Per Article: 37.6] [Received: 12/09/2014] [Indexed: 12/26/2022] Open
Abstract
Computational drug repositioning or repurposing is a promising and efficient tool for discovering new uses for existing drugs and holds great potential for precision medicine in the age of big data. The explosive growth of large-scale genomic and phenotypic data, as well as data of small molecular compounds with granted regulatory approval, is enabling new developments for computational repositioning. To achieve the shortest path toward new drug indications, advanced data processing and analysis strategies are critical for making sense of these heterogeneous molecular measurements. In this review, we show recent advancements in the critical areas of computational drug repositioning from multiple aspects. First, we summarize available data sources and the corresponding computational repositioning strategies. Second, we characterize the commonly used computational techniques. Third, we discuss validation strategies for repositioning studies, including both computational and experimental methods. Finally, we highlight potential opportunities and use-cases, including a few target areas such as cancers. We conclude with a brief discussion of the remaining challenges in computational drug repositioning.
|