1
Kim S, Yu B, Li Q, Bolton EE. PubChem synonym filtering process using crowdsourcing. J Cheminform 2024; 16:69. [PMID: 38880887 PMCID: PMC11181558 DOI: 10.1186/s13321-024-00868-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 06/09/2024] [Indexed: 06/18/2024]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource containing more than 100 million unique chemical structures. One of the most requested tasks in PubChem and other chemical databases is to search chemicals by name (also commonly called a "chemical synonym"). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors to PubChem. In addition, these synonyms are used for many purposes, including creating links between chemicals and PubMed articles (using Medical Subject Headings (MeSH) terms). However, these depositor-provided name-structure associations are subject to substantial discrepancies within and between depositors, making it difficult to unambiguously map a chemical name to a specific chemical structure. The present paper describes PubChem's crowdsourcing-based synonym filtering strategy, which resolves inter- and intra-depositor discrepancies in synonym-structure associations as well as in chemical-MeSH associations. The PubChem synonym filtering process was developed based on the analysis of four crowd-voting strategies, which differ in the consistency threshold value employed (60% vs. 70%) and in how intra-depositor discrepancies are resolved (a single vote vs. multiple votes per depositor) prior to inter-depositor crowd-voting. The agreement of voting was determined at six levels of chemical equivalency, which consider varying isotopic composition, stereochemistry, and connectivity of chemical structures and their primary components. While all four strategies showed comparable results, Strategy I (one vote per depositor with a 60% consistency threshold) resulted in the most synonyms assigned to a single chemical structure as well as the most synonym-structure associations disambiguated across the six chemical equivalency contexts. 
Based on the results of this study, Strategy I was implemented in PubChem's filtering process that cleans up synonym-structure associations as well as chemical-MeSH associations. This consistency-based filtering process is designed to look for a consensus in name-structure associations but cannot attest to their correctness. As a result, it can fail to recognize correct name-structure associations (or incorrect ones), for example, when a synonym is provided by only one depositor or when many contributors are incorrect. However, this filtering process is an important starting point for quality control in name-structure associations in large chemical databases like PubChem.
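The consistency-based crowd voting described above can be illustrated with a minimal Python sketch of a Strategy I-style rule (one vote per depositor, 60% consistency threshold). It is not PubChem's implementation: the input format, the function name `assign_synonym`, and the choice to split a depositor's single vote equally when that depositor links the name to several structures are all assumptions, and structures are compared only by identifier rather than at the six chemical equivalency levels.

```python
from collections import defaultdict
from fractions import Fraction


def assign_synonym(depositions, threshold=Fraction(3, 5)):
    """Return the structure ID that wins >= threshold of depositor votes, else None.

    depositions: hypothetical mapping {depositor: set of structure IDs that this
    depositor associates with one synonym}.
    """
    votes = defaultdict(Fraction)
    for structures in depositions.values():
        if not structures:
            continue
        # One vote per depositor; an internally inconsistent depositor splits
        # its single vote equally across its structures (assumption).
        share = Fraction(1, len(structures))
        for structure_id in structures:
            votes[structure_id] += share
    total = sum(votes.values())
    if total == 0:
        return None  # no depositor supplied a structure for this synonym
    best_id, best_votes = max(votes.items(), key=lambda item: item[1])
    # With a threshold above 50%, at most one structure can pass, so ties never win.
    return best_id if best_votes / total >= threshold else None


# Two of three depositors agree (2/3 >= 60%), so the synonym is kept for CID1.
print(assign_synonym({"A": {"CID1"}, "B": {"CID1"}, "C": {"CID2"}}))
# The same votes fail a 70% threshold, so the synonym stays ambiguous (None).
print(assign_synonym({"A": {"CID1"}, "B": {"CID1"}, "C": {"CID2"}}, Fraction(7, 10)))
```

Exact fractions are used so that a case sitting exactly on the threshold (for example, 3 of 5 depositors agreeing) is decided without floating-point rounding.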
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
2
Park S, Yoo J, Lee Y, DeGuzman PB, Kang MJ, Dykes PC, Shin SY, Cha WC. Quantifying emergency department nursing workload at the task level using NASA-TLX: An exploratory descriptive study. Int Emerg Nurs 2024; 74:101424. [PMID: 38531213 DOI: 10.1016/j.ienj.2024.101424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 01/20/2024] [Accepted: 02/14/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND Emergency department (ED) nurses experience high mental workloads because of unpredictable work environments; however, research evaluating ED nursing workload using a tool that incorporates nurses' perceptions is lacking. This study aimed to quantify the subjective workload of ED nursing and to explore the impact of work experience on perceived workload. METHODS Thirty-two ED nurses at a tertiary academic hospital in the Republic of Korea were surveyed to assess their subjective workload for ED procedures using the National Aeronautics and Space Administration Task Load Index (NASA-TLX). Nonparametric statistical analysis was performed to describe the data, and linear regression analysis was conducted to estimate the impact of work experience on perceived workload. RESULTS Cardiopulmonary resuscitation (CPR) had the highest median workload, followed by interruptions from patients and their family members. Although inexperienced nurses perceived the 'special care' procedures (CPR and defibrillation) as more challenging than procedures in other categories, the analysis revealed that nurses with more than 107 months of experience reported a significantly higher workload than those with less than 36 months of experience. CONCLUSION Addressing interruptions and customizing training can alleviate ED nursing workload. Quantified perceived workload is useful for identifying acceptable thresholds for maintaining an optimal workload, which ultimately contributes to predicting nursing staffing needs and ED crowding.
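The abstract does not say whether the raw or the weighted form of NASA-TLX was scored, so the following Python sketch shows both standard variants plus a per-task median, mirroring the nonparametric summary described above. The subscale labels, the input layout, and the helper names are illustrative assumptions, and each subscale is assumed to be rated on a 0-100 scale.

```python
from statistics import mean, median

# The six NASA-TLX subscales (short labels chosen for this sketch).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings):
    """Raw (unweighted) TLX: the mean of the six 0-100 subscale ratings."""
    return mean(ratings[s] for s in SUBSCALES)


def weighted_tlx(ratings, weights):
    """Weighted TLX: each weight counts how often a subscale was chosen in the
    15 pairwise comparisons, so the weights must sum to 15."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


def median_workload_by_task(responses):
    """responses: iterable of (task, ratings) pairs; returns {task: median raw TLX}."""
    scores = {}
    for task, ratings in responses:
        scores.setdefault(task, []).append(raw_tlx(ratings))
    return {task: median(values) for task, values in scores.items()}


# Hypothetical ratings and pairwise-comparison weights for one nurse and one task.
ratings = {"mental": 90, "physical": 60, "temporal": 80,
           "performance": 40, "effort": 70, "frustration": 20}
weights = {"mental": 5, "physical": 1, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 0}
print(raw_tlx(ratings))
print(round(weighted_tlx(ratings, weights), 2))
```

The median, rather than the mean, is used per task because the abstract describes a nonparametric summary of the data.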
Affiliation(s)
- Sookyung Park
- School of Nursing, University of Virginia, 225 Jeanette Lancaster Way, Charlottesville, VA 22903-3388, USA
- Junsang Yoo
- Department of Digital Health, Samsung Advanced Institute for Health Science & Technology (SAIHST), Sungkyunkwan University, 115 Irwon-ro Gangnam-gu, Seoul 06355, Republic of Korea
- Yerim Lee
- Department of Digital Health, Samsung Advanced Institute for Health Science & Technology (SAIHST), Sungkyunkwan University, 115 Irwon-ro Gangnam-gu, Seoul 06355, Republic of Korea
- Pamela Baker DeGuzman
- School of Nursing, University of Virginia, 225 Jeanette Lancaster Way, Charlottesville, VA 22903-3388, USA
- Min-Jeoung Kang
- Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA; Department of Medicine, Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, 1620 Tremont Street, Boston, MA, USA
- Patricia C Dykes
- Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA; Department of Medicine, Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, 1620 Tremont Street, Boston, MA, USA
- So Yeon Shin
- Department of Nursing, Samsung Medical Center, 81 Irwon-ro Gangnam-gu, Seoul 06351, Republic of Korea
- Won Chul Cha
- Department of Digital Health, Samsung Advanced Institute for Health Science & Technology (SAIHST), Sungkyunkwan University, 115 Irwon-ro Gangnam-gu, Seoul 06355, Republic of Korea; Department of Emergency Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 115 Irwon-ro Gangnam-gu, Seoul 06355, Republic of Korea; Digital Innovation Center, Samsung Medical Center, 81 Irwon-ro Gangnam-gu, Seoul 06351, Republic of Korea
3
Anderson BR, Herman PM, Hays RD. Predictors of Pain Management Strategies in Adults with Low-Back Pain: A Secondary Analysis of Amazon Mechanical Turk Survey Data. J Integr Complement Med 2024; 30:297-305. [PMID: 37646759 PMCID: PMC10954603 DOI: 10.1089/jicm.2023.0233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Objective: To evaluate the associations of baseline demographics, health conditions, pain management strategies, and health-related quality-of-life (HRQoL) measures with pain management strategies at 3-month follow-up in respondents reporting current low-back pain (LBP). Study design: Cohort study of survey data collected from adults with LBP sampled from the Amazon Mechanical Turk crowdsourcing panel. Methods: Demographics, health conditions, and the Patient-Reported Outcomes Measurement Information System (PROMIS)-10 were included in the baseline survey. Respondents reporting LBP completed a more comprehensive survey inquiring about pain management strategies and several HRQoL measures. Bivariate and then multivariate logistic regression estimated odds ratios (ORs) with 95% confidence intervals (CIs) for the association between baseline characteristics and pain management utilization at 3-month follow-up. Model fit statistics were evaluated to assess predictive value. Results: The final cohort included 717 respondents with completed surveys. The most prevalent pain management strategy at follow-up was other care (n = 474), followed by no care (n = 94), conservative care only (n = 76), medical care only (n = 51), and medical and conservative care combined (n = 22). The conservative care only group had higher (better) mental and physical health PROMIS-10 scores, whereas the medical care only and combination care groups had lower (worse) physical health scores. In multivariate models, estimated ORs (95% CIs) for the association between baseline and follow-up pain management ranged from 4.6 (2.7-7.8) for conservative care only to 16.8 (6.9-40.7) for medical care only. Additional significant baseline predictors included age, income, education, workers' compensation claim, Oswestry Disability Index score, and Global Chronic Pain Scale grade. 
Conclusions: This study provides important information regarding the association between patient characteristics, HRQoL measures, and LBP-related pain management utilization.
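The study reports adjusted ORs from multivariate logistic regression. As a simpler illustration of how an OR and its 95% CI relate, the Python sketch below computes a crude OR with a Wald interval from a 2x2 table. This is not the authors' model, and the cell counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 30 exposed vs 10 of 30 unexposed respondents use a strategy.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR {or_:.1f} ({lo:.1f}-{hi:.1f})")
```

Because the interval is symmetric on the log scale, lower × upper ≈ OR². The reported 4.6 (2.7-7.8) is consistent with this (2.7 × 7.8 ≈ 21.1; 4.6² ≈ 21.2).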
Affiliation(s)
- Brian R. Anderson
- Palmer Center for Chiropractic Research, Palmer College of Chiropractic, Davenport, IA, USA
- Patricia M. Herman
- RAND Center for Collaborative Research in Complementary and Integrative Health, RAND Corporation, Santa Monica, CA, USA
- Ron D. Hays
- Department of Medicine, UCLA Division of General Internal Medicine & Health Services Research, Los Angeles, CA, USA
4
Zhou X, Guo S, Wu H. Research on the doctors' win in crowdsourcing competitions: perspectives on service content and competitive environment. BMC Med Inform Decis Mak 2023; 23:204. [PMID: 37798708 PMCID: PMC10557239 DOI: 10.1186/s12911-023-02309-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 09/27/2023] [Indexed: 10/07/2023]
Abstract
Medical crowdsourcing competitions can provide patients with more efficient and comprehensive treatment advice than "one-to-one" services, so doctors should be encouraged to participate actively. Winning a crowdsourcing competition is the driving force that keeps doctors participating in the service; therefore, it is important to understand how doctors can improve their probability of winning. From the perspectives of service content and the competitive environment, this study introduces doctor competence indicators to investigate the key factors influencing doctors' wins on an online platform. The results show that emotional interaction in doctors' service content positively influences doctors' wins. However, the influence of information interaction is heterogeneous: conclusive information helps doctors win, whereas suggestive information reduces their chances. The competitive environment negatively moderates the relationship between doctors' service content and doctors' wins. These results contribute to research on crowdsourcing competitions and online healthcare services and offer guidance to the participants of such competitions, including patients, doctors, and platforms.
Affiliation(s)
- Xiuxiu Zhou
- Department of Psychiatry, Wuhan Mental Health Center, Wuhan, China
- Shanshan Guo
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Hong Wu
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
5
Carusi A, Filipovska J, Wittwehr C, Clerbaux LA. CIAO: a living experiment in interdisciplinary large-scale collaboration facilitated by the Adverse Outcome Pathway framework. Front Public Health 2023; 11:1212544. [PMID: 37637826 PMCID: PMC10449328 DOI: 10.3389/fpubh.2023.1212544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 07/12/2023] [Indexed: 08/29/2023]
Abstract
Introduction The CIAO project was launched in spring 2020 to address the need to make sense of the numerous and disparate data available on COVID-19 pathogenesis. Based on a crowdsourcing model of large-scale collaboration, the project has exploited the Adverse Outcome Pathway (AOP) knowledge management framework, which was built to support chemical risk assessment driven by mechanistic understanding of biological perturbations at different levels of biological organization. AOPs might therefore have real potential to integrate data produced through different approaches and by different disciplines, as experienced in the context of COVID-19. In this study, we aim to assess the effectiveness of the AOP framework (i) in supporting interdisciplinary collaboration on a viral disease and (ii) in serving as the conceptual mediator of a crowdsourcing model of collaboration. Methods We used a survey disseminated among the CIAO participants, a workshop open to all interested CIAO contributors, a series of interviews with some participants, and a self-reflection on the processes. Results The project has supported genuine interdisciplinarity with exchange of knowledge. The framework provided a common reference point for discussion and collaboration. The diagrams used in the AOPs helped make explicit the different perspectives brought to knowledge about the pathways. The AOP-Wiki revealed many aspects of its usability for those not already familiar with AOPs. Meanwhile, the use of AOPs in CIAO highlighted needed adaptations, of which the introduction of new Wiki elements for modulating factors was potentially the most disruptive. Regarding how well AOPs support a crowdsourcing model of large-scale collaboration, the CIAO project showed that this is successful when there is a strong central organizational impetus and when the terms of the collaboration are clarified as early as possible. 
Discussion Extrapolating the successful CIAO approach and related processes to other areas of science, where AOPs could foster interdisciplinary and systematic organization of knowledge, is an exciting prospect.
Affiliation(s)
- Clemens Wittwehr
- European Commission, Joint Research Centre (JRC), Ispra, Italy
- Laure-Alix Clerbaux
- European Commission, Joint Research Centre (JRC), Ispra, Italy
6
Mondal H, Parvanov ED, Singla RK, Rayan RA, Nawaz FA, Ritschl V, Eibensteiner F, Siva Sai C, Cenanovic M, Devkota HP, Hribersek M, De R, Klager E, Kletecka-Pulker M, Völkl-Kernstock S, Khalid GM, Lordan R, Găman MA, Shen B, Stamm T, Willschke H, Atanasov AG. Twitter-based crowdsourcing: What kind of measures can help to end the COVID-19 pandemic faster? Front Med (Lausanne) 2022; 9:961360. [PMID: 36186802 PMCID: PMC9523003 DOI: 10.3389/fmed.2022.961360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Accepted: 08/24/2022] [Indexed: 11/13/2022]
Abstract
Background Crowdsourcing is a low-cost, adaptable, and innovative method to collect ideas from numerous contributors with diverse backgrounds. Crowdsourcing on social media such as Twitter can generate ideas in a very short time based on contributions from globally distributed users. The world has been challenged by the COVID-19 pandemic in the last several years. Measures to combat the pandemic continue to evolve worldwide, and ideas and opinions on optimal counteraction strategies are of high interest. Objective This study aimed to validate the use of Twitter as a crowdsourcing platform to gain an understanding of public opinion on what measures can help to end the COVID-19 pandemic faster. Methods This cross-sectional study was conducted from December 22, 2021, to February 4, 2022. Tweets were posted by accounts operated by the authors, asking “How to faster end the COVID-19 pandemic?” and encouraging viewers to comment on measures that they perceived would be effective in achieving this goal. The ideas from the users' comments were collected and categorized into two major themes: personal and institutional measures. In the final stage of the campaign, a Twitter poll was conducted to collect additional comments and to estimate which of the two groups of measures was perceived as important among Twitter users. Results The crowdsourcing campaign generated seventeen suggested measures categorized into two major themes (personal and institutional) that received a total of 1,727 endorsements (supporting comments, retweets, and likes). The poll received a total of 325 votes, with 58% of votes underscoring the importance of both personal and institutional measures, 20% favoring personal measures, 11% favoring institutional measures, and 11% cast just out of curiosity to see the results. 
Conclusions Twitter was used successfully to crowdsource ideas on strategies to end the COVID-19 pandemic faster. The results indicate that the Twitter community highly values both personal responsibility and institutional measures to counteract the pandemic. This study validates the use of Twitter as a primary tool that could be used for crowdsourcing ideas with healthcare significance.
Affiliation(s)
- Himel Mondal
- Saheed Laxman Nayak Medical College and Hospital, Koraput, Odisha, India
- Emil D. Parvanov
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Department of Translational Stem Cell Biology, Research Institute of the Medical University of Varna, Varna, Bulgaria
- Rajeev K. Singla
- Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- School of Pharmaceutical Sciences, Lovely Professional University, Phagwara, India
- Rehab A. Rayan
- Department of Epidemiology, High Institute of Public Health, Alexandria University, Alexandria, Egypt
- Faisal A. Nawaz
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
- Valentin Ritschl
- Section for Outcomes Research, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Arthritis and Rehabilitation, Vienna, Austria
- Fabian Eibensteiner
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Division of Pediatric Nephrology and Gastroenterology, Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics, Medical University of Vienna, Vienna, Austria
- Chandragiri Siva Sai
- Amity Institute of Pharmacy, Amity University, Lucknow Campus, Lucknow, Uttar Pradesh, India
- Hari Prasad Devkota
- Graduate School of Pharmaceutical Sciences, Kumamoto University, Kumamoto, Japan
- Headquarters for Admissions and Education, Kumamoto University, Kumamoto, Japan
- Mojca Hribersek
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Ronita De
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Elisabeth Klager
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Maria Kletecka-Pulker
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Institute for Ethics and Law in Medicine, University of Vienna, Vienna, Austria
- Sabine Völkl-Kernstock
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Department of Child and Adolescent Psychiatry, Medical University Vienna, Vienna, Austria
- Garba M. Khalid
- Pharmaceutical Engineering Group, School of Pharmacy, Queen's University, Belfast, United Kingdom
- Ronan Lordan
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Mihnea-Alexandru Găman
- Faculty of Medicine, “Carol Davila” University of Medicine and Pharmacy, Bucharest, Romania
- Department of Hematology, Center of Hematology and Bone Marrow Transplantation, Fundeni Clinical Institute, Bucharest, Romania
- Bairong Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Tanja Stamm
- Section for Outcomes Research, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Arthritis and Rehabilitation, Vienna, Austria
- Harald Willschke
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University Vienna, Vienna, Austria
- Atanas G. Atanasov
- Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
- Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, Jastrzębiec, Poland
- *Correspondence: Atanas G. Atanasov
7
Ren Z, Chang Y, Bartl-Pokorny KD, Pokorny FB, Schuller BW. The Acoustic Dissection of Cough: Diving Into Machine Listening-based COVID-19 Analysis and Detection. J Voice 2022:S0892-1997(22)00166-7. [PMID: 35835648 PMCID: PMC9197794 DOI: 10.1016/j.jvoice.2022.06.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 05/25/2022] [Accepted: 06/09/2022] [Indexed: 12/05/2022]
Abstract
OBJECTIVES Coronavirus disease 2019 (COVID-19) has caused a worldwide crisis. Considerable efforts have been made to prevent and control its transmission, from early screening to vaccination and treatment. With the recent emergence of many automatic disease recognition applications based on machine listening techniques, detecting COVID-19 from recordings of cough, a key symptom of COVID-19, could be fast and inexpensive. To date, knowledge of the acoustic characteristics of COVID-19 cough sounds is limited, but it would be essential for building effective and robust machine learning models. The present study aims to explore acoustic features for distinguishing COVID-19-positive individuals from COVID-19-negative ones based on their cough sounds. METHODS Applying conventional inferential statistics, we analyze the acoustic correlates of COVID-19 cough sounds based on the ComParE feature set, i.e., a standardized set of 6,373 higher-level acoustic features. Furthermore, we train automatic COVID-19 detection models with machine learning methods and explore the latent features by evaluating the contribution of all features to the COVID-19 status predictions. RESULTS The experimental results demonstrate that a set of acoustic parameters of cough sounds, e.g., statistical functionals of the root mean square energy and Mel-frequency cepstral coefficients, carry essential acoustic information, in terms of effect sizes, for differentiating COVID-19-positive from COVID-19-negative cough samples. Our general automatic COVID-19 detection model performs significantly above chance level, i.e., at an unweighted average recall (UAR) of 0.632, on a data set of 1,411 cough samples (COVID-19 positive/negative: 210/1,201). 
CONCLUSIONS Based on the analysis of acoustic correlates in the ComParE feature set and the feature analysis in the effective COVID-19 detection approach, we find that several acoustic features that show larger effects in conventional group-difference testing are also weighted more heavily in the machine learning models.
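UAR, the metric reported above, is the mean of the per-class recalls, which is why chance level for a two-class task is 0.5 even when the classes are as imbalanced as 210 positive vs 1,201 negative samples. A minimal Python sketch follows; the label strings and the all-negative baseline model are hypothetical.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    recalls = []
    for label in sorted(set(y_true)):
        predictions = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls.append(sum(p == label for p in predictions) / len(predictions))
    return sum(recalls) / len(recalls)


# A trivial model that calls every cough negative, on a 210/1,201 split.
y_true = ["positive"] * 210 + ["negative"] * 1201
y_pred = ["negative"] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f}, UAR={unweighted_average_recall(y_true, y_pred):.3f}")
```

Plain accuracy rewards the majority class (about 0.85 for this trivial model) while its UAR stays at 0.5, which is the baseline against which the reported UAR of 0.632 is compared.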
Affiliation(s)
- Zhao Ren
- EIHW - Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; L3S Research Center, Hannover, Germany.
- Yi Chang
- GLAM - Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
- Katrin D Bartl-Pokorny
- EIHW - Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Division of Phoniatrics, Medical University of Graz, Graz, Austria; Division of Physiology, Medical University of Graz, Graz, Austria
- Florian B Pokorny
- EIHW - Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Division of Phoniatrics, Medical University of Graz, Graz, Austria; Division of Physiology, Medical University of Graz, Graz, Austria
- Björn W Schuller
- EIHW - Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; GLAM - Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
8
Hilton LG, Coulter ID, Ryan GW, Hays RD. Comparing the Recruitment of Research Participants With Chronic Low Back Pain Using Amazon Mechanical Turk With the Recruitment of Patients From Chiropractic Clinics: A Quasi-Experimental Study. J Manipulative Physiol Ther 2021; 44:601-611. [PMID: 35728997 PMCID: PMC11238473 DOI: 10.1016/j.jmpt.2022.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 02/21/2022] [Accepted: 02/22/2022] [Indexed: 11/29/2022]
Abstract
OBJECTIVE The purpose of this study was to compare the crowdsourcing platform Amazon Mechanical Turk (MTurk) with in-person recruitment and web-based surveys as a method to (1) recruit study participants and (2) obtain low-cost data quickly from chiropractic patients with chronic low back pain in the United States. METHODS In this 2-arm quasi-experimental study, we compared in-person clinical sampling and web-based surveys from a separate study (RAND sample, n = 1677, data collected October 2016 to January 2017) with MTurk (n = 310, data collected November 2016) as sampling and data collection tools. We gathered patient-reported health outcomes and other characteristics of adults with chronic low back pain receiving chiropractic care. Parametric and nonparametric tests were run. We assessed statistical and practical differences based on P values and effect sizes, respectively. RESULTS Compared with the RAND sample, the MTurk sample was statistically significantly younger (mean age 35.4 years, SD 9.7 vs 48.9, SD 14.8), had lower incomes (24% vs 17% reported less than $30,000 annual income), and reported worse mental health. Other differences were that the MTurk sample had more men (37% vs 29%), fewer White patients (87% vs 92%), more Hispanic patients (9% vs 5%), fewer people with a college degree (59% vs 68%), and more patients working full time (62% vs 58%). The MTurk sample was more likely to have chronic low back pain (78% vs 66%) that differed in pain frequency and duration. The MTurk sample had less disability and better global health scores. In terms of efficiency, the surveys cost $2.50 per participant in incentives for the MTurk sample. Survey development took 2 weeks and data collection took 1 month. CONCLUSION Our results suggest that there may be differences between crowdsourced and clinic-based samples. These differences range from small to medium on demographics and self-reported health. 
The low incentive costs and rapid data collection of MTurk make it an economically viable method of collecting data from chiropractic patients with low back pain. Further research is needed to explore the utility of MTurk for recruiting clinical samples, such as comparisons to nationally representative samples.
Affiliation(s)
- Lara G Hilton
- University of Southern California, Los Angeles, California
- Ian D Coulter
- RAND Corporation, Health Division, Los Angeles, California
- Gery W Ryan
- Kaiser Permanente Tyson Medical School, Los Angeles, California
- Ron D Hays
- Department of Medicine, University of California Los Angeles, Los Angeles, California
9
Peña-Chilet M, Roldán G, Perez-Florido J, Ortuño FM, Carmona R, Aquino V, Lopez-Lopez D, Loucera C, Fernandez-Rueda JL, Gallego A, García-Garcia F, González-Neira A, Pita G, Núñez-Torres R, Santoyo-López J, Ayuso C, Minguez P, Avila-Fernandez A, Corton M, Moreno-Pelayo MÁ, Morin M, Gallego-Martinez A, Lopez-Escamez JA, Borrego S, Antiñolo G, Amigo J, Salgado-Garrido J, Pasalodos-Sanchez S, Morte B, Carracedo Á, Alonso Á, Dopazo J. CSVS, a crowdsourcing database of the Spanish population genetic variability. Nucleic Acids Res 2021; 49:D1130-D1137. [PMID: 32990755 PMCID: PMC7778906 DOI: 10.1093/nar/gkaa794] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/13/2020] [Revised: 09/08/2020] [Accepted: 09/10/2020] [Indexed: 01/01/2023]
Abstract
Knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has proved to be a critical factor in the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. This database has been generated in a collaborative crowdsourcing effort, collecting sequencing data produced by local genomic projects and for other purposes. Sequences have been grouped by upper-level ICD10 categories. A web interface allows the database to be queried with one or more ICD10 categories removed. In this way, aggregated allele frequency counts for a pseudo-control Spanish population can be obtained for diseases belonging to the removed category. In addition to pseudo-control studies, some population studies can be performed, for example, on the prevalence of pharmacogenomic variants. These genomic data have also been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. CSVS is the first local repository of variability produced entirely by a crowdsourcing effort and constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network. CSVS can be accessed at: http://csvs.babelomics.org/.
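The pseudo-control query described above amounts to recomputing an allele frequency after dropping every sample in the excluded ICD10 categories. The Python sketch below illustrates that idea for one biallelic autosomal variant in diploid samples. It is not the CSVS code or API; the input layout, the category labels, and the function name `pseudo_control_af` are hypothetical.

```python
def pseudo_control_af(samples, excluded_categories):
    """Alternate-allele frequency at one biallelic autosomal variant, counted
    only over samples whose ICD10 category is not excluded.

    samples: iterable of (icd10_category, alt_allele_count), where the count is
    0, 1 or 2 for a diploid genotype (hypothetical input format).
    Returns None when no samples remain after exclusion.
    """
    excluded = set(excluded_categories)
    alt_alleles = 0
    kept = 0
    for category, alt_count in samples:
        if category in excluded:
            continue  # drop samples from the disease category under study
        if alt_count not in (0, 1, 2):
            raise ValueError("a diploid genotype carries 0, 1 or 2 alt alleles")
        alt_alleles += alt_count
        kept += 1
    return alt_alleles / (2 * kept) if kept else None


# Hypothetical cohort; "G" stands in for the category of the disease being studied.
cohort = [("G", 2), ("G", 1), ("H", 0), ("E", 1), ("E", 0)]
print(pseudo_control_af(cohort, excluded_categories={"G"}))
```

Only aggregated counts are returned, which matches the abstract's description of the query output as aggregated allele frequencies rather than individual genotypes.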
Affiliation(s)
- María Peña-Chilet
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Bioinformatics in Rare Diseases (BiER), Center for Biomedical Network Research on Rare Diseases (CIBERER), ISCIII, Sevilla 41013, Spain
- Computational Systems Medicine group, Institute of Biomedicine of Seville (IBIS) Hospital Virgen del Rocío, Sevilla 41013, Spain
- Gema Roldán
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Javier Perez-Florido
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Computational Systems Medicine group, Institute of Biomedicine of Seville (IBIS) Hospital Virgen del Rocío, Sevilla 41013, Spain
- Functional Genomics Node, FPS/ELIXIR-ES, Hospital Virgen del Rocío, Sevilla 41013, Spain
- Francisco M Ortuño
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Computational Systems Medicine group, Institute of Biomedicine of Seville (IBIS) Hospital Virgen del Rocío, Sevilla 41013, Spain
- Functional Genomics Node, FPS/ELIXIR-ES, Hospital Virgen del Rocío, Sevilla 41013, Spain
- Rosario Carmona
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Virginia Aquino
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Daniel Lopez-Lopez
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Computational Systems Medicine group, Institute of Biomedicine of Seville (IBIS) Hospital Virgen del Rocío, Sevilla 41013, Spain
- Carlos Loucera
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Computational Systems Medicine group, Institute of Biomedicine of Seville (IBIS) Hospital Virgen del Rocío, Sevilla 41013, Spain
- Jose L Fernandez-Rueda
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Francisco García-Garcia
- Unidad de Bioinformática y Bioestadística, Centro de Investigación Príncipe Felipe (CIPF), Valencia 46012, Spain
- Anna González-Neira
- Human Genotyping Unit–Centro Nacional de Genotipado (CEGEN), Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- Guillermo Pita
- Human Genotyping Unit–Centro Nacional de Genotipado (CEGEN), Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- Rocío Núñez-Torres
- Human Genotyping Unit–Centro Nacional de Genotipado (CEGEN), Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- Carmen Ayuso
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid 28040, Spain
- Pablo Minguez
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid 28040, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), ISCIII, Madrid 28040, Spain
- Almudena Avila-Fernandez
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid 28040, Spain
- Marta Corton
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid 28040, Spain
- Miguel Ángel Moreno-Pelayo
- Servicio de Genética, Ramón y Cajal Institute of Health Research (IRYCIS) and Biomedical Network Research Centre on Rare Diseases (CIBERER), Madrid 28034, Spain
- Matías Morin
- Servicio de Genética, Ramón y Cajal Institute of Health Research (IRYCIS) and Biomedical Network Research Centre on Rare Diseases (CIBERER), Madrid 28034, Spain
- Alvaro Gallego-Martinez
- Otology & Neurotology Group CTS 495, Department of Genomic Medicine, Centre for Genomics and Oncological Research (GENYO), Pfizer University of Granada, Granada 18016, Spain
- Department of Otolaryngology, Instituto de Investigación Biosanitaria, IBS. GRANADA, Hospital Universitario Virgen de las Nieves, Universidad de Granada, Granada 18016, Spain
- Jose A Lopez-Escamez
- Otology & Neurotology Group CTS 495, Department of Genomic Medicine, Centre for Genomics and Oncological Research (GENYO), Pfizer University of Granada, Granada 18016, Spain
- Department of Otolaryngology, Instituto de Investigación Biosanitaria, IBS. GRANADA, Hospital Universitario Virgen de las Nieves, Universidad de Granada, Granada 18016, Spain
- Salud Borrego
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBIS), University Hospital Virgen del Rocío/CSIC/University of Seville, Seville 41013, Spain
- Centre for Biomedical Network Research on Rare Diseases (CIBERER), Seville 41013, Spain
- Guillermo Antiñolo
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBIS), University Hospital Virgen del Rocío/CSIC/University of Seville, Seville 41013, Spain
- Centre for Biomedical Network Research on Rare Diseases (CIBERER), Seville 41013, Spain
- Jorge Amigo
- Fundación Pública Galega de Medicina Xenómica, SERGAS, IDIS, Santiago de Compostela 15706, Spain
- Josefa Salgado-Garrido
- Navarrabiomed-IdiSNA, Complejo Hospitalario de Navarra, Universidad Pública de Navarra (UPNA), IdiSNA (Navarra Institute for Health Research), Pamplona, Navarra 31008, Spain
- Sara Pasalodos-Sanchez
- Navarrabiomed-IdiSNA, Complejo Hospitalario de Navarra, Universidad Pública de Navarra (UPNA), IdiSNA (Navarra Institute for Health Research), Pamplona, Navarra 31008, Spain
- Beatriz Morte
- Undiagnosed Rare Diseases Programme (ENoD). Center for Biomedical Research on Rare Diseases (CIBERER), ISCIII, Madrid 28029, Spain
- Ángel Carracedo
- Fundación Pública Galega de Medicina Xenómica, SERGAS, IDIS, Santiago de Compostela 15706, Spain
- Grupo de Medicina Xenómica, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), CIMUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Ángel Alonso
- Navarrabiomed-IdiSNA, Complejo Hospitalario de Navarra, Universidad Pública de Navarra (UPNA), IdiSNA (Navarra Institute for Health Research), Pamplona, Navarra 31008, Spain
- Joaquín Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Bioinformatics in Rare Diseases (BiER), Center for Biomedical Network Research on Rare Diseases (CIBERER), ISCIII, Sevilla 41013, Spain
- Computational Systems Medicine group, Institute of Biomedicine of Seville (IBIS) Hospital Virgen del Rocío, Sevilla 41013, Spain
- Functional Genomics Node, FPS/ELIXIR-ES, Hospital Virgen del Rocío, Sevilla 41013, Spain
10
Gartland A, Bate A, Painter JL, Casperson TA, Powell GE. Developing Crowdsourced Training Data Sets for Pharmacovigilance Intelligent Automation. Drug Saf 2020; 44:373-382. [PMID: 33354751 DOI: 10.1007/s40264-020-01028-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 11/23/2020] [Indexed: 11/29/2022]
Abstract
INTRODUCTION Machine learning offers an alluring solution to developing automated approaches to the increasing individual case safety report burden being placed upon pharmacovigilance. Leveraging crowdsourcing to annotate unstructured data may provide accurate, efficient, and contemporaneous training data sets in support of machine learning. OBJECTIVE The objective of this study was to evaluate whether crowdsourcing can be used to accurately and efficiently develop training data sets in support of pharmacovigilance automation. MATERIALS AND METHODS Pharmacovigilance experts created a reference dataset by reviewing 15,490 de-identified social media posts of narratives pertaining to 15 drugs and 22 medically relevant topics. A random sample of posts from the reference dataset was published on Amazon Mechanical Turk, and its users (Turkers) were asked a series of questions about those same medical concepts. Accuracy, price elasticity, and time efficiency were evaluated. RESULTS Accuracy of crowdsourced curation exceeded 90% when compared to the reference dataset, and curation was completed in about 5% of the time. Time efficiency increased with higher pay, but there was no significant difference in accuracy. Additionally, having a social media post reviewed by more than one Turker (using a voting system) did not offer significant improvements in terms of accuracy. CONCLUSIONS Crowdsourcing is an accurate and efficient method that can be used to develop training data sets in support of pharmacovigilance automation. More research is needed to better understand the breadth and depth of possible uses as well as strengths, limitations, and generalizability of results.
Affiliation(s)
- Alex Gartland
- College of Medicine, University of Central Florida, Orlando, FL, USA
- Andrew Bate
- Safety and Medical Governance, GlaxoSmithKline, London, UK
- Tim A Casperson
- North American Medical Affairs, GlaxoSmithKline, Research Triangle Park, NC, USA
- Gregory Eugene Powell
- Pharma Safety, GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC, 27709, USA.
11
Ren C, Tucker JD, Tang W, Tao X, Liao M, Wang G, Jiao K, Xu Z, Zhao Z, Yan Y, Lin Y, Li C, Wang L, Li Y, Kang D, Ma W. Digital crowdsourced intervention to promote HIV testing among MSM in China: study protocol for a cluster randomized controlled trial. Trials 2020; 21:931. [PMID: 33203449 PMCID: PMC7673095 DOI: 10.1186/s13063-020-04860-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/11/2020] [Accepted: 11/01/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Men who have sex with men (MSM) are an important HIV key population in China. However, HIV testing rates among MSM remain suboptimal. Digital crowdsourced media interventions may be a useful tool to reach this marginalized population. We define digital crowdsourced media as using social media, mobile phone applications, the Internet, or other digital approaches to disseminate messages developed from crowdsourcing contests. The proposed cluster randomized controlled trial (RCT) aims to assess the effectiveness of a digital crowdsourced intervention to increase HIV testing uptake and decrease risky sexual behaviors among Chinese MSM. METHODS A two-arm, cluster-randomized controlled trial will be implemented in eleven cities (ten clusters) in Shandong Province, China. The target sample is 250 MSM per arm, with 50 participants per cluster. MSM who are 18 years old or above, live in the study city, have not been tested for HIV in the past 3 months, are not living with HIV or have never been tested for HIV, and are willing to provide informed consent will be enrolled. Participants will be recruited through banner advertisements on Blued, the largest gay dating app in China, and in person at community-based organizations (CBOs). The intervention includes a series of crowdsourced intervention materials (24 images and four short videos about HIV testing and safe sexual behaviors) and HIV self-test services provided by the study team. The intervention materials were developed through a series of participatory crowdsourcing contests before this study. Self-test kits will be sent to participants in the intervention group at the 2nd and 3rd follow-ups. Participants will be followed up quarterly during the 12-month period. The primary outcome will be self-reported HIV testing uptake at 12 months. Secondary outcomes will include changes in condomless sex, self-test efficacy, social network engagement, HIV testing social norms, and testing stigma.
DISCUSSION Innovative approaches to HIV testing among marginalized populations are urgently needed. Through this cluster randomized controlled trial, we will evaluate the effectiveness of a digital crowdsourced intervention in improving HIV testing uptake among MSM, and the trial will provide a resource for related public health fields. TRIAL REGISTRATION ChiCTR1900024350. Registered on 6 July 2019.
Affiliation(s)
- Ci Ren
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Joseph D Tucker
- University of North Carolina Chapel Hill Project-China, No. 2 Lujing Road, Guangzhou, 510095, China
- Weiming Tang
- University of North Carolina Chapel Hill Project-China, No. 2 Lujing Road, Guangzhou, 510095, China
- Xiaorun Tao
- Institution for AIDS/STD Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, 250014, Shandong, China
- Meizhen Liao
- Institution for AIDS/STD Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, 250014, Shandong, China
- Guoyong Wang
- Institution for AIDS/STD Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, 250014, Shandong, China
- Kedi Jiao
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Zece Xu
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Zhe Zhao
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Yu Yan
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Yuxi Lin
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Chuanxi Li
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Lin Wang
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Yijun Li
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
- Dianmin Kang
- Institution for AIDS/STD Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, 250014, Shandong, China.
- Wei Ma
- Department of Epidemiology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China.
12
Young NR, La Rosa M, Mehr SA, Krasnow MM. Does greater morning sickness predict carrying a girl? Analysis of nausea and vomiting during pregnancy from retrospective report. Arch Gynecol Obstet 2020; 303:1161-1166. [PMID: 33098451 DOI: 10.1007/s00404-020-05839-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/01/2020] [Accepted: 10/13/2020] [Indexed: 10/23/2022]
Abstract
PURPOSE Severe nausea and vomiting during pregnancy (NVP) requiring hospitalization has been associated with female fetal sex. However, whether less severe forms of NVP share that association with fetal sex has not been investigated. The objective of this study was to evaluate the relationship between fetal sex and the frequency of NVP. METHODS We collected self-reported data from mothers via an international web-based survey on the Amazon Mechanical Turk (MTurk) platform about pregnancy and first-trimester NVP history. We considered the covariables of maternal age, parity status, proneness to nausea, geographic cohort, and preconceived notions of a relationship between fetal sex and NVP. RESULTS A total of 2543 mothers met the inclusion criteria, yielding data from 4320 pregnancies. Women gestating a female fetus reported higher frequencies of NVP (M = 6.35 on a 1-9 scale) than did women gestating males (M = 6.04, p = .007). This effect held when all other variables were included in the regression. General proneness to nausea, maternal age, and parity were also significant independent predictors of NVP. CONCLUSIONS Women who carried a female fetus, as opposed to a male fetus, reported a significantly higher frequency of NVP during the first trimester of pregnancy. Further research should evaluate both the proximate and ultimate causes of this relationship.
Affiliation(s)
- Nicola R Young
- Department of Psychology, Harvard University, Cambridge, MA, USA.
- Mauricio La Rosa
- Department of Obstetrics and Gynecology, University of Texas Medical Branch, Galveston, TX, USA
- Samuel A Mehr
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Max M Krasnow
- Department of Psychology, Harvard University, Cambridge, MA, USA
13
Arango-Argoty GA, Guron GKP, Garner E, Riquelme MV, Heath LS, Pruden A, Vikesland PJ, Zhang L. ARGminer: a web platform for the crowdsourcing-based curation of antibiotic resistance genes. Bioinformatics 2020; 36:2966-2973. [PMID: 32058567 DOI: 10.1093/bioinformatics/btaa095] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/21/2019] [Revised: 01/31/2020] [Accepted: 02/08/2020] [Indexed: 12/20/2022]
Affiliation(s)
- G K P Guron
- Department of Civil and Environmental Engineering; Department of Food Science and Technology, Virginia Tech, Blacksburg, VA 24061-0217, USA
- E Garner
- Department of Civil and Environmental Engineering
- M V Riquelme
- Department of Civil and Environmental Engineering
- A Pruden
- Department of Civil and Environmental Engineering
- L Zhang
- Department of Computer Science
14
Créquit P, Boutron I, Meerpohl J, Williams HC, Craig J, Ravaud P. Future of evidence ecosystem series: 2. current opportunities and need for better tools and methods. J Clin Epidemiol 2020; 123:143-152. [DOI: 10.1016/j.jclinepi.2020.01.023] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/02/2019] [Revised: 12/26/2019] [Accepted: 01/07/2020] [Indexed: 02/06/2023]
15
Leaman R, Wei CH, Allot A, Lu Z. Ten tips for a text-mining-ready article: How to improve automated discoverability and interpretability. PLoS Biol 2020; 18:e3000716. [PMID: 32479517 PMCID: PMC7289435 DOI: 10.1371/journal.pbio.3000716] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Revised: 06/11/2020] [Indexed: 12/22/2022]
Abstract
Data-driven research in biomedical science requires structured, computable data. Increasingly, these data are created with support from automated text mining. Text-mining tools have rapidly matured: although not perfect, they now frequently provide outstanding results. We describe 10 straightforward writing tips—and a web tool, PubReCheck—guiding authors to help address the most common cases that remain difficult for text-mining tools. We anticipate these guides will help authors’ work be found more readily and used more widely, ultimately increasing the impact of their work and the overall benefit to both authors and readers. PubReCheck is available at http://www.ncbi.nlm.nih.gov/research/pubrecheck. Your published research is already being processed with automated tools, and text mining will become more common; this Community Page article describes how you can help these tools process your work more accurately, including a web tool, PubReCheck.
Affiliation(s)
- Robert Leaman
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Alexis Allot
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
16
Picache J, May JC, McLean JA. Crowd-Sourced Chemistry: Considerations for Building a Standardized Database to Improve Omic Analyses. ACS Omega 2020; 5:980-985. [PMID: 31984253 PMCID: PMC6977078 DOI: 10.1021/acsomega.9b03708] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/01/2019] [Accepted: 12/24/2019] [Indexed: 05/09/2023]
Abstract
Mass spectrometry (MS) is used in multiple omics disciplines to generate large collections of data. This data enables advancements in biomedical research by providing global profiles of a given system. One of the main barriers to generating these profiles is the inability to accurately annotate omics data, especially small molecules. To complement pre-existing large databases that are not quite complete, research groups devote efforts to generating personal libraries to annotate their data. Scientific progress is impeded during the generation of these personal libraries because the data contained within them is often redundant and/or incompatible with other databases. To overcome these redundancies and incompatibilities, we propose that communal, crowd-sourced databases be curated in a standardized fashion. A small number of groups have shown this model is feasible and successful. While the needs of a specific field will dictate the functionality of a communal database, we discuss some features to consider during database development. Special emphasis is made on standardization of terminology, documentation, format, reference materials, and quality assurance practices. These standardization procedures enable a field to have higher confidence in the quality of the data within a given database. We also discuss the three conceptual pillars of database design as well as how crowd-sourcing is practiced. Generating open-source databases requires front-end effort, but the result is a well curated, high quality data set that all can use. Having a resource such as this fosters collaboration and scientific advancement.
Affiliation(s)
- Jaqueline A. Picache
- Department of Chemistry, Center for Innovative Technology, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, Tennessee 37235, United States
- Jody C. May
- Department of Chemistry, Center for Innovative Technology, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, Tennessee 37235, United States
- John A. McLean
- Department of Chemistry, Center for Innovative Technology, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, Tennessee 37235, United States
17
Borchert JS, Wang B, Ramzanali M, Stein AB, Malaiyandi LM, Dineley KE. Adverse Events Due to Insomnia Drugs Reported in a Regulatory Database and Online Patient Reviews: Comparative Study. J Med Internet Res 2019; 21:e13371. [PMID: 31702558 PMCID: PMC6874799 DOI: 10.2196/13371] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/15/2019] [Revised: 08/22/2019] [Accepted: 09/26/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Patient online drug reviews are a resource for other patients seeking information about the practical benefits and drawbacks of drug therapies. Patient reviews may also serve as a source of postmarketing safety data that are more user-friendly than regulatory databases. However, the reliability of online reviews has been questioned, because they do not undergo professional review and lack means of verification. OBJECTIVE We evaluated online reviews of hypnotic medications, because they are commonly used and their therapeutic efficacy is particularly amenable to patient self-evaluation. Our primary objective was to compare the types and frequencies of adverse events reported to the Food and Drug Administration Adverse Event Reporting System (FAERS) with analogous information in patient reviews on the consumer health website Drugs.com. The secondary objectives were to describe patient reports of efficacy and adverse events and assess the influence of medication cost, effectiveness, and adverse events on user ratings of hypnotic medications. METHODS Patient ratings and narratives were retrieved from 1407 reviews on Drugs.com between February 2007 and March 2018 for eszopiclone, ramelteon, suvorexant, zaleplon, and zolpidem. Reviews were coded to preferred terms in the Medical Dictionary for Regulatory Activities. These reviews were compared to 5916 cases in the FAERS database from January 2015 to September 2017. RESULTS Similar adverse events were reported to both Drugs.com and FAERS. Both resources identified a lack of efficacy as a common complaint for all five drugs. Both resources revealed that amnesia commonly occurs with eszopiclone, zaleplon, and zolpidem, while nightmares commonly occur with suvorexant. Compared to FAERS, online reviews of zolpidem reported a much higher frequency of amnesia and partial sleep activities. User ratings were highest for zolpidem and lowest for suvorexant. 
Statistical analyses showed that patient ratings are influenced by considerations of efficacy and adverse events, while drug cost is unimportant. CONCLUSIONS For hypnotic medications, online patient reviews and FAERS emphasized similar adverse events. Online reviewers rated drugs based on perception of efficacy and adverse events. We conclude that online patient reviews of hypnotics are a valid source that can supplement traditional adverse event reporting systems.
Affiliation(s)
- Jill S Borchert
- Chicago College of Pharmacy, Midwestern University, Downers Grove, IL, United States
- Bo Wang
- Chicago College of Osteopathic Medicine, Midwestern University, Downers Grove, IL, United States
- Muzaina Ramzanali
- Chicago College of Pharmacy, Midwestern University, Downers Grove, IL, United States
- Amy B Stein
- Office of Research and Sponsored Programs, Midwestern University, Glendale, AZ, United States
- Latha M Malaiyandi
- College of Graduate Studies, Midwestern University, Downers Grove, IL, United States
- Kirk E Dineley
- College of Graduate Studies, Midwestern University, Downers Grove, IL, United States
18
19
Abstract
The study of coopetition has been evolving, with rapid growth in the number of academic publications in this field. A number of literature reviews have been published focusing on the nature and antecedents of coopetition and on future perspectives for its implementation. Coopetition has proved beneficial for joint investments and research and development (R&D) projects, yet competitive games take place in global markets that may lead to safety hazards. Few studies have investigated the potential of coopetition strategies for safety and security solutions, so, given global trends, a more detailed study is needed. The analysis begins by identifying over 600 published studies in which the terms “coopetition”, “safety”, and “security” were used. Using rigorous bibliometric tools, established and emergent research clusters were identified, as well as the most influential studies, the most contributing authors, and topical areas for further investigation. The systematic combination of quantitative and qualitative analytical tools helps to identify potential directions for future research. By combining bibliometric analysis and content analysis, the main prospective areas for implementing coopetition in safety and security were identified.
20
Fontil V, Khoong EC, Hoskote M, Radcliffe K, Ratanawongsa N, Lyles CR, Sarkar U. Evaluation of a Health Information Technology-Enabled Collective Intelligence Platform to Improve Diagnosis in Primary Care and Urgent Care Settings: Protocol for a Pragmatic Randomized Controlled Trial. JMIR Res Protoc 2019; 8:e13151. [PMID: 31389337 PMCID: PMC6701158 DOI: 10.2196/13151] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/14/2018] [Revised: 05/07/2019] [Accepted: 05/10/2019] [Indexed: 12/17/2022]
Abstract
BACKGROUND Diagnostic error in ambulatory care, a frequent cause of preventable harm, may be mitigated using the collective intelligence of multiple clinicians. The National Academy of Medicine has identified enhanced clinician collaboration and digital tools as a means to improve the diagnostic process. OBJECTIVE This study aims to assess the efficacy of a collective intelligence output to improve diagnostic confidence and accuracy in ambulatory care cases (from primary care and urgent care clinic visits) with diagnostic uncertainty. METHODS This is a pragmatic randomized controlled trial of using collective intelligence in cases with diagnostic uncertainty from clinicians at primary care and urgent care clinics in 2 health care systems in San Francisco. Real-life cases, identified for having an element of diagnostic uncertainty, will be entered into a collective intelligence digital platform to acquire collective intelligence from at least 5 clinician contributors on the platform. Cases will be randomized to an intervention group (where clinicians will view the collective intelligence output) or a control group (where clinicians will not view the collective intelligence output). Clinicians will complete a postvisit questionnaire that assesses their diagnostic confidence for each case; in the intervention cases, clinicians will complete the questionnaire after reviewing the collective intelligence output for the case. Using logistic regression accounting for clinician clustering, we will compare the primary outcome of diagnostic confidence and the secondary outcome of time to diagnosis (the time it takes for a clinician to reach a diagnosis) for intervention versus control cases. We will also assess the usability of and satisfaction with the digital tool using measures adapted from the Technology Acceptance Model and Net Promoter Score. RESULTS We have recruited 32 out of our recruitment goal of 33 participants.
This study is funded until May 2020 and is approved by the University of California San Francisco Institutional Review Board until January 2020. We have completed data collection as of June 2019 and will complete our proposed analysis by December 2019. CONCLUSIONS This study will determine if the use of a digital platform for collective intelligence is acceptable, useful, and efficacious in improving diagnostic confidence and accuracy in outpatient cases with diagnostic uncertainty. If shown to be valuable in improving clinicians' diagnostic process, this type of digital tool may be one of the first innovations used for reducing diagnostic errors in outpatient care. The findings of this study may provide a path forward for improving the diagnostic process. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/13151.
Affiliation(s)
- Valy Fontil
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States; UCSF Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Elaine C Khoong
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States
- Mekhala Hoskote
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States; UCSF Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Kate Radcliffe
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States; UCSF Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Neda Ratanawongsa
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States; UCSF Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Courtney Rees Lyles
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States; UCSF Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Urmimala Sarkar
- Division of General Internal Medicine, University of California San Francisco, San Francisco, CA, United States; UCSF Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
21
Abstract
With the massive amounts of medical data made available online, language technologies have proven to be indispensable in processing biomedical and molecular biology literature, health data or patient records. With such a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and the discovery of structured clinical information and could foster a major leap in natural language processing and in health research. The aim of this Special Issue, “Curative Power of Medical Data” in Data, is to gather innovative approaches for the exploitation of biomedical data using semantic web technologies and linked data by developing community involvement in biomedical research. This Special Issue contains four surveys covering a wide range of topics, from the analysis of the writing style of biomedical articles to the automatic generation of tests from medical references, the construction of a gold standard biomedical corpus, and the visualization of biomedical data.
22
Himmelstein DS, Rubinetti V, Slochower DR, Hu D, Malladi VS, Greene CS, Gitter A. Open collaborative writing with Manubot. PLoS Comput Biol 2019; 15:e1007128. [PMID: 31233491 PMCID: PMC6611653 DOI: 10.1371/journal.pcbi.1007128] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/06/2018] [Revised: 07/05/2019] [Accepted: 05/24/2019] [Indexed: 01/08/2023]
Abstract
Open, collaborative research is a powerful paradigm that can immensely strengthen the scientific process by integrating broad and diverse expertise. However, traditional research and multi-author writing processes break down at scale. We present new software named Manubot, available at https://manubot.org, to address the challenges of open scholarly writing. Manubot adopts the contribution workflow used by many large-scale open source software projects to enable collaborative authoring of scholarly manuscripts. With Manubot, manuscripts are written in Markdown and stored in a Git repository to precisely track changes over time. By hosting manuscript repositories publicly, such as on GitHub, multiple authors can simultaneously propose and review changes. A cloud service automatically evaluates proposed changes to catch errors. Publication with Manubot is continuous: When a manuscript's source changes, the rendered outputs are rebuilt and republished to a web page. Manubot automates bibliographic tasks by implementing citation by identifier, where users cite persistent identifiers (e.g. DOIs, PubMed IDs, ISBNs, URLs), whose metadata is then retrieved and converted to a user-specified style. Manubot modernizes publishing to align with the ideals of open science by making it transparent, reproducible, immediate, versioned, collaborative, and free of charge.
Affiliation(s)
- Daniel S. Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- David R. Slochower
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
- Dongbo Hu
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Venkat S. Malladi
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Bioinformatics Core Facility, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
23
Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc 2019; 26:561-576. [PMID: 30908576 PMCID: PMC7647332 DOI: 10.1093/jamia/ocz009] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 09/23/2018] [Revised: 01/06/2019] [Accepted: 01/11/2019] [Indexed: 02/07/2023]
Abstract
OBJECTIVE User-generated content (UGC) in online environments provides opportunities to learn an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. MATERIALS AND METHODS We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. RESULTS We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. CONCLUSIONS The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and model interpretability.
Affiliation(s)
- Zhijun Yin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Lina M Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
24
Bouadjenek MR, Zobel J, Verspoor K. Automated assessment of biological database assertions using the scientific literature. BMC Bioinformatics 2019; 20:216. [PMID: 31035936 PMCID: PMC6489365 DOI: 10.1186/s12859-019-2801-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/14/2018] [Accepted: 04/09/2019] [Indexed: 12/27/2022]
Abstract
BACKGROUND Large biological databases such as GenBank contain vast numbers of records, the content of which is substantively based on external resources, including published literature. Manual curation is used to establish whether the literature and the records are indeed consistent. In this paper, we explore an automated method, which we call BARC (Biocuration tool for Assessment of Relation Consistency), for assessing the consistency of biological assertions to assist biocurators. In this method, a biological assertion is represented as a relation between two objects (for example, a gene and a disease); we then use our novel set-based relevance algorithm SaBRA to retrieve pertinent literature, and apply a classifier to estimate the likelihood that this relation (assertion) is correct. RESULTS Our experiments on assessing gene-disease relations and protein-protein interactions using the PubMed Central collection show that BARC can be effective at assisting curators in performing data cleansing. Specifically, BARC substantially outperforms the best baselines, with F-measure improvements of 3.5% on gene-disease relations and 13% on protein-protein interactions. We have additionally carried out a feature analysis that showed that all feature types are informative, as are all fields of the documents. CONCLUSIONS BARC provides a clear benefit for the biocuration community, as there are no prior automated tools for identifying inconsistent assertions in large-scale biological databases.
Affiliation(s)
- Mohamed Reda Bouadjenek
- Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, M5S 3G8 Canada
- Justin Zobel
- School of Computing and Information Systems, University of Melbourne, Melbourne, 3010 Australia
- Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne, 3010 Australia
25
Lu W, Guttentag A, Elbel B, Kiszko K, Abrams C, Kirchner TR. Crowdsourcing for Food Purchase Receipt Annotation via Amazon Mechanical Turk: A Feasibility Study. J Med Internet Res 2019; 21:e12047. [PMID: 30950801 PMCID: PMC6473207 DOI: 10.2196/12047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/27/2018] [Revised: 12/15/2018] [Accepted: 12/16/2018] [Indexed: 11/13/2022]
Abstract
BACKGROUND The decisions that individuals make about the food and beverage products they purchase and consume directly influence their energy intake and dietary quality and may lead to excess weight gain and obesity. However, gathering and interpreting data on food and beverage purchase patterns can be difficult. Leveraging novel sources of data on food and beverage purchase behavior can provide us with a more objective understanding of food consumption behaviors. OBJECTIVE Food and beverage purchase receipts often include time-stamped location information, which, when associated with product purchase details, can provide a useful behavioral measurement tool. The purpose of this study was to assess the feasibility, reliability, and validity of processing data from fast-food restaurant receipts using crowdsourcing via Amazon Mechanical Turk (MTurk). METHODS Between 2013 and 2014, receipts (N=12,165) from consumer purchases were collected at 60 different locations of five fast-food restaurant chains in New Jersey and New York City, USA (ie, Burger King, KFC, McDonald's, Subway, and Wendy's). Data containing the restaurant name, location, receipt ID, food items purchased, price, and other information were manually entered into an MS Access database and checked for accuracy by a second reviewer; this was considered the gold standard. To assess the feasibility of coding receipt data via MTurk, a prototype set of receipts (N=196) was selected. For each receipt, 5 turkers were asked to (1) identify the receipt identifier and the name of the restaurant and (2) indicate whether a beverage was listed in the receipt; if yes, they were to categorize the beverage as cold (eg, soda or energy drink) or hot (eg, coffee or tea). Interturker agreement for specific questions (eg, restaurant name and beverage inclusion) and agreement between turker consensus responses and the gold standard values in the manually entered dataset were calculated. 
RESULTS Among the 196 receipts completed by turkers, the interturker agreement was 100% (196/196) for restaurant names (eg, Burger King, McDonald's, and Subway), 98.5% (193/196) for beverage inclusion (ie, hot, cold, or none), 92.3% (181/196) for types of hot beverage (eg, hot coffee or hot tea), and 87.2% (171/196) for types of cold beverage (eg, Coke or bottled water). When compared with the gold standard data, the agreement level was 100% (196/196) for restaurant name, 99.5% (195/196) for beverage inclusion, and 99.5% (195/196) for beverage types. CONCLUSIONS Our findings indicated high interrater agreement for questions across difficulty levels (eg, single- vs binary- vs multiple-choice items). Compared with traditional methods for coding receipt data, MTurk can produce excellent-quality data in a lower-cost, more time-efficient manner.
Affiliation(s)
- Wenhua Lu
- Department of Childhood Studies, Rutgers, The State University of New Jersey, Camden, NJ, United States
- Alexandra Guttentag
- College of Global Public Health, New York University, New York, NY, United States
- Brian Elbel
- School of Medicine, New York University, New York, NY, United States; Robert F Wagner Graduate School of Public Service, New York University, New York, NY, United States
- Kamila Kiszko
- School of Medicine, New York University, New York, NY, United States
- Courtney Abrams
- School of Medicine, New York University, New York, NY, United States
- Thomas R Kirchner
- College of Global Public Health, New York University, New York, NY, United States
26
Saez-Rodriguez J, Rinschen MM, Floege J, Kramann R. Big science and big data in nephrology. Kidney Int 2019; 95:1326-1337. [PMID: 30982672 DOI: 10.1016/j.kint.2018.11.048] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 06/01/2018] [Revised: 11/11/2018] [Accepted: 11/20/2018] [Indexed: 12/16/2022]
Abstract
There have been tremendous advances during the last decade in methods for large-scale, high-throughput data generation and in novel computational approaches to analyze these datasets. These advances have had a profound impact on biomedical research and clinical medicine. The field of genomics is rapidly developing toward single-cell analysis, and major advances in proteomics and metabolomics have been made in recent years. Developments in wearables and electronic health records are poised to change clinical trial design. This rise of 'big data' holds the promise to transform not only research progress, but also clinical decision making towards precision medicine. To have a true impact, big data requires integrative and multi-disciplinary approaches that blend experimental, clinical and computational expertise across multiple institutions. Cancer research has been at the forefront of the progress in such large-scale initiatives, so-called 'big science,' with an emphasis on precision medicine, and various other areas are quickly catching up. Nephrology is arguably lagging behind, and hence these are exciting times to start (or redirect) a research career to leverage these developments in nephrology. In this review, we summarize advances in big data generation, computational analysis, and big science initiatives, with a special focus on applications to nephrology.
Affiliation(s)
- Julio Saez-Rodriguez
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Aachen, Germany; Institute for Computational Biomedicine, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), European Molecular Biology Laboratory and Heidelberg University, Heidelberg, Germany.
- Markus M Rinschen
- Department II of Internal Medicine, and Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany; Center for Mass Spectrometry and Metabolomics, The Scripps Research Institute, La Jolla, California, USA
- Jürgen Floege
- RWTH Aachen, Department of Nephrology and Clinical Immunology, Aachen, Germany
- Rafael Kramann
- RWTH Aachen, Department of Nephrology and Clinical Immunology, Aachen, Germany; Department of Internal Medicine, Nephrology and Transplantation, Erasmus Medical Center, Rotterdam, The Netherlands
27
Wang Z, Lachmann A, Ma'ayan A. Mining data and metadata from the gene expression omnibus. Biophys Rev 2019; 11:103-110. [PMID: 30594974 PMCID: PMC6381352 DOI: 10.1007/s12551-018-0490-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 10/24/2018] [Accepted: 12/04/2018] [Indexed: 12/16/2022]
Abstract
Publicly available gene expression datasets deposited in the Gene Expression Omnibus (GEO) are growing at an accelerating rate. Such datasets hold great value for knowledge discovery, particularly when integrated. Although numerous software platforms and tools have been developed to enable reanalysis and integration of individual GEO datasets or groups of them, large-scale reuse of those datasets is impeded by minimal requirements for standardized metadata at both the study and sample levels and for uniform processing of the data across studies. Here, we review methodologies developed to facilitate the systematic curation and processing of publicly available gene expression datasets from GEO. We identify trends for advanced metadata curation and summarize approaches for reprocessing the data within the entire GEO repository.
Affiliation(s)
- Zichen Wang
- BD2K-LINCS Data Coordination and Integration Center; Knowledge Management Center for the Illuminating the Druggable Genome; Mount Sinai Center for Bioinformatics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, Box 1603, One Gustave L. Levy Place, New York, NY, 10029, USA.
- Alexander Lachmann
- BD2K-LINCS Data Coordination and Integration Center; Knowledge Management Center for the Illuminating the Druggable Genome; Mount Sinai Center for Bioinformatics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, Box 1603, One Gustave L. Levy Place, New York, NY, 10029, USA
- Avi Ma'ayan
- BD2K-LINCS Data Coordination and Integration Center; Knowledge Management Center for the Illuminating the Druggable Genome; Mount Sinai Center for Bioinformatics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, Box 1603, One Gustave L. Levy Place, New York, NY, 10029, USA
28
Kilicoglu H. Biomedical text mining for research rigor and integrity: tasks, challenges, directions. Brief Bioinform 2018; 19:1400-1414. [PMID: 28633401 PMCID: PMC6291799 DOI: 10.1093/bib/bbx057] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/07/2017] [Revised: 04/10/2017] [Indexed: 01/01/2023]
Abstract
An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise.
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, US National Library of Medicine
29
Sims MH, Hodges Shaw M, Gilbertson S, Storch J, Halterman MW. Legal and ethical issues surrounding the use of crowdsourcing among healthcare providers. Health Informatics J 2018; 25:1618-1630. [PMID: 30192688 DOI: 10.1177/1460458218796599] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
As the pace of medical discovery widens the knowledge-to-practice gap, technologies that enable peer-to-peer crowdsourcing have become increasingly common. Crowdsourcing has the potential to help medical providers collaborate to solve patient-specific problems in real time. We recently conducted the first trial of a mobile, medical crowdsourcing application among healthcare providers in a university hospital setting. In addition to acknowledging the benefits, our participants also raised concerns regarding the potential negative consequences of this emerging technology. In this commentary, we consider the legal and ethical implications of the major findings identified in our previous trial including compliance with the Health Insurance Portability and Accountability Act, patient protections, healthcare provider liability, data collection, data retention, distracted doctoring, and multi-directional anonymous posting. We believe the commentary and recommendations raised here will provide a frame of reference for individual providers, provider groups, and institutions to explore the salient legal and ethical issues before they implement these systems into their workflow.
Affiliation(s)
- Seth Gilbertson
- University at Buffalo, The State University of New York, USA
30
Knadler JJ, Penny DJ, Harris TH, Webb GD, Cabrera AG, Kyle WB. Strength in numbers: Crowdsourcing the most relevant literature in pediatric cardiology. Congenit Heart Dis 2018; 13:794-798. [PMID: 30178626 DOI: 10.1111/chd.12669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2018] [Revised: 07/31/2018] [Accepted: 08/06/2018] [Indexed: 11/30/2022]
Abstract
OBJECTIVE The growing body of medical literature in pediatric cardiology has made it increasingly difficult for individual providers to stay abreast of the most current, meaningful articles to help guide practice. Crowdsourcing represents a collaborative process of obtaining information from a large group of individuals, typically from an online or web-based community, and could serve as a potential mechanism to pool individual efforts to combat this issue. This study aimed to utilize crowdsourcing as a novel way to generate a list of the most relevant, current publications in congenital heart disease, utilizing input from an international group of professionals in the field of pediatric cardiology. DESIGN AND SETTING All members of the PediHeartNet Google group, an international email distribution list of medical professionals with an interest in pediatric cardiology, were queried in 2017 to submit literature that they considered to be most relevant to their current practice. A Google Form submission platform was used. The articles were evaluated by a multi-institutional panel of four experts in pediatric cardiology using the Delphi method via an electronic evaluation form until a consensus was reached regarding whether each article merited inclusion in the final list. RESULTS In total, 260 articles were submitted by members of the PediHeartNet Google group. Expert review using the Delphi method resulted in a list of 108 articles. The final collection of articles was published on a publicly available educational website. CONCLUSIONS Crowdsourcing represents a novel approach for generating a high-yield, comprehensive, yet practical list of the most relevant recent publications in pediatric cardiology. The same techniques could be easily applied to any medical subspecialty. By enlisting the input of frontline providers, the value and relevance of such a list will be significant. A web-based platform for publication of the list allows for real-time updates to ensure continued relevance.
Affiliation(s)
- Joseph J Knadler
- Lillie Frank Abercrombie Section of Cardiology, Department of Pediatrics, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
- Daniel J Penny
- Lillie Frank Abercrombie Section of Cardiology, Department of Pediatrics, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
- Tyler H Harris
- Division of Cardiology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pennsylvania
- Gary D Webb
- The Heart Institute, Division of Pediatric Cardiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Antonio G Cabrera
- Lillie Frank Abercrombie Section of Cardiology, Department of Pediatrics, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas; Section of Critical Care Medicine, Department of Pediatrics, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
- William B Kyle
- Lillie Frank Abercrombie Section of Cardiology, Department of Pediatrics, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
31
32
Lossio-Ventura JA, Hogan W, Modave F, Guo Y, He Z, Yang X, Zhang H, Bian J. OC-2-KB: integrating crowdsourcing into an obesity and cancer knowledge base curation system. BMC Med Inform Decis Mak 2018; 18:55. [PMID: 30066655 PMCID: PMC6069686 DOI: 10.1186/s12911-018-0635-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/04/2023]
Abstract
BACKGROUND There is strong scientific evidence linking obesity and overweight to the risk of various cancers and to cancer survivorship. Nevertheless, the existing online information about the relationship between obesity and cancer is poorly organized, not evidence-based, of poor quality, and confusing to health information consumers. A formal knowledge representation such as a Semantic Web knowledge base (KB) can help better organize and deliver quality health information. We previously presented OC-2-KB (Obesity and Cancer to Knowledge Base), a software pipeline that can automatically build an obesity and cancer KB from scientific literature. In this work, we investigated crowdsourcing strategies to increase the number of ground truth annotations and improve the quality of the KB. METHODS We developed a new release of the OC-2-KB system addressing key challenges in automatic KB construction. OC-2-KB automatically extracts semantic triples in the form of subject-predicate-object expressions from PubMed abstracts related to the obesity and cancer literature. The accuracy of the facts extracted from scientific literature relies heavily on both the quantity and quality of the available ground truth triples. Thus, we incorporated a crowdsourcing process to improve the quality of the KB. RESULTS We conducted two rounds of crowdsourcing experiments using a new corpus with 82 obesity and cancer-related PubMed abstracts. We demonstrated that crowdsourcing is indeed a low-cost mechanism to collect labeled data from non-expert laypeople. Even though individual laypeople might not offer reliable answers, the collective wisdom of the crowd is comparable to expert opinions. We also retrained the relation detection machine learning models in OC-2-KB using the crowd-annotated data and evaluated the content of the curated KB with a set of competency questions.
Our evaluation showed improved performance of the underlying relation detection model in comparison to the baseline OC-2-KB. CONCLUSIONS We presented a new version of OC-2-KB, a system that automatically builds an evidence-based obesity and cancer KB from scientific literature. Our KB construction framework integrated automatic information extraction with crowdsourcing techniques to verify the extracted knowledge. Our ultimate goal is a paradigm shift in how the general public access, read, digest, and use online health information.
Affiliation(s)
- Juan Antonio Lossio-Ventura
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- William Hogan
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- François Modave
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Yi Guo
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Zhe He
- School of Information, Florida State University, 142 Collegiate Loop, Tallahassee, FL, 32306, USA
- Xi Yang
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Hansi Zhang
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Jiang Bian
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA

33
Foster M, Pandey A, Kreimeyer K, Botsis T. Generation of an annotated reference standard for vaccine adverse event reports. Vaccine 2018; 36:4325-4330. [PMID: 29880244 DOI: 10.1016/j.vaccine.2018.05.079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/28/2018] [Revised: 05/08/2018] [Accepted: 05/21/2018] [Indexed: 01/24/2023]
Abstract
As part of a collaborative project between the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention for the development of a web-based natural language processing (NLP) workbench, we created a corpus of 1000 Vaccine Adverse Event Reporting System (VAERS) reports annotated for 36,726 clinical features, 13,365 temporal features, and 22,395 clinical-temporal links. This paper describes the final corpus, as well as the methodology used to create it, so that clinical NLP researchers outside FDA can evaluate the utility of the corpus to aid their own work. The creation of this standard went through four phases: pre-training, pre-production, production-clinical feature annotation, and production-temporal annotation. The pre-production phase used a double annotation followed by adjudication strategy to refine and finalize the annotation model, while the production phases followed a single annotation strategy to maximize the number of reports in the corpus. An analysis of 30 reports randomly selected as part of a quality control assessment yielded accuracies of 0.97, 0.96, and 0.83 for clinical features, temporal features, and clinical-temporal associations, respectively, which speaks to the quality of the corpus.
Affiliation(s)
- Matthew Foster
- FDA Center for Biologics Evaluation and Research, Office of Biostatistics and Epidemiology, 10903 New Hampshire Ave, Silver Spring, MD, United States
- Abhishek Pandey
- FDA Center for Biologics Evaluation and Research, Office of Biostatistics and Epidemiology, 10903 New Hampshire Ave, Silver Spring, MD, United States
- Kory Kreimeyer
- FDA Center for Biologics Evaluation and Research, Office of Biostatistics and Epidemiology, 10903 New Hampshire Ave, Silver Spring, MD, United States
- Taxiarchis Botsis
- FDA Center for Biologics Evaluation and Research, Office of Biostatistics and Epidemiology, 10903 New Hampshire Ave, Silver Spring, MD, United States; The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States

34
Créquit P, Mansouri G, Benchoufi M, Vivot A, Ravaud P. Mapping of Crowdsourcing in Health: Systematic Review. J Med Internet Res 2018; 20:e187. [PMID: 29764795 PMCID: PMC5974463 DOI: 10.2196/jmir.9330] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/02/2017] [Revised: 02/10/2018] [Accepted: 03/14/2018] [Indexed: 11/22/2022]
Abstract
Background Crowdsourcing involves obtaining ideas, needed services, or content by soliciting Web-based contributions from a crowd. The 4 types of crowdsourced tasks (problem solving, data processing, surveillance or monitoring, and surveying) can be applied in the 3 categories of health (promotion, research, and care). Objective This study aimed to map the different applications of crowdsourcing in health to assess the fields of health that are using crowdsourcing and the crowdsourced tasks used. We also describe the logistics of crowdsourcing and the characteristics of crowd workers. Methods MEDLINE, EMBASE, and ClinicalTrials.gov were searched for available reports from inception to March 30, 2016, with no restriction on language or publication status. Results We identified 202 relevant studies that used crowdsourcing, including 9 randomized controlled trials, of which only one had posted results at ClinicalTrials.gov. Crowdsourcing was used in health promotion (91/202, 45.0%), research (73/202, 36.1%), and care (38/202, 18.8%). The 4 most frequent areas of application were public health (67/202, 33.2%), psychiatry (32/202, 15.8%), surgery (22/202, 10.9%), and oncology (14/202, 6.9%). Half of the reports (99/202, 49.0%) referred to data processing, 34.6% (70/202) referred to surveying, 10.4% (21/202) referred to surveillance or monitoring, and 5.9% (12/202) referred to problem-solving. Labor market platforms (eg, Amazon Mechanical Turk) were used in most studies (190/202, 94%). The crowd workers’ characteristics were poorly reported, and crowdsourcing logistics were missing from two-thirds of the reports. When reported, the median size of the crowd was 424 (first and third quartiles: 167-802); crowd workers’ median age was 34 years (32-36). Crowd workers were mainly recruited nationally, particularly in the United States. 
For many studies (58.9%, 119/202), previous experience in crowdsourcing was required, and passing a qualification test or training was seldom needed (11.9% of studies; 24/202). For half of the studies, monetary incentives were mentioned, with mainly less than US $1 to perform the task. The time needed to perform the task was mostly less than 10 min (58.9% of studies; 119/202). Data quality validation was used in 54/202 studies (26.7%), mainly by attention check questions or by replicating the task with several crowd workers. Conclusions The use of crowdsourcing, which allows access to a large pool of participants as well as saving time in data collection, lowering costs, and speeding up innovations, is increasing in health promotion, research, and care. However, the description of crowdsourcing logistics and crowd workers’ characteristics is frequently missing in study reports and needs to be precisely reported to better interpret the study findings and replicate them.
Affiliation(s)
- Perrine Créquit
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France; Cochrane France, Paris, France
- Ghizlène Mansouri
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
- Mehdi Benchoufi
- Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Alexandre Vivot
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Philippe Ravaud
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France; Cochrane France, Paris, France; Department of Epidemiology, Columbia University, Mailman School of Public Health, New York, NY, United States

35

36
Schuhmacher A, Gassmann O, McCracken N, Hinder M. Open innovation and external sources of innovation. An opportunity to fuel the R&D pipeline and enhance decision making? J Transl Med 2018; 16:119. [PMID: 29739427 PMCID: PMC5941640 DOI: 10.1186/s12967-018-1499-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/09/2018] [Accepted: 04/30/2018] [Indexed: 12/28/2022]
Abstract
Historically, research and development (R&D) in the pharmaceutical sector has predominantly been an in-house activity. To enable investments in game-changing late-stage assets and to enable better and less costly go/no-go decisions, most companies have employed a fail-early paradigm through the implementation of clinical proof-of-concept organizations. To fuel their pipelines, some pioneers started to complement their internal R&D efforts through collaborations as early as the 1990s. In recent years, multiple extrinsic and intrinsic factors induced an opening for external sources of innovation and resulted in new models for open innovation, such as open sourcing, crowdsourcing, public–private partnerships, innovation centres, and the virtualization of R&D. Three factors seem to determine the breadth and depth of how companies approach external innovation: (1) the company’s legacy, (2) the company’s willingness and ability to take risks and (3) the company’s need to control IP and competitors. In addition, these factors often constitute the major hurdles to effectively leveraging external opportunities and assets. Conscious and differential choices of the R&D and business models for different companies and different divisions in the same company seem to best allow a company to fully exploit the potential of both internal and external innovations.
Affiliation(s)
- Oliver Gassmann
- Institute for Technology Management, University of St. Gallen, Dufourstrasse 40a, 9000, St. Gallen, Switzerland
- Nigel McCracken
- Debiopharm International S.A., Chemin Messidor 5-7, 1002, Lausanne, Switzerland
- Markus Hinder
- Novartis Institutes for BioMedical Research, Postfach, Forum 1, 4002, Basel, Switzerland

37
Abstract
Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Philosophical Transactions of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers’ experimental work builds upon years and (collectively) billions of dollars’ worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources’ development and management.

38

Abstract
Background Crowdsourcing is a nascent phenomenon that has grown exponentially since the term was coined in 2006. It involves a large group of people solving a problem or completing a task for an individual or, more commonly, for an organisation. While the field of crowdsourcing has developed more quickly in information technology, it has great promise in health applications. This review examines uses of crowdsourcing in global health and in health more broadly. Methods Semantic searches were run in Google Scholar for “crowdsourcing,” “crowdsourcing and health,” and similar terms. 996 articles were retrieved and all abstracts were scanned. 285 articles related to health. This review provides a narrative overview of the articles identified. Results Eight areas where crowdsourcing has been used in health were identified: diagnosis; surveillance; nutrition; public health and environment; education; genetics; psychology; and, general medicine/other. Many studies reported crowdsourcing being used in a diagnostic or surveillance capacity. Crowdsourcing has been widely used across medical disciplines; however, it is important for future work using crowdsourcing to consider the appropriateness of the crowd being used to ensure the crowd is capable and has the adequate knowledge for the task at hand. Gamification of tasks seems to improve accuracy; other innovative methods of analysis including introducing thresholds and measures of trustworthiness should be considered. Conclusion Crowdsourcing is a new field that has been widely used and is innovative and adaptable. With the exception of surveillance applications used in emergency and disaster situations, most crowdsourcing applications have been used only as pilots. These exceptions demonstrate that it is possible to take crowdsourcing applications to scale. Crowdsourcing has the potential to provide more accessible health care to more communities and individuals rapidly and to lower costs of care.
Affiliation(s)
- Kerri Wazny
- Centre for Global Health Research, Usher Institute of Informatics and Population Sciences, University of Edinburgh, Edinburgh, UK

39
Turner TR, Wagner JK, Cabana GS. Ethics in biological anthropology. Am J Phys Anthropol 2018; 165:939-951. [PMID: 29574844 PMCID: PMC5873973 DOI: 10.1002/ajpa.23367] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/29/2017] [Revised: 11/08/2017] [Accepted: 11/10/2017] [Indexed: 11/05/2022]
Affiliation(s)
- Trudy R Turner
- Department of Anthropology, University of Wisconsin-Milwaukee, POB 413, Milwaukee, Wisconsin
- Jennifer K Wagner
- Center for Translational Bioethics & Health Care Policy, Geisinger, Danville, Pennsylvania
- Graciela S Cabana
- Department of Anthropology, University of Tennessee-Knoxville, Knoxville, Tennessee

40
Abstract
Background First coined by Howe in 2006, the field of crowdsourcing has grown exponentially. Despite its growth and its transcendence across many fields, the definition of crowdsourcing has still not been agreed upon, and examples are poorly indexed in peer-reviewed literature. Many examples of crowdsourcing have not been scaled-up past the pilot phase. In spite of this, crowdsourcing has great potential, especially in global health where resources are lacking. This narrative review seeks to review both indexed and grey crowdsourcing literature broadly in order to explore the current state of the field. Methods This is a review of reviews of crowdsourcing. Semantic searches were conducted using Google Scholar rather than indexed databases due to poor indexing of the topic. 996 articles were retrieved, of which 69 were initially identified as being reviews or theoretically-based. 21 of these were found to be irrelevant and 48 articles were reviewed. Results This narrative review focuses on defining crowdsourcing, taxonomies of crowdsourcing, who constitutes the crowd, research that is amenable to crowdsourcing, regulatory and ethical aspects of crowdsourcing and some notable examples of crowdsourcing. Conclusions Crowdsourcing has the potential to be hugely promising, especially in global health, due to its ability to collect information rapidly, inexpensively and accurately. Rigorous ethical and regulatory controls are needed to ensure data are collected and analysed appropriately and crowdsourcing should be considered complementary to traditional research methods.
Affiliation(s)
- Kerri Wazny
- Centre for Global Health, Usher Institute for Informatics and Population Sciences, University of Edinburgh, Edinburgh, Scotland, UK

41
Crowdsourcing and community engagement: a qualitative analysis of the 2BeatHIV contest. J Virus Erad 2018; 4:30-36. [PMID: 29568551 PMCID: PMC5851182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Background: As HIV cure research advances, it is important to engage local communities. Crowdsourcing may be an effective, bottom-up approach. Crowdsourcing contests elicit public contributions to solve problems and celebrate finalists. We examine the development of a crowdsourcing contest to understand public perspectives about HIV cure research. Methods: We used flyers, emails, online advertisement and phone calls to recruit a convenience sample of community members to participate in focus-group discussions. Participants developed a contest name, logo and hashtag. Qualitative analysis identified emergent themes in the focus group transcripts. Results: Seventy-one people participated in four focus groups. Emergent themes for HIV cure engagement included: (1) emphasising collective approaches to HIV cure; (2) dispelling myths to spur discussion; (3) using HIV cure as motivation for participation; and (4) using creative community engagement. Conclusion: Crowdsourcing contests may be useful for engaging local communities, developing culturally tailored awareness campaign messaging, and encouraging the public to learn more about HIV cure research.

42

Harper L, Campbell J, Cannon EKS, Jung S, Poelchau M, Walls R, Andorf C, Arnaud E, Berardini TZ, Birkett C, Cannon S, Carson J, Condon B, Cooper L, Dunn N, Elsik CG, Farmer A, Ficklin SP, Grant D, Grau E, Herndon N, Hu ZL, Humann J, Jaiswal P, Jonquet C, Laporte MA, Larmande P, Lazo G, McCarthy F, Menda N, Mungall CJ, Munoz-Torres MC, Naithani S, Nelson R, Nesdill D, Park C, Reecy J, Reiser L, Sanderson LA, Sen TZ, Staton M, Subramaniam S, Tello-Ruiz MK, Unda V, Unni D, Wang L, Ware D, Wegrzyn J, Williams J, Woodhouse M, Yu J, Main D. AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database (Oxford) 2018; 2018:5096675. [PMID: 30239679 PMCID: PMC6146126 DOI: 10.1093/database/bay088] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 03/29/2018] [Revised: 07/19/2018] [Accepted: 07/30/2018] [Indexed: 01/07/2023]
Abstract
The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.
Affiliation(s)
- Lisa Harper
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Ethalinda K S Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Computer Science, Iowa State University, Ames, IA, USA
- Sook Jung
- Horticulture, Washington State University, Pullman, WA, USA
- Monica Poelchau
- National Agricultural Library, USDA Agricultural Research Service, Beltsville, MD, USA
- Carson Andorf
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Computer Science, Iowa State University, Ames, IA, USA
- Elizabeth Arnaud
- Bioversity International, Informatics Unit, Conservation and Availability Programme, Parc Scientifique Agropolis II, Montpellier, France
- Tanya Z Berardini
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Fremont, CA, USA
- Steve Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- James Carson
- Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, USA
- Bradford Condon
- Entomology and Plant Pathology, University of Tennessee Knoxville, Knoxville, TN, USA
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christine G Elsik
- Division of Animal Sciences and Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- Andrew Farmer
- National Center for Genome Resources, Santa Fe, NM, USA
- David Grant
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Emily Grau
- National Center for Genome Resources, Santa Fe, NM, USA
- Nic Herndon
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Zhi-Liang Hu
- Animal Science, Iowa State University, Ames, IA, USA
- Jodi Humann
- Horticulture, Washington State University, Pullman, WA, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Clement Jonquet
- Laboratory of Informatics, Robotics, Microelectronics of Montpellier, University of Montpellier & CNRS, Montpellier, France
- Marie-Angélique Laporte
- Bioversity International, Informatics Unit, Conservation and Availability Programme, Parc Scientifique Agropolis II, Montpellier, France
- Gerard Lazo
- Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, CA, USA
- Fiona McCarthy
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Rex Nelson
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Daureen Nesdill
- Marriott Library, University of Utah, Salt Lake City, UT, USA
- Carissa Park
- Animal Science, Iowa State University, Ames, IA, USA
- James Reecy
- Animal Science, Iowa State University, Ames, IA, USA
- Leonore Reiser
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Fremont, CA, USA
- Taner Z Sen
- Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, CA, USA
- Margaret Staton
- Entomology and Plant Pathology, University of Tennessee Knoxville, Knoxville, TN, USA
- Victor Unda
- Horticulture, Washington State University, Pullman, WA, USA
- Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Liya Wang
- Plant Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Doreen Ware
- USDA, Plant, Soil and Nutrition Research, Ithaca, NY, USA
- Plant Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Jill Wegrzyn
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Jason Williams
- Cold Spring Harbor Laboratory, DNA Learning Center, Cold Spring Harbor, NY, USA
- Margaret Woodhouse
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
- Jing Yu
- Horticulture, Washington State University, Pullman, WA, USA
- Doreen Main
- Horticulture, Washington State University, Pullman, WA, USA

43
Agrawal K, Mehdi M, Reichert M, Hauck F, Schlee W, Probst T, Pryss R. Towards Incentive Management Mechanisms in the Context of Crowdsensing Technologies based on TrackYourTinnitus Insights. Procedia Comput Sci 2018. [DOI: 10.1016/j.procs.2018.07.155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]

44

Mathews A, Farley S, Hightow-Weidman L, Muessig K, Rennie S, Tucker JD. Crowdsourcing and community engagement: a qualitative analysis of the 2BeatHIV contest. J Virus Erad 2018. [DOI: 10.1016/s2055-6640(20)30239-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]

45

Chancellor MB, Bartolone SN, Veerecke A, Lamb LE. Crowdsourcing Disease Biomarker Discovery Research: The IP4IC Study. J Urol 2017; 199:1344-1350. [PMID: 29225061 DOI: 10.1016/j.juro.2017.09.167] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 09/14/2017] [Indexed: 12/30/2022]
Abstract
PURPOSE Biomarker discovery is limited by the scarcity of readily accessible, cost-efficient human samples that are available in large numbers and represent the entire heterogeneity of the disease. We developed a novel, active-participation crowdsourcing method to determine BP-RS (Bladder Permeability Defect Risk Score). BP-RS uses noninvasive urinary cytokine measurements to discriminate patients with interstitial cystitis/bladder pain syndrome who had Hunner lesions from controls and from patients with interstitial cystitis/bladder pain syndrome but without Hunner lesions. MATERIALS AND METHODS We performed a national crowdsourcing study in cooperation with the Interstitial Cystitis Association. Patients answered demographic, symptom severity and urinary frequency questionnaires on a HIPAA (Health Insurance Portability and Accountability Act) compliant website. Urine samples were collected at home, stabilized with a preservative and sent to Beaumont Hospital for analysis. The expression of 3 urinary cytokines was used in a machine learning algorithm to develop BP-RS. RESULTS The IP4IC study collected a total of 448 urine samples, representing 153 patients (147 females and 6 males) with interstitial cystitis/bladder pain syndrome, of whom 54 (50 females and 4 males) had Hunner lesions. A total of 159 female and 136 male age-matched controls also participated. A defined BP-RS was calculated to predict interstitial cystitis/bladder pain syndrome with Hunner lesions or a bladder permeability defect etiology with 89% validity. CONCLUSIONS In this novel participation crowdsourcing study we obtained a large number of urine samples from 46 states, which were collected at home, shipped and stored at room temperature. Using a machine learning algorithm we developed BP-RS to quantify the risk of interstitial cystitis/bladder pain syndrome with Hunner lesions, which is indicative of a bladder permeability defect etiology.
To our knowledge BP-RS is the first validated urine biomarker assay for interstitial cystitis/bladder pain syndrome and one of the first biomarker assays to be developed using crowdsourcing.
Affiliation(s)
- Michael B Chancellor
- Oakland University William Beaumont School of Medicine, Rochester Hills, Michigan
- Andrew Veerecke
- Department of Urology, Beaumont Health System, Royal Oak, Michigan
- Laura E Lamb
- Oakland University William Beaumont School of Medicine, Rochester Hills, Michigan

46
Abstract
BACKGROUND First coined by Howe in 2006, the field of crowdsourcing has grown exponentially. Despite its growth and its transcendence across many fields, the definition of crowdsourcing has still not been agreed upon, and examples are poorly indexed in peer-reviewed literature. Many examples of crowdsourcing have not been scaled-up past the pilot phase. In spite of this, crowdsourcing has great potential, especially in global health where resources are lacking. This narrative review seeks to review both indexed and grey crowdsourcing literature broadly in order to explore the current state of the field. METHODS This is a review of reviews of crowdsourcing. Semantic searches were conducted using Google Scholar rather than indexed databases due to poor indexing of the topic. 996 articles were retrieved, of which 69 were initially identified as being reviews or theoretically-based. 21 of these were found to be irrelevant and 48 articles were reviewed. RESULTS This narrative review focuses on defining crowdsourcing, taxonomies of crowdsourcing, who constitutes the crowd, research that is amenable to crowdsourcing, regulatory and ethical aspects of crowdsourcing and some notable examples of crowdsourcing. CONCLUSIONS Crowdsourcing has the potential to be hugely promising, especially in global health, due to its ability to collect information rapidly, inexpensively and accurately. Rigorous ethical and regulatory controls are needed to ensure data are collected and analysed appropriately and crowdsourcing should be considered complementary to traditional research methods.
Affiliation(s)
- Kerri Wazny
- Centre for Global Health, Usher Institute for Informatics and Population Sciences, University of Edinburgh, Edinburgh, Scotland, UK

47
Talikka M, Bukharov N, Hayes WS, Hofmann-Apitius M, Alexopoulos L, Peitsch MC, Hoeng J. Novel approaches to develop community-built biological network models for potential drug discovery. Expert Opin Drug Discov 2017; 12:849-857. [PMID: 28585481 DOI: 10.1080/17460441.2017.1335302] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Hundreds of thousands of data points are now routinely generated in clinical trials by molecular profiling and NGS technologies. A true translation of these data into knowledge is not possible without analysis and interpretation in a well-defined biological context. Currently, there are many public and commercial pathway tools and network models that can facilitate such analysis. At the same time, the insights and knowledge that can be gained are highly dependent on the underlying biological content of these resources. Crowdsourcing can be employed to guarantee the accuracy and transparency of the biological content underlying the tools used to interpret rich molecular data. Areas covered: In this review, the authors describe crowdsourcing in drug discovery. The focus is on efforts that have successfully used the crowdsourcing approach to verify and augment pathway tools and biological network models. Technologies that enable the building of biological networks with the community are also described. Expert opinion: A crowd of experts can be leveraged for the entire development process of biological network models, from ontologies to the evaluation of their mechanistic completeness. The ultimate goal is to facilitate biomarker discovery and personalized medicine by mechanistically explaining patients' differences with respect to disease prevention, diagnosis, and therapy outcome.
Affiliation(s)
- Marja Talikka
- Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Natalia Bukharov
- Translational Data Management Services, Clarivate Analytics (formerly the IP & Science Business of Thomson Reuters), Boston, MA, USA
- William S Hayes
- Data Sciences, Applied Dynamic Solutions, LLC, Rahway, NJ, USA
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany
- Leonidas Alexopoulos
- Systems Bioengineering Lab, National Technical University of Athens, Zografou, Greece; Protavio Ltd, Stevenage, UK
- Manuel C Peitsch
- Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Julia Hoeng
- Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
48
Abstract
The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.
49
Jagodnik KM, Koplev S, Jenkins SL, Ohno-Machado L, Paten B, Schurer SC, Dumontier M, Verborgh R, Bui A, Ping P, McKenna NJ, Madduri R, Pillai A, Ma'ayan A. Developing a framework for digital objects in the Big Data to Knowledge (BD2K) commons: Report from the Commons Framework Pilots workshop. J Biomed Inform 2017; 71:49-57. [PMID: 28501646 DOI: 10.1016/j.jbi.2017.05.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/15/2016] [Revised: 05/01/2017] [Accepted: 05/08/2017] [Indexed: 12/11/2022]
Abstract
The volume and diversity of data in biomedical research have been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges including: the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.
Affiliation(s)
- Kathleen M Jagodnik
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
- Simon Koplev
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
- Sherry L Jenkins
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
- Lucila Ohno-Machado
- Health System Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92083, USA; Health Services Research, San Diego Veterans Administration Health System, San Diego, CA 92083, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95060, USA
- Stephan C Schurer
- Department of Molecular and Cellular Pharmacology, University of Miami, 1120 NW 14th Street, CRB 650 (M-857), Miami, FL 33136, USA
- Michel Dumontier
- Institute for Data Science, Universiteit Maastricht, Minderbroedersberg 4-6, 6211 LK Maastricht, Netherlands
- Ruben Verborgh
- Ghent University - iMinds Research Foundation Flanders, St. Pietersnieuwstraat 33, 9000 Gent, Belgium
- Alex Bui
- Department of Radiological Sciences, UCLA School of Medicine, Los Angeles, CA 90095, USA; Department of Bioengineering, UCLA Henry Samueli School of Engineering, Los Angeles, CA 90095, USA
- Peipei Ping
- Departments of Physiology, Medicine, and Bioinformatics, UCLA School of Medicine, Los Angeles, CA 90095, USA
- Neil J McKenna
- Department of Molecular and Cellular Biology, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
- Ravi Madduri
- Department of Mathematics and Computer Science, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
- Ajay Pillai
- Division of Genome Sciences, National Human Genome Research Institute, National Institutes of Health, 31 Center Drive, MSC 2152, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Avi Ma'ayan
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
50
Cocos A, Qian T, Callison-Burch C, Masino AJ. Crowd control: Effectively utilizing unscreened crowd workers for biomedical data annotation. J Biomed Inform 2017; 69:86-92. [DOI: 10.1016/j.jbi.2017.04.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/03/2016] [Revised: 03/27/2017] [Accepted: 04/02/2017] [Indexed: 01/17/2023]