26
Penlington M, Silverman H, Vasudevan A, Pavithran P. Plain Language Summaries of Clinical Trial Results: A Preliminary Study to Assess Availability of Easy-to-Understand Summaries and Approaches to Improving Public Engagement. Pharmaceut Med 2020; 34:401-406. [PMID: 33113147 PMCID: PMC7744300 DOI: 10.1007/s40290-020-00359-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 09/26/2020] [Indexed: 11/30/2022]
Abstract
BACKGROUND Easy-to-understand, stand-alone factual summaries of clinical trial results have the potential to improve public understanding of and engagement with pharmaceutical research. The European Clinical Trial Regulation (EU) No. 536/2014 is a major regulatory initiative that will result in a large number of such plain language summaries (PLSs) posted in the public domain. Today, however, little is known about the extent to which PLSs are written and are available to the general public. OBJECTIVES This preliminary study assessed (i) 20 top pharmaceutical companies' positions on improving transparency and commitment to disclosing trial result summaries in an easy-to-understand format and (ii) the availability of such summaries in the public domain and the ease of locating them via general web searches. METHODS The availability of PLSs in the public domain was estimated based on the number of EudraCT technical result summaries in four disease areas: chronic obstructive pulmonary disease, asthma, meningitis, and influenza. The likelihood of PLSs being easy to find through internet search engine queries by members of the public was assessed using Google. RESULTS All 20 sponsors had committed to improve clinical trial transparency, 17 committed to sharing PLSs with trial participants, and 14 had at least one PLS available in the public domain. A total of 99 clinical studies in these four disease areas had technical summaries posted on EudraCT between 1 January 2017 and 30 June 2020. Of these 99, 14 studies had PLSs in the public domain. Of these 14 PLSs, 12 were directly captured by the search engine. However, the sponsor trial identifier or EudraCT number had to be included in the search term to locate them. Generic search terms resulted in large volumes of non-relevant results. CONCLUSION Despite the progressive movement towards clinical trial transparency, easily accessible PLSs on clinical trials are currently scarce. 
The provision of a European mandate and framework for non-technical result summaries by Regulation (EU) 536/2014 will be a major step to bring about positive change.
27
Lazarus JV, Palayew A, Rasmussen LN, Andersen TH, Nicholson J, Norgaard O. Searching PubMed to Retrieve Publications on the COVID-19 Pandemic: Comparative Analysis of Search Strings. J Med Internet Res 2020; 22:e23449. [PMID: 33197230 PMCID: PMC7695541 DOI: 10.2196/23449] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/17/2020] [Revised: 10/10/2020] [Accepted: 10/24/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Since it was declared a pandemic on March 11, 2020, COVID-19 has dominated headlines around the world and researchers have generated thousands of scientific articles about the disease. The fast speed of publication has challenged researchers and other stakeholders to keep up with the volume of published articles. To search the literature effectively, researchers use databases such as PubMed. OBJECTIVE The aim of this study is to evaluate the performance of different searches for COVID-19 records in PubMed and to assess the complexity of searches required. METHODS We tested PubMed searches for COVID-19 to identify which search string performed best according to standard metrics (sensitivity, precision, and F-score). We evaluated the performance of 8 different searches in PubMed during the first 10 weeks of the COVID-19 pandemic to investigate how complex a search string is needed. We also tested omitting hyphens and space characters as well as applying quotation marks. RESULTS The two most comprehensive search strings combining several free-text and indexed search terms performed best in terms of sensitivity (98.4%/98.7%) and F-score (96.5%/95.7%), but the single-term search COVID-19 performed best in terms of precision (95.3%) and well in terms of sensitivity (94.4%) and F-score (94.8%). The term Wuhan virus performed the worst: 7.7% for sensitivity, 78.1% for precision, and 14.0% for F-score. We found that deleting a hyphen or space character could omit a substantial number of records, especially when searching with SARS-CoV-2 as a single term. CONCLUSIONS Comprehensive search strings combining free-text and indexed search terms performed better than single-term searches in PubMed, but not by a large margin compared to the single term COVID-19. For everyday searches, certain single-term searches that are entered correctly are probably sufficient, whereas more comprehensive searches should be used for systematic reviews. 
Still, we suggest additional measures that the US National Library of Medicine could take to support all PubMed users in searching the COVID-19 literature.
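The three metrics compared in this study have standard definitions. The minimal Python sketch below assumes that "F-score" means the balanced F1 score (the harmonic mean of sensitivity and precision) and that a gold-standard set of relevant record IDs is available; the function names are illustrative, not taken from the paper. Plugging in the reported sensitivity and precision for COVID-19 and Wuhan virus reproduces the reported F-scores, which is consistent with the F1 assumption.

```python
def f_score(sensitivity: float, precision: float) -> float:
    """Balanced F-score (F1): the harmonic mean of sensitivity and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


def retrieval_metrics(retrieved: set, relevant: set) -> dict:
    """Score a search result set against a gold-standard set of relevant
    record IDs (e.g., PMIDs). Sensitivity is also called recall."""
    true_positives = len(retrieved & relevant)
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score(sensitivity, precision),
    }


# Reported sensitivity/precision for the single-term searches COVID-19 and Wuhan virus
print(round(100 * f_score(0.944, 0.953), 1))  # 94.8
print(round(100 * f_score(0.077, 0.781), 1))  # 14.0
```

The harmonic mean explains why Wuhan virus scores so poorly overall despite reasonable precision: F1 is dominated by the smaller of the two inputs.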
28
Asseo K, Fierro F, Slavutsky Y, Frasnelli J, Niv MY. Tracking COVID-19 using taste and smell loss Google searches is not a reliable strategy. Sci Rep 2020; 10:20527. [PMID: 33239650 PMCID: PMC7689487 DOI: 10.1038/s41598-020-77316-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/30/2020] [Accepted: 10/29/2020] [Indexed: 02/06/2023]
Abstract
Web search tools are widely used by the general public to obtain health-related information, and analysis of search data is often suggested for public health monitoring. We analyzed the popularity of searches related to smell loss and taste loss, recently listed as symptoms of COVID-19. Searches on sight loss and hearing loss, which are not considered COVID-19 symptoms, were used as controls. Google Trends results per region in Italy or per state in the US were compared to COVID-19 incidence in the corresponding geographical areas. The COVID-19 incidence did not correlate with searches for non-symptoms, but in some weeks had high correlation with taste and smell loss searches, which also correlated with each other. Correlation of the sensory symptom searches with new COVID-19 cases for each country as a whole was high at some time points, but decreased (Italy) or fluctuated dramatically over time (US). Smell loss searches correlated with the incidence of media reports in the US. Our results show that the popularity of symptom searches is not reliable for pandemic monitoring. Awareness of this limitation is important during the COVID-19 pandemic, which continues to spread and to exhibit new clinical manifestations, and for potential future health threats.
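The region-level comparison described in this abstract amounts to computing, for each week, a correlation coefficient across regions between search popularity and case incidence. The minimal Python sketch below assumes Pearson's r (the abstract does not state which coefficient was used) and a hypothetical data layout of dictionaries keyed by week and region; the toy numbers are invented for illustration and are not Google Trends data.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / (sd_x * sd_y)


def weekly_correlations(searches: dict, incidence: dict, min_regions: int = 3) -> dict:
    """searches[week][region] -> relative search popularity (hypothetical layout);
    incidence[week][region] -> new cases per capita. Returns {week: r}."""
    result = {}
    for week, by_region in searches.items():
        cases = incidence.get(week, {})
        regions = sorted(set(by_region) & set(cases))  # regions present in both
        if len(regions) >= min_regions:
            result[week] = pearson([by_region[r] for r in regions],
                                   [cases[r] for r in regions])
    return result


# Toy data: one week where searches track cases, one where they do not
searches = {"w1": {"A": 10, "B": 20, "C": 30}, "w2": {"A": 30, "B": 10, "C": 20}}
incidence = {"w1": {"A": 1, "B": 2, "C": 3}, "w2": {"A": 1, "B": 2, "C": 3}}
for week, r in weekly_correlations(searches, incidence).items():
    print(week, round(r, 2))
```

Here week w1 gives r = 1 and week w2 gives r = -0.5 even though the case numbers are identical, which is the kind of week-to-week instability the authors report for real search data.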
29
Cheng L, Zhou X, Wang F, Xiao L. A State-Level Analysis of Mortality and Google Searches for Pornography: Insight from Life History Theory. Arch Sex Behav 2020; 49:3005-3011. [PMID: 32601838 DOI: 10.1007/s10508-020-01765-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2018] [Revised: 06/04/2020] [Accepted: 06/06/2020] [Indexed: 06/11/2023]
Abstract
Due to the widespread popularity of pornography, some studies have explored which individual factors are associated with the frequency of pornography use. However, knowledge about the relationship between the socioecological environment and pornography consumption remains scant. Based on life history theory, the current research investigated the association between state-level mortality and search interest in pornography using Google Trends. We observed that, in the U.S., the higher the mortality or violent crime rate in a state, the stronger the search interest in pornography on Google. The results expand the literature regarding the relationship between the socioecological environment and individuals' online sexual behavior at the state level.
30
Birnbaum ML, Wen H, Van Meter A, Ernala SK, Rizvi AF, Arenare E, Estrin D, De Choudhury M, Kane JM. Identifying emerging mental illness utilizing search engine activity: A feasibility study. PLoS One 2020; 15:e0240820. [PMID: 33064759 PMCID: PMC7567375 DOI: 10.1371/journal.pone.0240820] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/07/2020] [Accepted: 10/04/2020] [Indexed: 11/18/2022]
Abstract
Mental illness often emerges during the formative years of adolescence and young adult development and interferes with the establishment of healthy educational, vocational, and social foundations. Despite the severity of symptoms and decline in functioning, the time between illness onset and receiving appropriate care can be lengthy. A method by which to objectively identify early signs of emerging psychiatric symptoms could improve early intervention strategies. We analyzed a total of 405,523 search queries from 105 individuals with schizophrenia spectrum disorders (SSD, N = 36), non-psychotic mood disorders (MD, N = 38) and healthy volunteers (HV, N = 31) utilizing one year's worth of data prior to the first psychiatric hospitalization. Across 52 weeks, we found significant differences in the timing (p<0.05) and frequency (p<0.001) of searches between individuals with SSD and MD compared to HV up to a year in advance of the first psychiatric hospitalization. We additionally identified significant linguistic differences in search content among the three groups including use of words related to sadness and perception, use of first and second person pronouns, and use of punctuation (all p<0.05). In the weeks before hospitalization, both participants with SSD and MD displayed significant shifts in search timing (p<0.05), and participants with SSD displayed significant shifts in search content (p<0.05). Our findings demonstrate promise for utilizing personal patterns of online search activity to inform clinical care.
31
Pawar AS, Nagpal S, Pawar N, Lerman LO, Eirin A. General Public's Information-Seeking Patterns of Topics Related to Obesity: Google Trends Analysis. JMIR Public Health Surveill 2020; 6:e20923. [PMID: 32633725 PMCID: PMC7448178 DOI: 10.2196/20923] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/01/2020] [Revised: 06/25/2020] [Accepted: 07/07/2020] [Indexed: 01/20/2023]
Abstract
BACKGROUND Obesity is a major public health challenge, and recent literature sheds light on the concept of "normalization" of obesity. OBJECTIVE We aimed to study the worldwide pattern of web-based information seeking by the public on obesity and its related terms and topics using Google Trends. METHODS We compared the relative frequency of obesity-related search terms and topics between 2004 and 2019 on Google Trends. The mean relative interest scores for these terms over the 4-year quartiles were compared. RESULTS The mean relative interest score of the search term "obesity" consistently decreased with time in all four quartiles (2004-2019), whereas the relative interest scores of the search topics "weight loss" and "abdominal obesity" increased. The topic "weight loss" was popular during the month of January, and its median relative interest score for January was higher than that for other months for the entire study period (P<.001). The relative interest score for the search term "obese" decreased over time, whereas the scores for the terms "body positivity" and "self-love" increased after 2013. CONCLUSIONS Despite a worldwide increase in the prevalence of obesity, its popularity as an internet search term diminished over time. The reasons for the monthly peaks should be explored and applied to awareness campaigns to make them more effective. These patterns suggest normalization of obesity in society and a rise of public curiosity about image-related obesity rather than its medical implications and harm.
32
Agten A, Van Houtven J, Askenazi M, Burzykowski T, Laukens K, Valkenborg D. Visualizing the agreement of peptide assignments between different search engines. J Mass Spectrom 2020; 55:e4471. [PMID: 31713933 DOI: 10.1002/jms.4471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/04/2019] [Revised: 10/23/2019] [Accepted: 10/28/2019] [Indexed: 06/10/2023]
Abstract
There is a trend in the analysis of shotgun proteomics data that aims to combine information from multiple search engines to increase the number of peptide annotations in an experiment. Typically, the degree of search engine complementarity and search engine agreement is visually illustrated by means of Venn diagrams that present the findings of a database search on the level of the nonredundant peptide annotations. We argue that this practice is not fit for purpose, since the diagrams do not take into account, and often conceal, the information on complementarity and agreement at the level of the spectrum identification. We promote a new type of visualization that provides insight into the peptide sequence agreement at the level of the peptide-spectrum match (PSM) as a measure of consensus between two search engines with nominal outcomes. We applied the visualizations and percentage sequence agreement to an in-house data set of our benchmark organism, Caenorhabditis elegans, and illustrated that when assessing the agreement between search engines, one should disentangle the notion of PSM confidence and PSM identity. The visualizations presented in this manuscript provide a more informative assessment of pairs of search engines and are made available as an R function in the Supporting Information.
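The contrast between peptide-level Venn diagrams and PSM-level agreement can be made concrete with a small sketch. It assumes each engine's output has been reduced to one top-ranked peptide per spectrum and compares sequences exactly (no I/L equivalence, no modifications); the function and data are hypothetical and this is not the R function distributed with the paper. In the toy data the peptide-level sets are identical, so a Venn diagram would show complete overlap, yet the engines agree on none of the spectra.

```python
def psm_agreement(engine_a: dict, engine_b: dict) -> dict:
    """Compare two search engines at the peptide-spectrum match (PSM) level.

    Each argument maps a spectrum ID to the peptide sequence of that engine's
    top-ranked PSM for the spectrum (unidentified spectra are simply absent).
    """
    ids_a, ids_b = set(engine_a), set(engine_b)
    shared = ids_a & ids_b
    agree = sum(1 for s in shared if engine_a[s] == engine_b[s])
    return {
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
        "same_sequence": agree,
        "different_sequence": len(shared) - agree,
        # Percentage sequence agreement over spectra identified by both engines
        "percent_agreement": 100.0 * agree / len(shared) if shared else float("nan"),
    }


# Hypothetical example: both engines report the same two peptides overall,
# but assign them to opposite spectra.
engine_a = {"scan1": "PEPTIDEK", "scan2": "ELVISLIVESK"}
engine_b = {"scan1": "ELVISLIVESK", "scan2": "PEPTIDEK"}

peptide_overlap = set(engine_a.values()) == set(engine_b.values())
print(peptide_overlap)                                          # True
print(psm_agreement(engine_a, engine_b)["percent_agreement"])  # 0.0
```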
33
Ferhatoglu MF, Kartal A, Filiz Aİ, Kebudi A. Comparison of New Era's Education Platforms, YouTube® and WebSurg®, in Sleeve Gastrectomy. Obes Surg 2020; 29:3472-3477. [PMID: 31172453 DOI: 10.1007/s11695-019-04008-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/18/2022]
Abstract
INTRODUCTION The Internet is a widely used resource for obtaining medical information. However, the quality of information on online platforms is still debated. Our goal in this quality-controlled WebSurg® and YouTube®-based study was to compare these two online video platforms in terms of the accuracy and quality of information in sleeve gastrectomy videos. METHODS The most viewed (popular) videos returned by the YouTube® search engine in response to the keyword "sleeve gastrectomy" were included in the study. The educational accuracy and quality of the videos were evaluated according to known scoring systems. A novel scoring system measured technical quality. The ten most viewed (popular) videos in WebSurg® in response to the keyword "sleeve gastrectomy" were compared with the ten YouTube® videos with the highest educational/technical scores. RESULTS Educational accuracy and quality scores of WebSurg® videos were significantly higher than those of the ten YouTube® videos with the highest technical scores (p < 0.05), and no significant difference was found in the assessment of the ten YouTube® videos with the highest technical ratings compared with WebSurg® videos (p = 0.481). CONCLUSIONS WebSurg® videos, which had passed through a review process and were mostly prepared by academicians, remained below the expected quality. The main limitation of WebSurg® and YouTube® is the lack of information on preoperative and postoperative processes.
34
Verheggen K, Raeder H, Berven FS, Martens L, Barsnes H, Vaudel M. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows. Mass Spectrom Rev 2020; 39:292-306. [PMID: 28902424 DOI: 10.1002/mas.21543] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 04/06/2016] [Accepted: 07/05/2017] [Indexed: 06/07/2023]
Abstract
Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.
35
Liang X, Xia Z, Jian L, Wang Y, Niu X, Link AJ. A cost-sensitive online learning method for peptide identification. BMC Genomics 2020; 21:324. [PMID: 32334531 PMCID: PMC7183122 DOI: 10.1186/s12864-020-6693-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2019] [Accepted: 03/24/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Post-database search is a key procedure in peptide identification with tandem mass spectrometry (MS/MS) strategies for refining peptide-spectrum matches (PSMs) generated by database search engines. Although many statistical and machine learning-based methods have been developed to improve the accuracy of peptide identification, challenges remain for large-scale datasets and datasets with an unbalanced distribution of PSMs. A more efficient learning strategy is required for improving the accuracy of peptide identification on challenging datasets. While complex learning models have greater classification power, they may cause overfitting and introduce computational complexity on large-scale datasets. Kernel methods map data from the sample space to high dimensional spaces where data relationships can be simplified for modeling. RESULTS To tackle the computational challenge of using the kernel-based learning model for practical peptide identification problems, we present an online learning algorithm, OLCS-Ranker, which iteratively feeds only one training sample into the learning model at each round; as a result, the memory required for computation is significantly reduced. Meanwhile, we propose a cost-sensitive learning model for OLCS-Ranker that assigns a larger loss to decoy PSMs than to target PSMs in the loss function. CONCLUSIONS The new model can reduce its false discovery rate on datasets with an unbalanced distribution of PSMs. Experimental studies show that OLCS-Ranker outperforms other methods in terms of accuracy and stability, especially on datasets with an unbalanced distribution of PSMs. Furthermore, OLCS-Ranker is 15-85 times faster than CRanker.
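The two ideas named in this abstract, one training sample per round and a larger loss for decoy than for target PSMs, can be illustrated with a much simpler model. The sketch below swaps in a linear classifier trained by online subgradient steps on a class-weighted hinge loss instead of the kernel-based OLCS-Ranker; it is not the authors' algorithm, and the cost values, learning rate, and two-feature toy PSM descriptors are arbitrary assumptions.

```python
def dot(w: list, x: list) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_online_cost_sensitive(stream, n_features: int, cost_target: float = 1.0,
                                cost_decoy: float = 2.0, lr: float = 0.1,
                                l2: float = 1e-3):
    """One pass of online learning over (features, label) pairs, one sample per round.

    label = +1 for a target PSM, -1 for a decoy PSM. A class-weighted hinge loss
    makes errors on decoys cost more than errors on targets (cost_decoy > cost_target).
    Only the current sample and the weights are held in memory.
    """
    w, b = [0.0] * n_features, 0.0
    for x, y in stream:
        cost = cost_target if y > 0 else cost_decoy
        margin = y * (dot(w, x) + b)
        w = [wi * (1.0 - lr * l2) for wi in w]  # L2 shrinkage each round
        if margin < 1.0:  # hinge loss is active: take a weighted subgradient step
            w = [wi + lr * cost * y * xi for wi, xi in zip(w, x)]
            b += lr * cost * y
    return w, b


def toy_stream():
    """Hypothetical two-feature PSM descriptors, yielded one at a time."""
    for _ in range(10):
        yield [2.0, 1.0], +1    # target-like PSM
        yield [-2.0, -1.0], -1  # decoy-like PSM


w, b = train_online_cost_sensitive(toy_stream(), n_features=2)
print(dot(w, [2.0, 1.0]) + b > 0, dot(w, [-2.0, -1.0]) + b < 0)  # True True
```

Because the stream is a generator, the training loop never holds more than one PSM at a time, which mirrors the memory argument made in the abstract.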
36
Chen T, Gentry S, Qiu D, Deng Y, Notley C, Cheng G, Song F. Online Information on Electronic Cigarettes: Comparative Study of Relevant Websites From Baidu and Google Search Engines. J Med Internet Res 2020; 22:e14725. [PMID: 32012069 PMCID: PMC7007591 DOI: 10.2196/14725] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/16/2019] [Revised: 10/16/2019] [Accepted: 12/19/2019] [Indexed: 11/13/2022]
Abstract
BACKGROUND Online information on electronic cigarettes (e-cigarettes) may influence people's perception and use of e-cigarettes. Websites with information on e-cigarettes in the Chinese language have not been systematically assessed. OBJECTIVE The aim of this study was to assess and compare the types and credibility of Web-based information on e-cigarettes identified from Google (in English) and Baidu (in Chinese) search engines. METHODS We used the keywords vaping or e-cigarettes to conduct a search on Google and the equivalent Chinese characters for Baidu. The first 50 unique and relevant websites from each of the two search engines were included in this analysis. The main characteristics of the websites, credibility of the websites, and claims made on the included websites were systematically assessed and compared. RESULTS Compared with websites on Google, more websites on Baidu were owned by manufacturers or retailers (15/50, 30% vs 33/50, 66%; P<.001). None of the Baidu websites, compared to 24% (12/50) of Google websites, were provided by public or health professional institutions. The Baidu websites were more likely to contain e-cigarette advertising (P<.001) and less likely to provide information on health education (P<.001). The overall credibility of the included Baidu websites was lower than that of the Google websites (P<.001). An age restriction warning was shown on all advertising websites from Google (15/15) but only on 10 of the 33 (30%) advertising websites from Baidu (P<.001). Conflicting or unclear health and social claims were common on the included websites. CONCLUSIONS Although conflicting or unclear claims on e-cigarettes were common on websites from both Baidu and Google search engines, there was a lack of online information from public health authorities in China. 
Unbiased information and evidence-based recommendations on e-cigarettes should be provided by public health authorities to help the public make informed decisions regarding the use of e-cigarettes.
37
Madej T, Marchler-Bauer A, Lanczycki C, Zhang D, Bryant SH. Biological Assembly Comparison with VAST. Methods Mol Biol 2020; 2112:175-186. [PMID: 32006286 DOI: 10.1007/978-1-0716-0270-6_13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
The VAST+ algorithm is an efficient, simple, and elegant solution to the problem of comparing the atomic structures of biological assemblies. Given two protein assemblies, it takes as input all the pairwise structural alignments of the component proteins. It then clusters the rotation matrices from the pairwise superpositions, with the clusters corresponding to subsets of the two assemblies that may be aligned and well superposed. It uses the Vector Alignment Search Tool (VAST) protein-protein comparison method for the input structural alignments, but other methods could be used as well. From a chosen cluster, an "original" alignment for the assembly may be defined by simply combining the relevant input alignments. However, it is often useful to reduce (trim) the original alignment using a Monte Carlo refinement algorithm, which allows biologically relevant conformational differences to be more readily detected and observed. The method is easily extended to include RNA or DNA molecules. VAST+ results may be accessed via the URL https://www.ncbi.nlm.nih.gov/Structure by entering a PDB accession or search terms in the search box and following the [VAST+] link in the upper right corner of the Structure Summary page.
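The clustering step can be illustrated by measuring how far apart two superposition rotations are. The minimal Python sketch below assumes 3×3 rotation matrices stored as nested lists, uses the angle of the relative rotation as the distance, and groups rotations with a simple greedy threshold rule; the abstract does not specify the clustering criterion used by VAST+, so the 10° threshold, the greedy rule, and the function names are hypothetical. Translations are ignored in this simplified version.

```python
from math import acos, cos, degrees, radians, sin


def rotation_distance_deg(r1, r2) -> float:
    """Angle (degrees) of the relative rotation R1^T R2 between two 3x3 rotations."""
    # trace(R1^T R2) equals the element-wise sum of R1 * R2
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return degrees(acos(cos_theta))


def cluster_rotations(rotations, max_angle_deg: float = 10.0):
    """Greedy clustering: each rotation joins the first cluster whose first member
    (its representative) lies within max_angle_deg; otherwise it starts a new
    cluster. Returns clusters as lists of indices into `rotations`."""
    clusters = []
    for idx, rot in enumerate(rotations):
        for members in clusters:
            if rotation_distance_deg(rotations[members[0]], rot) <= max_angle_deg:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def rot_z(angle_deg: float):
    """Rotation about the z axis, used here only to build test input."""
    t = radians(angle_deg)
    return [[cos(t), -sin(t), 0.0], [sin(t), cos(t), 0.0], [0.0, 0.0, 1.0]]


# Hypothetical rotations from five pairwise chain-chain superpositions
pairwise = [rot_z(0), rot_z(3), rot_z(90), rot_z(92), rot_z(-2)]
print(cluster_rotations(pairwise))  # [[0, 1, 4], [2, 3]]
```

Each resulting cluster collects chain pairs whose superpositions roughly agree, which is the property the abstract relies on when it says clusters correspond to subsets of the two assemblies that can be well superposed together.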
38
Gillum S, Williams N, Brink B, Ross E. Clinician Job Searches in the Internet Era: Internet-Based Study. J Med Internet Res 2019; 21:e12638. [PMID: 31278735 PMCID: PMC6640069 DOI: 10.2196/12638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2018] [Revised: 05/20/2019] [Accepted: 05/20/2019] [Indexed: 11/13/2022]
Abstract
Background Traditional methods using print media and commercial firms for clinician recruiting are often limited by cost, slow pace, and suboptimal results. An efficient and fiscally sound approach is needed for searching online to recruit clinicians. Objective The aim of the study was to assess the Web-based methods by which clinicians might be searching for jobs in a broad range of specialties and how academic medical centers can advertise clinical job openings to prominently appear on internet searches that would yield the greatest return on investment. Methods We used a search engine (Google) to identify 8 query terms for each of the specialties and specialists (eg, dermatology and dermatologist) to determine internet job search methodologies for 12 clinical disciplines. Searches were conducted, and the data used for analysis were the first 20 results. Results In total, 176 searches were conducted at varying times over the course of several months, and 3520 results were recorded. The following 4 types of websites appeared in the top 10 search results across all specialties searched, accounting for 52.27% (920/1760) of the results: (1) a single no-cost job aggregator (229/1760, 13.01%); (2) 2 prominent journal-based paid digital job listing services (157/1760, 8.92% and 91/1760, 5.17%, respectively); (3) a fee-based Web-based agency (137/1760, 7.78%) offering candidate profiles; and (4) society-based paid advertisements (totaling 306/1760, 17.38%). These sites accounted for 75.45% (664/880) of results limited to the top 5 results. Repetitive short-term testing yielded similar results with minor changes in the rank order. Conclusions On the basis of our findings, we offer a specific financially prudent internet strategy for both clinicians searching the internet for employment and employers hiring clinicians in academic medical centers.
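The percentages in this abstract follow from the counts once the denominators are made explicit: each of the 176 searches contributes 20 recorded results in total, 10 positions to the top-10 denominator, and 5 to the top-5 denominator. The short Python check below recomputes the reported shares; the dictionary keys are descriptive labels chosen here (the two journal-based services are unnamed in the abstract, so "A" and "B" are placeholders).

```python
searches = 176
all_results = searches * 20      # 3520 results recorded
top10_positions = searches * 10  # 1760
top5_positions = searches * 5    # 880

# Top-10 counts per website type, as reported in the abstract
top10_counts = {
    "no-cost job aggregator": 229,
    "journal-based listing A": 157,
    "journal-based listing B": 91,
    "fee-based Web agency": 137,
    "society-based advertisements": 306,
}
total_top10 = sum(top10_counts.values())
print(total_top10, f"{100 * total_top10 / top10_positions:.2f}%")  # 920 52.27%
print(f"{100 * 664 / top5_positions:.2f}%")                        # 75.45%
```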
39
Alzu'bi AA, Zhou L, Watzlaf VJM. Genetic Variations and Precision Medicine. Perspect Health Inf Manag 2019; 16:1a. [PMID: 31019429 PMCID: PMC6462879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The time and costs associated with the sequencing of a human genome have decreased significantly in recent years. Many people have chosen to have their genomes sequenced to receive genomics-based personalized healthcare services. To reach the goal of genomics-based precision medicine, health information management (HIM) professionals need to manage and analyze patients' genomic data. Two important pieces of information from the genome sequence are the risk of genetic diseases and the specific medication or pharmacogenomic results for the individual patient, both of which are linked to a patient's genetic variations. In this review article, we introduce genetic variations, including their data types, relevant databases, and some currently available analysis methods and systems. HIM professionals can choose to use these databases, methods, and systems in the management and analysis of patients' genomic data.
40
Helmers L, Horn F, Biegler F, Oppermann T, Müller KR. Automating the search for a patent's prior art with a full text similarity search. PLoS One 2019; 14:e0212103. [PMID: 30830911 PMCID: PMC6398827 DOI: 10.1371/journal.pone.0212103] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/09/2018] [Accepted: 01/28/2019] [Indexed: 11/18/2022]
Abstract
More than ever, technical inventions are the symbol of our society’s advance. Patents guarantee their creators protection against infringement. For an invention to be patentable, its novelty and inventiveness have to be assessed. Therefore, a search for published work that describes inventions similar to a given patent application needs to be performed. Currently, this so-called search for prior art is executed with semi-automatically composed keyword queries, which is not only time consuming but also prone to errors. In particular, errors may arise systematically from the fact that different keywords for the same technical concepts may exist across disciplines. In this paper, a novel approach is proposed, where the full text of a given patent application is compared to existing patents using machine learning and natural language processing techniques to automatically detect inventions that are similar to the one described in the submitted document. Various state-of-the-art approaches for feature extraction and document comparison are evaluated. In addition, the quality of the current search process is assessed based on ratings of a domain expert. The evaluation results show that our automated approach, besides accelerating the search process, also improves the search results for prior art with respect to their quality.
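Full-text comparison of this kind needs a document representation and a similarity measure. The sketch below uses TF-IDF vectors with cosine similarity, one classic baseline for feature extraction and document comparison; it is an assumption chosen for brevity and does not reproduce the best-performing approach evaluated in the paper. The tokenizer, smoothed IDF formula, and example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Sparse TF-IDF vectors (dicts term -> weight) with a smoothed IDF."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u: dict, v: dict) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_prior_art(application: str, corpus: list) -> list:
    """Return (corpus index, similarity) pairs, most similar first."""
    vectors = tfidf_vectors([application] + corpus)
    query, docs = vectors[0], vectors[1:]
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


application = "Lithium battery anode with a graphite coating"
corpus = [
    "Coated graphite anode for a rechargeable lithium battery",
    "Bicycle frame made of welded aluminium tubes",
    "Method for coating electrodes of battery cells",
]
for idx, score in rank_prior_art(application, corpus):
    print(idx, round(score, 3))
```

Note that exact-token matching treats "coated" and "coating" as unrelated words, a small instance of the vocabulary-mismatch problem the abstract describes for keyword queries.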
41
Ćurković M, Košec A. Bubble effect: including internet search engines in systematic reviews introduces selection bias and impedes scientific reproducibility. BMC Med Res Methodol 2018; 18:130. [PMID: 30424741 PMCID: PMC6234590 DOI: 10.1186/s12874-018-0599-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 03/03/2018] [Accepted: 10/30/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Using internet search engines (such as Google search) in systematic literature reviews is increasingly becoming a ubiquitous part of search methodology. In order to integrate the vast quantity of available knowledge, literature mostly focuses on systematic reviews, considered to be principal sources of scientific evidence at all practical levels. Any possible individual methodological flaws present in these systematic reviews have the potential to become systemic. MAIN TEXT This particular bias, which could be referred to as the (re)search bubble effect, is introduced because of the inherent, personalized nature of internet search engines, which tailor results according to derived user preferences based on unreproducible criteria. In other words, internet search engines adjust their user's beliefs and attitudes, leading to the creation of a personalized (re)search bubble, including entries that have not been subjected to a rigorous peer review process. The internet search engine algorithms are in a state of constant flux, producing differing results at any given moment, even if the query remains identical. There are many more subtle ways of introducing unwanted variations and synonyms of search queries that are used autonomously, detached from user insight and intent. Even the most well-known and respected systematic literature reviews do not seem immune to the negative implications of the search bubble effect, affecting reproducibility. CONCLUSION Although immensely useful and justified by the need for encompassing the entirety of knowledge, the practice of including internet search engines in systematic literature reviews is fundamentally irreconcilable with recent emphasis on scientific reproducibility and rigor, having a profound impact on the discussion of the limits of scientific epistemology. Scientific research that is not reproducible may still be called science, but it is science that should be avoided. 
Our recommendation is to use internet search engines as an additional literature source, primarily in order to validate initial search strategies centered on bibliographic databases.
42
Bikbov B, Perico N, Remuzzi G. A comparison of metrics and performance characteristics of different search strategies for article retrieval for a systematic review of the global epidemiology of kidney and urinary diseases. BMC Med Res Methodol 2018; 18:110. [PMID: 30340535 PMCID: PMC6194627 DOI: 10.1186/s12874-018-0569-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/14/2018] [Accepted: 10/04/2018] [Indexed: 11/16/2022]
Abstract
BACKGROUND Conducting a systematic review requires a comprehensive bibliographic search. Comparing different search strategies is essential for choosing those that cover all useful data sources. Our aim was to develop search strategies for article retrieval for a systematic review of the global epidemiology of kidney and urinary diseases, and to evaluate their metrics and performance characteristics that could be useful for other systematic epidemiologic reviews. METHODS We described the methodological framework and analysed the approaches applied in a previously conducted systematic review intended to obtain published data for global estimates of the kidney and urinary disease burden. We used several search strategies in PubMed and EMBASE and compared several metrics: number needed to retrieve (NNR), number of extracted data rows, number of covered countries, and, when appropriate, sensitivity, specificity, precision, and accuracy. RESULTS The initial search obtained 29,460 records from PubMed and 4247 from EMBASE. After review, the full texts of 381 and 14 articles, respectively, were obtained for data extraction (the percentage of useful records was 1.3% for PubMed and 0.3% for EMBASE). For PubMed, we developed two search strategies and compared them with a 'gold standard' formed by merging their results: the free word search strategy (FreeWoSS) searched for keywords in all fields, and the subject headings based search strategy (SuHeSS) used only MeSH-mapped conditions and country names. SuHeSS excluded almost 15% of useful articles and the data rows extracted from them, but had a lower NNR of 40 and higher specificity. FreeWoSS had better sensitivity and covered the vast majority of articles and extracted data rows, but had a higher NNR of 65. CONCLUSIONS The sensitive FreeWoSS strategy provides more data for modelling, while the more specific SuHeSS strategy could be used when resources are limited. EMBASE has limited value for our systematic review.
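The metrics compared above can all be derived from a confusion matrix of one strategy's retrieved records against the merged gold standard. The following is a minimal sketch under stated assumptions: records are identified by hypothetical IDs, the "universe" is assumed to be the set of all screened records, and NNR is taken as records retrieved per relevant record retrieved (i.e. 1/precision). It is an illustration, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # relevant records the strategy retrieved
    fp: int  # irrelevant records the strategy retrieved
    fn: int  # relevant records the strategy missed
    tn: int  # irrelevant records the strategy correctly left out


def counts_vs_gold(retrieved_ids, relevant_ids, universe_ids):
    """Compare one strategy's hits with the gold standard (hypothetical record IDs)."""
    retrieved, relevant, universe = set(retrieved_ids), set(relevant_ids), set(universe_ids)
    return Counts(
        tp=len(retrieved & relevant),
        fp=len(retrieved - relevant),
        fn=len(relevant - retrieved),
        tn=len(universe - retrieved - relevant),
    )


def metrics(c):
    """Sensitivity, specificity, precision, accuracy and number needed to retrieve."""
    retrieved = c.tp + c.fp
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / retrieved,
        "accuracy": (c.tp + c.tn) / total,
        "nnr": retrieved / c.tp,  # records screened per useful record
    }
```

As a check on the arithmetic with the reported counts, 381 useful records out of 29,460 PubMed records gives a precision of about 1.3%, which corresponds to an overall NNR of roughly 77 for the initial PubMed search.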
43
Bittremieux W, Meysman P, Noble WS, Laukens K. Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing. J Proteome Res 2018; 17:3463-3474. [PMID: 30184435 PMCID: PMC6173621 DOI: 10.1021/acs.jproteome.8b00359] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/21/2022]
Abstract
Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant using a very wide precursor mass window. A drawback of this strategy, however, is that it greatly increases search time. Although an open search can be performed with existing spectral library search engines simply by setting a wide precursor mass window, none of these tools has been optimized for OMS, which leads to excessive runtimes and suboptimal identification results. We present the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare with an unknown query spectrum. This approach is combined with a cascade search strategy, which maximizes the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, and with a shifted dot product score that sensitively matches modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves an order-of-magnitude speedup compared with SpectraST. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
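The idea behind a shifted dot product can be illustrated with a deliberately simplified score: a query peak may match a library peak either at the same m/z or at the library m/z plus the precursor mass difference (the modification shift). This sketch is not ANN-SoLo's implementation. It assumes peaks are (m/z, intensity) pairs with nonzero total intensity, singly charged fragments (so the fragment shift equals the precursor mass difference), a hypothetical 0.02 Da tolerance, and greedy one-to-one peak assignment; spectrum preprocessing, ANN candidate selection and cascade FDR control are all omitted.

```python
import math


def normalize(peaks):
    """Scale intensities to unit Euclidean norm so a perfect match scores 1."""
    norm = math.sqrt(sum(i * i for _, i in peaks))
    return [(mz, i / norm) for mz, i in peaks]


def shifted_dot(query, library, precursor_shift, tol=0.02):
    """Simplified shifted dot product (hypothetical sketch, not ANN-SoLo's code).

    A query peak matches a library peak at the same m/z or at the library m/z
    plus precursor_shift. Each peak is used at most once; matches are chosen
    greedily in order of decreasing intensity product.
    """
    q, lib = normalize(query), normalize(library)
    candidates = []
    for qi, (qmz, qint) in enumerate(q):
        for li, (lmz, lint) in enumerate(lib):
            direct = abs(qmz - lmz) <= tol
            shifted = abs(qmz - (lmz + precursor_shift)) <= tol
            if direct or shifted:
                candidates.append((qint * lint, qi, li))
    candidates.sort(reverse=True)
    used_q, used_lib, score = set(), set(), 0.0
    for product, qi, li in candidates:
        if qi not in used_q and li not in used_lib:
            used_q.add(qi)
            used_lib.add(li)
            score += product
    return score
```

For a library spectrum with peaks at m/z 100 and 200 and a query in which the second fragment carries a +16 Da shift, a plain dot product (shift 0) recovers only half of the similarity, while passing `precursor_shift=16.0` lets the shifted fragment match as well.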
44
Wagner M, Lampos V, Cox IJ, Pebody R. The added value of online user-generated content in traditional methods for influenza surveillance. Sci Rep 2018; 8:13963. [PMID: 30228285 PMCID: PMC6143510 DOI: 10.1038/s41598-018-32029-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/12/2018] [Accepted: 08/28/2018] [Indexed: 11/09/2022]
Abstract
There has been considerable work evaluating the efficacy of using online data for health surveillance. Comparisons with baseline data often involve various squared error and correlation metrics. While useful, these metrics overlook a variety of other factors important to public health bodies considering the adoption of such methods. In this paper, a proposed surveillance system that incorporates models based on recent research efforts is evaluated in terms of its added value for influenza surveillance at Public Health England. The system comprises two supervised learning approaches trained on influenza-like illness (ILI) rates provided by the Royal College of General Practitioners (RCGP) and produces ILI estimates using Twitter posts or Google search queries. RCGP ILI rates for different age groups and laboratory-confirmed cases by influenza type are used to evaluate the models, with a particular focus on predicting the onset, overall intensity, peak activity and duration of the 2015/16 influenza season. We show that the Twitter-based models perform poorly and hypothesise that this is mostly due to the sparsity of the available data and a limited training period. Conversely, the Google-based model provides accurate estimates with a timeliness of approximately one week and has the potential to complement current surveillance systems.
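The abstract contrasts the usual error and correlation metrics with season-level properties (onset, intensity, peak, duration). A minimal sketch of both kinds of evaluation for weekly ILI series follows. The onset threshold, the use of "weeks at or above threshold" as a proxy for duration, and all names are hypothetical choices for illustration; the paper's actual definitions may differ.

```python
import math


def rmse(estimates, reference):
    """Root-mean-square error between weekly estimates and reference ILI rates."""
    squared = sum((e - r) ** 2 for e, r in zip(estimates, reference))
    return math.sqrt(squared / len(reference))


def pearson(estimates, reference):
    """Pearson correlation between the two weekly series."""
    n = len(reference)
    me, mr = sum(estimates) / n, sum(reference) / n
    cov = sum((e - me) * (r - mr) for e, r in zip(estimates, reference))
    se = math.sqrt(sum((e - me) ** 2 for e in estimates))
    sr = math.sqrt(sum((r - mr) ** 2 for r in reference))
    return cov / (se * sr)


def season_summary(series, threshold):
    """Season-level features relative to a hypothetical onset threshold."""
    above = [week for week, value in enumerate(series) if value >= threshold]
    peak = max(range(len(series)), key=lambda week: series[week])
    return {
        "onset": above[0] if above else None,  # first week at/above threshold
        "peak_week": peak,
        "peak_value": series[peak],  # overall intensity at the peak
        "weeks_above": len(above),  # crude proxy for season duration
    }
```

Comparing `season_summary` for a model's estimates against the reference series shows whether the model gets the onset and peak week right even when its RMSE or correlation looks similar to another model's.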
45
Frinhani RDMD, de Carvalho MAM, Soma NY. A PageRank-based heuristic for the minimization of open stacks problem. PLoS One 2018; 13:e0203076. [PMID: 30161217 PMCID: PMC6117050 DOI: 10.1371/journal.pone.0203076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/26/2018] [Accepted: 08/14/2018] [Indexed: 11/25/2022]
Abstract
The minimization of open stacks problem (MOSP) aims to determine the ideal production sequence to optimize the occupation of physical space in manufacturing settings. Most current methods for solving the MOSP were not designed to work with large instances, which precludes their use in specific cases of similar modeling problems. We therefore propose a PageRank-based heuristic to solve large instances modeled as graphs. In computational experiments, data from the literature and new datasets up to 25 times larger in input size than current datasets, totaling 1330 instances, were analyzed to compare the proposed heuristic with state-of-the-art methods. The results showed that the proposed heuristic is competitive in terms of quality, as it found optimal solutions in several cases, and in terms of run time, which was shorter than that of the fastest available method. Furthermore, for specific graph densities, we found that the difference in solution value between methods was small, which justifies using the fastest method. The proposed heuristic is scalable and is more affected by graph density than by size.
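To make the problem concrete, the sketch below shows the standard MOSP objective (a piece's stack stays open from the first to the last sequenced pattern containing it), a power-iteration PageRank over a graph that links pieces sharing a pattern, and one way PageRank scores could drive a sequence. The abstract does not say how the authors turn PageRank scores into a production sequence, so `pagerank_sequence` is a hypothetical rule for illustration only, not the published heuristic. Patterns are assumed to be nonempty sets of piece indices.

```python
def pagerank(adj, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over adjacency lists (adj[u] = neighbours of u)."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for u in range(n):
            if adj[u]:
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in range(n):
                    new[v] += damping * rank[u] / n
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def piece_graph(patterns, n_pieces):
    """Link two pieces whenever they appear together in some pattern."""
    adj = [set() for _ in range(n_pieces)]
    for pattern in patterns:
        for a in pattern:
            for b in pattern:
                if a != b:
                    adj[a].add(b)
    return [sorted(s) for s in adj]


def max_open_stacks(sequence, patterns):
    """MOSP objective: peak number of stacks open at the same time."""
    first, last = {}, {}
    for t, p in enumerate(sequence):
        for piece in patterns[p]:
            first.setdefault(piece, t)
            last[piece] = t
    return max(
        sum(1 for piece in first if first[piece] <= t <= last[piece])
        for t in range(len(sequence))
    )


def pagerank_sequence(patterns, n_pieces):
    """Hypothetical rule: visit pieces by decreasing PageRank and append every
    not-yet-sequenced pattern that contains the current piece."""
    rank = pagerank(piece_graph(patterns, n_pieces))
    sequence, done = [], set()
    for piece in sorted(range(n_pieces), key=lambda j: rank[j], reverse=True):
        for p, pattern in enumerate(patterns):
            if p not in done and piece in pattern:
                sequence.append(p)
                done.add(p)
    return sequence
```

Evaluating `max_open_stacks` on the sequence returned by `pagerank_sequence` and on a few alternative orders is a quick way to see how sensitive the objective is to the ordering rule.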
46
Richardson ML, Amini B. Teaching Radiology Physics Interactively with Scientific Notebook Software. Acad Radiol 2018; 25:801-810. [PMID: 29751860 DOI: 10.1016/j.acra.2017.11.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/05/2017] [Revised: 11/01/2017] [Accepted: 11/04/2017] [Indexed: 11/30/2022]
Abstract
RATIONALE AND OBJECTIVES The goal of this study is to demonstrate how the teaching of radiology physics can be enhanced with the use of interactive scientific notebook software. METHODS We used the scientific notebook software known as Project Jupyter, which is free, open-source, and available for the Macintosh, Windows, and Linux operating systems. RESULTS We have created a scientific notebook that demonstrates multiple interactive teaching modules we have written for our residents using the Jupyter notebook system. CONCLUSIONS Scientific notebook software allows educators to create teaching modules in a form that combines text, graphics, images, data, interactive calculations, and image analysis within a single document. These notebooks can be used to build interactive teaching modules, which can help explain complex topics in imaging physics to residents.
47
Hernando L, Mendiburu A, Lozano JA. Anatomy of the Attraction Basins: Breaking with the Intuition. Evol Comput 2018; 27:435-466. [PMID: 29786459 DOI: 10.1162/evco_a_00227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Solving combinatorial optimization problems efficiently requires the development of algorithms that consider the specific properties of the problems. In this sense, local search algorithms are designed over a neighborhood structure that partially accounts for these properties. Given a neighborhood, the search space is usually interpreted as a natural landscape, with valleys and mountains. Under this perception, it is commonly believed that, when maximizing, the solutions located on the slopes of the same mountain belong to the same attraction basin, with the peaks of the mountains being the local optima. Unfortunately, this is a widespread but erroneous visualization of a combinatorial landscape. Our aim is therefore to clarify this aspect by providing a detailed analysis of, first, the existence of plateaus involving local optima and, second, the properties that define the topology of the attraction basins, yielding a reliable visualization of the landscapes. Some of the features explored in this article have never been examined before, so new findings about the structure of the attraction basins are presented. The study focuses on instances of permutation-based combinatorial optimization problems under the 2-exchange and insert neighborhoods. As a consequence of this work, we break away from the widespread belief about the anatomy of attraction basins.
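For small permutation sizes, attraction basins can be computed exhaustively: run a best-improvement hill climber from every permutation and group the starting points by the local optimum they reach. The sketch below does this for the 2-exchange (swap) neighborhood only. The objective functions are hypothetical toy examples, ties among equally good neighbors are broken by taking the first one generated, and the climber stops as soon as no neighbor is strictly better, so a local optimum sitting on a plateau ends the climb. This illustrates the concept; it is not the authors' experimental setup.

```python
from itertools import combinations, permutations


def swap_neighbors(perm):
    """2-exchange neighborhood: swap the elements at any two positions."""
    for i, j in combinations(range(len(perm)), 2):
        neighbor = list(perm)
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        yield tuple(neighbor)


def best_improvement(perm, f):
    """Climb (maximizing f) until no neighbor is strictly better."""
    current = perm
    while True:
        best = max(swap_neighbors(current), key=f)  # first maximum wins ties
        if f(best) <= f(current):
            return current
        current = best


def attraction_basins(n, f):
    """Map each local optimum to the permutations whose climb ends there."""
    basins = {}
    for perm in permutations(range(n)):
        basins.setdefault(best_improvement(perm, f), []).append(perm)
    return basins
```

With the linear toy objective `f(p) = sum((i + 1) * x for i, x in enumerate(p))`, every unsorted permutation has an improving swap, so all 24 permutations of size 4 fall into a single basin around `(0, 1, 2, 3)`; less regular objectives produce several basins whose shapes can be inspected directly.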
48
Keyhani S, Vali M, Cohen B, Woodbridge A, Arenson M, Eilkhani E, Aivadyan C, Hasin D. A search algorithm for identifying likely users and non-users of marijuana from the free text of the electronic medical record. PLoS One 2018; 13:e0193706. [PMID: 29509775 PMCID: PMC5839555 DOI: 10.1371/journal.pone.0193706] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/24/2017] [Accepted: 02/19/2018] [Indexed: 01/01/2023]
Abstract
BACKGROUND The harmful effects of marijuana on health, and in particular on cardiovascular health, are understudied. To develop such knowledge, an efficient method of assembling an informative cohort of marijuana users and non-users is needed. METHODS We used ICD-9 codes to identify patients with a diagnosis of coronary artery disease who were seen at the San Francisco VA in 2015. We imported these patients' medical record notes into an informatics platform that facilitated text searches. Using the text strings "marijuana", "mjx", and "cannabis", we categorized patients into those with evidence of marijuana use in the past 12 months and those with no such evidence. We randomly selected 51 users and 51 non-users based on this preliminary classification and sent a recruitment letter to the 97 of these patients who had contact information available. Patients were interviewed about marijuana use and domains related to cardiovascular health. Data on marijuana use collected from the medical record were compared with data collected in the interview. RESULTS The interview completion rate was 71%. Among the 35 patients identified by text strings as having used marijuana in the previous year, 15 had used marijuana in the past 30 days (positive predictive value = 42.9%). The probability of use in the past month increased from 8.8% in people who did not have these keywords in their medical record to 42.9% in those who did. CONCLUSION Methods that combine text search strategies for participant recruitment with health interviews provide an efficient approach to developing prospective cohorts that can be used to study the health effects of marijuana.
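The classification step described above amounts to a case-insensitive keyword search over each patient's notes, followed by a comparison with interview data. A minimal sketch follows, assuming notes are plain strings and that interview confirmation is available as a boolean per patient. The regular expression uses the three text strings named in the abstract, but the word-boundary matching and the function names are illustrative assumptions, not the study's informatics platform.

```python
import re

# The three text strings from the abstract, matched as whole words, any case.
KEYWORDS = re.compile(r"\b(?:marijuana|mjx|cannabis)\b", re.IGNORECASE)


def flag_patient(notes):
    """True if any of the patient's notes mentions one of the keywords."""
    return any(KEYWORDS.search(note) for note in notes)


def predictive_values(flags, confirmed):
    """PPV and NPV of keyword flags against interview-confirmed recent use."""
    pairs = list(zip(flags, confirmed))
    tp = sum(1 for f, c in pairs if f and c)
    fp = sum(1 for f, c in pairs if f and not c)
    tn = sum(1 for f, c in pairs if not f and not c)
    fn = sum(1 for f, c in pairs if not f and c)
    ppv = tp / (tp + fp) if tp + fp else None
    npv = tn / (tn + fn) if tn + fn else None
    return ppv, npv
```

With 15 confirmed recent users among 35 flagged patients, the PPV is 15/35, which rounds to the reported 42.9%. A plain string search like this also flags negated mentions such as "denies marijuana use", which is one reason keyword flags benefit from interview confirmation.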
49
Volpato EDSN, Betini M, Puga ME, Agarwal A, Cataneo AJM, de Oliveira LD, Bazan R, Braz LG, Pereira JEG, Dib RE. Strategies to optimize MEDLINE and EMBASE search strategies for anesthesiology systematic reviews. An experimental study. Sao Paulo Med J 2018; 136:103-108. [PMID: 29340504 PMCID: PMC9879554 DOI: 10.1590/1516-3180.2017.0277100917] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/05/2017] [Accepted: 09/10/2017] [Indexed: 01/29/2023]
Abstract
BACKGROUND A high-quality electronic search is essential for ensuring accuracy and comprehensiveness among the records retrieved when conducting systematic reviews. We therefore aimed to identify the most efficient method for searching both MEDLINE (through PubMed) and EMBASE, covering search terms with variant spellings, direct and indirect orders, and associations with MeSH and EMTREE terms (or lack thereof). DESIGN AND SETTING Experimental study. UNESP, Brazil. METHODS We selected and analyzed 37 search strategies that had been developed specifically for the field of anesthesiology. These search strategies were adapted to cover all potentially relevant search terms, with regard to variant spellings and direct and indirect orders, in the most efficient manner. RESULTS When the strategies included variant spellings and direct and indirect orders, the adapted versions retrieved the same number of search results in MEDLINE (mean of 61.3%) and a higher number in EMBASE (mean of 63.9%) in the sample analyzed. The numbers of results retrieved differed with and without the associated use of MeSH and EMTREE terms. However, combining terms from both controlled vocabularies retrieved more records than using either one of them alone. CONCLUSIONS In view of these results, we recommend that search terms include both preferred and non-preferred terms (i.e. variant spellings and direct/indirect order of the same term) together with the associated MeSH and EMTREE terms, in order to develop highly sensitive search strategies for systematic reviews.
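The recommendation above (combine a controlled-vocabulary term with variant spellings in both direct and inverted order) can be expressed as a small query builder. This sketch targets PubMed syntax using the `[MeSH Terms]` and `[tiab]` field tags. The inversion rule (only two-word terms, inverted as "second, first") and the example terms are simplifying assumptions; EMBASE/Emtree syntax differs and is not shown.

```python
def orders(term):
    """Direct and indirect (inverted) order of a two-word term."""
    words = term.split()
    if len(words) == 2:
        return [term, f"{words[1]}, {words[0]}"]
    return [term]


def pubmed_block(mesh_terms, free_terms):
    """OR together MeSH headings and every spelling/order of the free-text terms."""
    parts = [f'"{m}"[MeSH Terms]' for m in mesh_terms]
    seen = set()
    for term in free_terms:
        for form in orders(term):
            if form not in seen:  # avoid duplicate clauses
                seen.add(form)
                parts.append(f'"{form}"[tiab]')
    return "(" + " OR ".join(parts) + ")"
```

For example, `pubmed_block(["Anesthesia, General"], ["general anesthesia", "general anaesthesia"])` yields one parenthesized block containing the MeSH heading plus both spellings in both word orders, ready to be ANDed with blocks for other concepts.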
50
Chen J, Scholz U, Zhou R, Lange M. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions. PLoS Comput Biol 2018; 14:e1006058. [PMID: 29529024 PMCID: PMC5871001 DOI: 10.1371/journal.pcbi.1006058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/22/2017] [Revised: 03/27/2018] [Accepted: 02/23/2018] [Indexed: 11/19/2022]
Abstract
In order to access and filter the content of life-science databases, full text search is a widely applied query interface. However, its high flexibility and intuitiveness come at the cost of potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest the combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that fix misspellings and support the expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All of these require laborious curation and maintenance. Furthermore, access to query logs is generally restricted. Approaches that infer related queries from a user's profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of that profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases to be queried or, for general-purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the subsequent computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. The quality of the query suggestions was evaluated for plant science use cases. Local experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which was performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, with 34% scoring above 0.80.
In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained, general-purpose RESTful web service. The service uses open interfaces so that it can be seamlessly embedded into database front ends. The Java implementation uses highly optimized data structures and streamlined code to provide fast and scalable responses to web service calls. The source code of LAILAPS-QSM is available under the GNU General Public License version 2 in a Bitbucket Git repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
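Once distributed word vectors have been computed, suggesting alternative keywords reduces to a nearest-neighbor lookup by cosine similarity. The sketch below shows only that final step, for a single-keyword query. It assumes the vectors are already trained and supplied as a plain dictionary. The toy three-dimensional vectors and the function names are hypothetical, and this is written in Python for consistency with the other examples rather than reflecting the Java implementation or the LAILAPS-QSM API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def suggest(query, vectors, k=3):
    """Return the k words whose vectors are most similar to the query's vector."""
    if query not in vectors:
        return []
    q = vectors[query]
    scored = [(cosine(q, v), word) for word, v in vectors.items() if word != query]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Hypothetical toy vectors; real ones would come from training on database text.
TOY_VECTORS = {
    "drought": [1.0, 0.9, 0.0],
    "water_stress": [0.9, 1.0, 0.0],
    "tolerance": [0.8, 0.5, 0.2],
    "yield": [0.2, 0.1, 1.0],
}
```

Here `suggest("drought", TOY_VECTORS, k=2)` ranks `water_stress` and `tolerance` above `yield`, because their vectors point in nearly the same direction as the query's.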