1
|
Vasilakes J, Georgiadis P, Nguyen NT, Miwa M, Ananiadou S. Contextualized medication event extraction with levitated markers. J Biomed Inform 2023; 141:104347. [PMID: 37030658 DOI: 10.1016/j.jbi.2023.104347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/06/2023] [Accepted: 03/23/2023] [Indexed: 04/09/2023]
Abstract
Automatic extraction of patient medication histories from free-text clinical notes can increase the amount of relevant information available to clinicians for developing treatment plans. In addition to detecting medication events, clinical text mining systems must also be able to predict event context, such as negation, uncertainty, and time of occurrence, in order to construct accurate patient timelines. Towards this goal, we introduce Levitated Context Markers (LCMs), a novel transformer-based model for contextualized event extraction. LCMs are an adaptation of levitated markers (originally developed for relation extraction) that allow pretrained transformer models to utilize global input representations while also focusing on event-related subspans using a sparse attention mechanism. In addition to outperforming a strong baseline model on the Contextualized Medication Event Dataset, we show that LCMs' sparse attention can provide interpretable predictions by detecting relevant context cues in an unsupervised manner.
|
2
|
Liu G, Li T, Yang A, Zhang X, Qi S, Feng W. Knowledge domains and emerging trends of microglia research from 2002 to 2021: A bibliometric analysis and visualization study. Front Aging Neurosci 2023; 14:1057214. [PMID: 36688156 PMCID: PMC9849393 DOI: 10.3389/fnagi.2022.1057214] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/01/2022] [Accepted: 12/12/2022] [Indexed: 01/06/2023] Open
Abstract
Background Microglia were first identified a century ago. Since then, their ontogeny and functions have come to light thanks to the tireless efforts of scientists. However, the volume of literature now being produced makes it challenging for scholars, especially those new to the field, to understand it thoroughly. Therefore, having a reliable method for quickly grasping a field is crucial. Methods We searched and downloaded articles from the Web of Science Core Collection with "microglia" or "microglial" in the title from 2002 to 2021. In total, 12,813 articles were located and, using CiteSpace and VOSviewer, the fundamental data, knowledge domains, hot spots, and emerging trends, as well as the influential literature in the field of microglia research, were analyzed. Results After 2011, microglia publications grew significantly. The two most prominent journals are Glia and J Neuroinflamm. The United States and Germany dominated microglia research. The primary research institutions are Harvard Univ and Univ Freiburg, and the leading authors are Prinz Marco and Kettenmann Helmut. The knowledge domains of microglia include eight directions, namely neuroinflammation, lipopolysaccharide, aging, neuropathic pain, macrophages, Alzheimer's disease, retina, and apoptosis. Microglial phenotype is the focus of research; RNA-seq, exosome, and glycolysis are emerging topics, while a microglia-specific marker remains a hard problem.
We also identified 19 influential articles that contributed to the study of microglial origin (Mildner A 2007; Ginhoux F 2010), identity (Butovsky O 2014), homeostasis (Cardona AE 2006; Elmore MRP 2014); microglial function such as surveillance (Nimmerjahn A 2005), movement (Davalos D 2005; Haynes SE 2006), phagocytosis (Simard AR 2006), and synapse pruning (Wake H 2009; Paolicelli RC 2011; Schafer DP 2012; Parkhurst CN 2013); and microglial state/phenotype associated with disease (Keren-Shaul H 2017), as well as 5 review articles represented by Kettenmann H 2011. Conclusion Using bibliometrics, we have investigated the fundamental data, knowledge structure, and dynamic evolution of microglia research over the previous 20 years. We hope this study can provide some inspiration and a reference for researchers studying microglia in neuroscience.
Affiliation(s)
- Guangjie Liu
  - Department of Neurosurgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Tianhua Li
  - Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, Beijing, China
  - China International Neuroscience Institute (China-INI), Beijing, China
- Anming Yang
  - Department of Neurosurgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Xin Zhang
  - Department of Neurosurgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Songtao Qi (corresponding author)
  - Department of Neurosurgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Wenfeng Feng (corresponding author)
  - Department of Neurosurgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
|
3
|
Metrics and mechanisms: Measuring the unmeasurable in the science of science. J Informetr 2022. [DOI: 10.1016/j.joi.2022.101290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
4
|
Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements. JOURNAL OF DATA AND INFORMATION SCIENCE 2022. [DOI: 10.2478/jdis-2022-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
Purpose
Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.
Design/methodology/approach
Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject–object pairs (SO pairs).
Findings
The results indicated an extraordinary growth of cardiovascular publications in China but only modest growth in novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which implied pluralism in epistemic status. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.
Research limitations
The current methods didn’t distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.
Practical implications
Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. These areas are suggested to be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.
Originality/value
We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
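The entropy measure described in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this assumes IE means Shannon entropy over the distribution of epistemic-status labels attached to the sentences that mention one SO pair. The function name, the "certain" label, and the sample data are hypothetical.

```python
import math
from collections import Counter


def information_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) summed over the observed labels.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical epistemic-status labels for sentences mentioning one SO pair.
settled_pair = ["certain"] * 8
contested_pair = ["certain", "hedging", "conflicting", "unknown"] * 2

print(information_entropy(settled_pair))
print(information_entropy(contested_pair))
```

Under this assumption, a pair whose sentences all share one status scores zero, and a pair spread evenly over four statuses scores two bits, so higher values flag more contested knowledge.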
|
5
|
Small H. The confirmation of scientific theories using Bayesian causal networks and citation sentiments. QUANTITATIVE SCIENCE STUDIES 2022. [DOI: 10.1162/qss_a_00189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
The confirmation of scientific theories is approached by combining Bayesian probabilistic methods, in particular Bayesian causal networks, and the analysis of citing sentences for highly cited papers. It is assumed that causes and their effects can be identified by linguistic methods from the citing sentences and that the cause-and-effect pairs can be equated with theories and their evidence. Further, it is proposed that citation context sentiments for “evidence” and “uncertainty” can be used to supply the required conditional probabilities for Bayesian analysis where data is drawn from citing sentences for highly cited papers from various fields. Hence, the approach combines citation and linguistic methods in a probabilistic framework and, given the small sample of papers, should be considered a feasibility study. Special attention is given to the case of nociception in medicine, and analogies are drawn with various episodes from the history of science such as the Watson and Crick discovery of the structure of DNA and other discoveries where a striking and improbable fit between theory and evidence leads to a sense of confirmation.
Peer Review
https://publons.com/publon/10.1162/qss_a_00189
Affiliation(s)
- Henry Small
  - SciTech Strategies Inc., Bala Cynwyd, PA 19004 (USA)
|
6
|
Lamers WS, Boyack K, Larivière V, Sugimoto CR, van Eck NJ, Waltman L, Murray D. Investigating disagreement in the scientific literature. eLife 2021; 10:72737. [PMID: 34951588 PMCID: PMC8709576 DOI: 10.7554/elife.72737] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/03/2021] [Accepted: 11/26/2021] [Indexed: 11/13/2022] Open
Abstract
Disagreement is essential to scientific progress, but the extent of disagreement in science, its evolution over time, and the fields in which it happens remain poorly understood. Here we report the development of an approach based on cue phrases that can identify instances of disagreement in scientific articles. These instances are sentences in an article that cite other articles. Applying this approach to a collection of more than four million English-language articles published between 2000 and 2015, we determine the level of disagreement in five broad fields within the scientific literature (biomedical and health sciences; life and earth sciences; mathematics and computer science; physical sciences and engineering; and social sciences and humanities) and 817 meso-level fields. Overall, the level of disagreement is highest in the social sciences and humanities, and lowest in mathematics and computer science. However, there is considerable heterogeneity across the meso-level fields, revealing the importance of local disciplinary cultures and the epistemic characteristics of disagreement. Analysis at the level of individual articles reveals notable episodes of disagreement in science, and illustrates how methodological artifacts can confound analyses of scientific texts.
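A cue-phrase approach of the kind this abstract describes can be sketched roughly as follows. This is not the authors' implementation: the cue list, the citation-marker patterns, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical cue phrases, chosen only for illustration.
CUE = re.compile(
    r"\b(?:disagree\w*|contradict\w*|controvers\w*|debate[sd]?)\b",
    re.IGNORECASE,
)
# Assumed citation markers: numeric "[12]" or author-year "(Smith et al., 2010)".
CITATION = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]|\([A-Z][^()]*\d{4}\)")


def is_citance(sentence):
    """True if the sentence contains something that looks like a citation."""
    return CITATION.search(sentence) is not None


def is_disagreement_citance(sentence):
    """True if a citing sentence also contains a disagreement cue."""
    return is_citance(sentence) and CUE.search(sentence) is not None


def disagreement_share(sentences):
    """Share of citing sentences that contain a disagreement cue."""
    citances = [s for s in sentences if is_citance(s)]
    if not citances:
        return 0.0
    return sum(is_disagreement_citance(s) for s in citances) / len(citances)


sample = [
    "Our results contradict earlier findings [12].",
    "We used the protocol described previously (Smith et al., 2010).",
    "This question remains controversial.",
]
print(disagreement_share(sample))
```

Restricting the count to citing sentences mirrors the abstract's definition of an instance; a plain keyword match like this would also pick up cue words used in unrelated senses, which is one way methodological artifacts can arise.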
Affiliation(s)
- Wout S Lamers
  - Centre for Science and Technology Studies, Leiden University, Leiden, Netherlands
- Kevin Boyack
  - SciTech Strategies, Inc, Albuquerque, United States
- Vincent Larivière
  - École de bibliothéconomie et des sciences de l'information, Université de Montréal, Montreal, Canada
- Cassidy R Sugimoto
  - School of Public Policy, Georgia Institute of Technology, Atlanta, United States
- Nees Jan van Eck
  - Centre for Science and Technology Studies, Leiden University, Leiden, Netherlands
- Ludo Waltman
  - Centre for Science and Technology Studies, Leiden University, Leiden, Netherlands
- Dakota Murray
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, United States
|
7
|
Xie Y, Lang D, Lin S, Chen F, Sang X, Gu P, Wu R, Li Z, Zhu X, Ji L. Mapping Maternal Health in the New Media Environment: A Scientometric Analysis. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:13095. [PMID: 34948706 PMCID: PMC8700903 DOI: 10.3390/ijerph182413095] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/24/2021] [Revised: 11/30/2021] [Accepted: 12/06/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND The new media provides a convenient platform to access, use and exchange health information. Maternal health care, a special area of health care, is still of international concern because of the high maternal mortality rate. Scientific research is a good way to provide advice on how to improve maternal health through stringent reasoning and accurate data. However, the dramatic increase of publications, the diversity of themes, and the dispersion of researchers may reduce the quality of information and increase the difficulty of selection. Thus, this study aims to analyze the research progress on maternal health under the global new media environment, exploring the current research hotspots and frontiers. METHODS A scientometric analysis was carried out with CiteSpace 5.7.R1. In total, 2270 articles were analyzed to explore top countries and institutions, potential articles, research frontiers, and hotspots. RESULTS The number of publications rose markedly, from 29 in 2008 to 472 in 2020. But there is still a lot of room to grow, and the growth rate does not conform to Price's Law. Research centers were concentrated in North America, such as the University of Toronto and the University of California. The work of Larsson M, Lagan BM and Tiedje L had high potential influence. Most of the research subjects were mothers and newborn babies, and the research frontiers were distributed in health education and psychological problems. Maternal mental health, nutrition, weight, production technology, and equipment appeared to be hotspots. CONCLUSION The new media has almost brought a new era for maternal health, mainly characterized by psychological qualities, healthy and reasonable physical conditions and advanced technology.
Affiliation(s)
- Yinghua Xie
  - School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Research Center for Rural Health Service, Key Research Institute of Humanities and Social Sciences of Hubei Provincial Department of Education, Wuhan 430030, China
- Dong Lang
  - School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Research Center for Rural Health Service, Key Research Institute of Humanities and Social Sciences of Hubei Provincial Department of Education, Wuhan 430030, China
- Shuna Lin
  - School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Research Center for Rural Health Service, Key Research Institute of Humanities and Social Sciences of Hubei Provincial Department of Education, Wuhan 430030, China
- Fangfei Chen
  - School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Research Center for Rural Health Service, Key Research Institute of Humanities and Social Sciences of Hubei Provincial Department of Education, Wuhan 430030, China
- Xiaodong Sang
  - China Biotechnology Development Center, Beijing 100039, China
- Peng Gu
  - China Science and Technology Exchange Center, Beijing 100045, China
- Ruijun Wu
  - China Biotechnology Development Center, Beijing 100039, China
- Zhifei Li
  - China Biotechnology Development Center, Beijing 100039, China
- Xuan Zhu
  - School of Computer, Central China Normal University, Wuhan 430079, China
- Lu Ji
  - School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Research Center for Rural Health Service, Key Research Institute of Humanities and Social Sciences of Hubei Provincial Department of Education, Wuhan 430030, China
|
8
|
Jaiswal A, Tang L, Ghosh M, Rousseau JF, Peng Y, Ding Y. RadBERT-CL: Factually-Aware Contrastive Learning For Radiology Report Classification. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2021; 158:196-208. [PMID: 35498230 PMCID: PMC9055736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists, which include clinical facts and negated and/or uncertain statements. Extracting pathologic findings and diagnoses from radiology reports is important for quality control, population health, and monitoring of disease progress. Existing works primarily rely either on rule-based systems or on fine-tuning of transformer-based pre-trained models, but cannot take the factual and uncertain information into consideration, and therefore generate false positive outputs. In this work, we introduce three sedulous augmentation techniques which retain factual and critical information while generating augmentations for contrastive learning. We introduce RadBERT-CL, which fuses this information into BlueBert via a self-supervised contrastive loss. Our experiments on MIMIC-CXR show superior performance of RadBERT-CL on fine-tuning for multi-class, multi-label report classification. We illustrate that when few labeled data are available, RadBERT-CL outperforms conventional SOTA transformers (BERT/BlueBert) by significantly larger margins (6-11%). We also show that the representations learned by RadBERT-CL can capture critical medical information in the latent space.
Affiliation(s)
- Ajay Jaiswal
  - The University of Texas at Austin, United States
- Liyan Tang
  - The University of Texas at Austin, United States
- Ying Ding
  - The University of Texas at Austin, United States
|
9
|
Relationships between method-section citation rates and citation contexts: evidence from highly cited references in psychology. ONLINE INFORMATION REVIEW 2021. [DOI: 10.1108/oir-03-2021-0171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
The Method section of research articles offers an important space for researchers to describe their research processes and research objects they utilize. To understand the relationship between these research materials and their representations in scientific publications, this paper offers a quantitative examination of the citation contexts of the most frequently cited references in the Method section of the paper sample, many of which belong to the category of research material objects.
Design/methodology/approach
In this research, the authors assessed the extent to which these references appear in the Method section, which is regarded as an indicator of the instrumentality of the reference. The authors also examined how this central measurement is connected to its other citation contexts, such as key linguistic attributes and verbs that are used in citation sentences.
Findings
The authors found that a series of key linguistic attributes can be used to predict the instrumentality of a reference. The use of self-mention phrases and the readability score of the citances are especially strong predictors, along with boosters and hedges, the two measurements that were not included in the final model.
Research limitations/implications
This research focuses on a single research domain, psychology, which limits the understanding of how research material objects are cited in different research domains or interdisciplinary research contexts. Moreover, this research is based on 200 frequently cited references, which are unable to represent all references cited in psychological publications.
Practical implications
With the identified relationship between instrumental citation contexts and other characteristics of citation sentences, this research opens the possibility of more accurately identifying research material objects from scientific references, the most accessible scholarly data.
Originality/value
This is the first large-scale, quantitative analysis of the linguistic features of citations to research material objects. This study offers important baseline results for future studies focusing on scientific instruments, an increasingly important type of object involved in scientific research.
Peer review
The peer review history for this article is available at: 10.1108/OIR-03-2021-0171
|
10
|
Boguslav MR, Salem NM, White EK, Leach SM, Hunter LE. Identifying and classifying goals for scientific knowledge. BIOINFORMATICS ADVANCES 2021; 1:vbab012. [PMID: 34661112 PMCID: PMC8508177 DOI: 10.1093/bioadv/vbab012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 06/17/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION Science progresses by posing good questions, yet work in biomedical text mining has not focused on them much. We propose a novel idea for biomedical natural language processing: identifying and characterizing the questions stated in the biomedical literature. Formally, the task is to identify and characterize statements of ignorance, statements where scientific knowledge is missing or incomplete. The creation of such technology could have many significant impacts, from the training of PhD students to ranking publications and prioritizing funding based on particular questions of interest. The work presented here is intended as the first step towards these goals. RESULTS We present a novel ignorance taxonomy driven by the role statements of ignorance play in research, identifying specific goals for future scientific knowledge. Using this taxonomy and reliable annotation guidelines (inter-annotator agreement above 80%), we created a gold standard ignorance corpus of 60 full-text documents from the prenatal nutrition literature with over 10 000 annotations and used it to train classifiers that achieved over 0.80 F1 scores. AVAILABILITY AND IMPLEMENTATION Corpus and source code freely available for download at https://github.com/UCDenver-ccp/Ignorance-Question-Work. The source code is implemented in Python.
Affiliation(s)
- Mayla R Boguslav (corresponding author)
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nourah M Salem
  - Health Informatics Program, College of Health Solutions at Arizona State University, Phoenix, AZ 85004, USA
- Elizabeth K White
  - Center for Genes, Environment and Health, National Jewish Health, Denver, CO 80206, USA
- Sonia M Leach
  - Center for Genes, Environment and Health, National Jewish Health, Denver, CO 80206, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|
11
|
Wolfram D, Wang P, Abuzahra F. An exploration of referees’ comments published in open peer review journals: The characteristics of review language and the association between review scrutiny and citations. RESEARCH EVALUATION 2021. [DOI: 10.1093/reseval/rvab005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
Journals that adopt open peer review (OPR), where review reports of published articles are publicly available, provide an opportunity to study both review content characteristics and quantitative aspects of the overall review process. This study investigates two areas relevant to the quality assessment of manuscript reviews. First, do journal policies for reviewers to identify themselves influence how reviewers evaluate the merits of a manuscript based on the relative frequency of hedging terms and research-related terms appearing in their reviews? Second, is there an association between the number of reviews/reviewers and the manuscript’s research impact once published as measured by citations? We selected reviews for articles published in 17 OPR journals from 2017 to 2018 to examine the incidence of reviewers’ uses of hedging terms and research-related terms. The results suggest that there was little difference in the relative use of hedging terms regardless of whether reviewers were required to identify themselves or if this was optional, indicating that the use of hedging in review contents was not influenced by journal requirements for reviewers to identify themselves. There was a larger difference observed for research-related terminology. We compared the total number of reviews for a manuscript, rounds of revisions, and the number of reviewers with the number of Web of Science citations the article received since publication. The findings reveal that scrutiny by more reviewers, or conducting more reviews or rounds of review, does not result in more impactful papers for most of the journals studied. Implications for peer review practice are discussed.
Affiliation(s)
- Dietmar Wolfram
  - School of Information Studies, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201, USA
- Peiling Wang
  - iSchool, University of Tennessee-Knoxville, 1345 Circle Park Drive, 451 Communications Building, Knoxville, TN 37996, USA
- Fuad Abuzahra
  - College of Engineering and Applied Sciences, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201, USA
|
12
|
Towards medical knowmetrics: representing and computing medical knowledge using semantic predications as the knowledge unit and the uncertainty as the knowledge context. Scientometrics 2021; 126:6225-6251. [PMID: 33612884 PMCID: PMC7882417 DOI: 10.1007/s11192-021-03880-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/05/2020] [Accepted: 01/19/2021] [Indexed: 11/05/2022]
Abstract
In China, Prof. Hongzhou Zhao and Zeyuan Liu are the pioneers of the concepts “knowledge unit” and “knowmetrics” for measuring knowledge. However, the definition of “computable knowledge object” remains controversial across different fields. For example, it is defined as (1) quantitative scientific concept in natural science and engineering, (2) knowledge point in the field of education research, and (3) semantic predications, i.e., Subject-Predicate-Object (SPO) triples in biomedical fields. The Semantic MEDLINE Database (SemMedDB), a high-quality public repository of SPO triples extracted from medical literature, provides a basic data infrastructure for measuring medical knowledge. In general, the study of extracting SPO triples as computable knowledge units from unstructured scientific text has focused overwhelmingly on scientific knowledge per se. Since SPO triples may be extracted from hypothetical or speculative statements, or even from conflicting and contradictory assertions, the knowledge status (i.e., the uncertainty), which serves as an integral and critical part of scientific knowledge, has been largely overlooked. This article aims to put forward a framework for Medical Knowmetrics using the SPO triples as the knowledge unit and the uncertainty as the knowledge context. The lung cancer publications dataset is used to validate the proposed framework. The uncertainty of medical knowledge and how its status evolves over time indirectly reflect the strength of competing knowledge claims, and the probability of certainty for a given SPO triple. We discuss the new insights gained from using uncertainty-centric approaches to detect research fronts and to identify knowledge claims with a high certainty level, in order to improve the efficacy of knowledge-driven decision support.
|
13
|
Chen C. A Glimpse of the First Eight Months of the COVID-19 Literature on Microsoft Academic Graph: Themes, Citation Contexts, and Uncertainties. Front Res Metr Anal 2020; 5:607286. [PMID: 33870064 PMCID: PMC8025977 DOI: 10.3389/frma.2020.607286] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 09/16/2020] [Accepted: 11/25/2020] [Indexed: 12/15/2022] Open
Abstract
As scientists worldwide search for answers to the overwhelmingly unknown behind the deadly pandemic, the literature concerning COVID-19 has been growing exponentially. Keeping abreast of the body of literature at such a rapidly advancing pace poses significant challenges not only to active researchers but also to society as a whole. Although numerous data resources have been made openly available, the analytic and synthetic process that is essential in effectively navigating through the vast amount of information with heightened levels of uncertainty remains a significant bottleneck. We introduce a generic method that facilitates the data collection and sense-making process when dealing with a rapidly growing landscape of a research domain such as COVID-19 at multiple levels of granularity. The method integrates the analysis of structural and temporal patterns in scholarly publications with the delineation of thematic concentrations and the types of uncertainties that may offer additional insights into the complexity of the unknown. We demonstrate the application of the method in a study of the COVID-19 literature.
Affiliation(s)
- Chaomei Chen
  - College of Computing and Informatics, Drexel University, Philadelphia, PA, United States
|
14
|
Cheng Q, Wang J, Lu W, Huang Y, Bu Y. Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis. Scientometrics 2020. [DOI: 10.1007/s11192-020-03576-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/16/2022]
|
15
|
Hou J, Yang X, Chen C. Measuring researchers' potential scholarly impact with structural variations: Four types of researchers in information science (1979-2018). PLoS One 2020; 15:e0234347. [PMID: 32569295 PMCID: PMC7307741 DOI: 10.1371/journal.pone.0234347] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/28/2020] [Accepted: 05/22/2020] [Indexed: 11/25/2022] Open
Abstract
We propose a method to measure the potential scholarly impact of researchers based on network structural variations they introduced to the underlying author co-citation network of their field. We applied the method to the information science field based on 91,978 papers published between 1979 and 2018 from the Web of Science. We divided the entire period into eight consecutive intervals and measured structural variation change rates (ΔM) of individual authors in corresponding author co-citation networks. Four types of researchers are identified in terms of temporal dynamics of their potential scholarly impact—1) Increasing, 2) Decreasing, 3) Sustained, and 4) Transient. The study contributes to the understanding of how researchers’ scholarly impact might evolve in a broad context of the corresponding research community. Specifically, this study illustrated a crucial role played by structural variation metrics in measuring and explaining the potential scholarly impact of a researcher. This method based on the structural variation analysis offers a theoretical framework and a practical platform to analyze the potential scholarly impact of researchers and their specific contributions.
Affiliation(s)
- Jianhua Hou
  - School of Information Management, Sun Yat-sen University, Panyu District, Guangzhou, Guangdong, China
- Xiucai Yang
  - College of Economics and Management, Dalian University, Dalian Economic Technological Development Zone, Dalian, China
- Chaomei Chen
  - College of Computing and Informatics, Drexel University, Philadelphia, PA, United States of America
  - Department of Information Science, Yonsei University, Seoul, Republic of Korea
|
16
|
Omero P, Valotto M, Bellana R, Bongelli R, Riccioni I, Zuczkowski A, Tasso C. Writer’s uncertainty identification in scientific biomedical articles: a tool for automatic if-clause tagging. Lang Resour Eval 2020. [DOI: 10.1007/s10579-020-09491-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
In a previous study, we manually identified seven categories (verbs, non-verbs, modal verbs in the simple present, modal verbs in the conditional mood, if, uncertain questions, and epistemic future) of Uncertainty Markers (UMs) in a corpus of 80 articles from the British Medical Journal randomly sampled from a 167-year period (1840–2007). The UMs detected on the basis of an epistemic stance approach were those referring only to the authors of the articles and only in the present. We also performed preliminary experiments to assess the manually annotated corpus and to establish a baseline for automatic UM detection. The results showed that most UMs could be recognized with good accuracy, except for the if-category, which includes four subcategories: if-clauses in a narrow sense; if-less clauses; as if/as though; and if and whether introducing embedded questions. The unsatisfactory results for the if-category were probably due to both its complexity and the inadequacy of the detection rules, which were only lexical, not grammatical. In the current article, we describe a different approach, which combines grammatical and syntactic rules. The performed experiments show that the identification of uncertainty in the if-category has been largely double improved compared to our previous results. The complex overall process of uncertainty detection can profit greatly from a hybrid approach that combines supervised machine learning techniques with a knowledge-based approach, consisting of a rule-based inference engine devoted to the if-clause case and designed on the basis of the above-mentioned epistemic stance approach.
|
17
|
Song M, Kang KY, Timakum T, Zhang X. Examining influential factors for acknowledgements classification using supervised learning. PLoS One 2020; 15:e0228928. [PMID: 32059035 PMCID: PMC7021295 DOI: 10.1371/journal.pone.0228928] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/12/2019] [Accepted: 01/26/2020] [Indexed: 01/04/2023] Open
Abstract
Acknowledgements have been examined as important elements in measuring the contributions to and intellectual debts of a scientific publication. Unlike previous studies, which were limited in scope and relied on manual examination, the present study aimed to classify acknowledgements automatically on a large scale. To this end, we first created a training dataset for acknowledgements classification by sampling the acknowledgements sections from the entire PubMed Central database. Second, we applied various supervised learning algorithms to examine which algorithm performed best under which conditions. In addition, we examined the factors affecting classification performance, investigating three main aspects: classification algorithms, categories, and text representations. The CNN+Doc2Vec algorithm achieved the highest performance, with 93.58% accuracy on the original dataset and 87.93% on the converted dataset. The experimental results indicated that the characteristics of categories and sentence patterns influenced classification performance. Most of the classifiers performed better on the categories of financial, peer interactive communication, and technical support than on the other classes.
Affiliation(s)
- Min Song
  - Department of Library and Information Science, Yonsei University, Seoul, Korea
- Keun Young Kang
  - Department of Library and Information Science, Yonsei University, Seoul, Korea
- Tatsawan Timakum
  - Department of Library and Information Science, Yonsei University, Seoul, Korea
  - Department of Information Sciences, Chiang Mai Rajabhat University, Chiang Mai, Thailand
- Xinyuan Zhang
  - School of Information Management, Wuhan University, Hubei, China
|
18
|
Bornmann L, Wray KB, Haunschild R. Citation concept analysis (CCA): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books by Thomas S. Kuhn and Karl R. Popper. Scientometrics 2019. [DOI: 10.1007/s11192-019-03326-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 10/25/2022]
Abstract
In recent years, the full text of papers has become increasingly available electronically, which opens up the possibility of quantitatively investigating citation contexts in more detail. In this study, we introduce a new form of citation analysis, which we call citation concept analysis (CCA). CCA is intended to reveal the cognitive impact that certain concepts, published in a highly cited landmark publication, have on the citing authors. It counts the number of times the concepts are mentioned (cited) in the citation context of citing publications. We demonstrate the method using three classical highly cited books: (1) The structure of scientific revolutions by Thomas S. Kuhn, (2) The logic of scientific discovery (Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft in German), and (3) Conjectures and refutations: the growth of scientific knowledge by Karl R. Popper. It is not surprising, as our results show, that Kuhn’s “paradigm” concept seems to have had a significant impact. What is surprising is that our results indicate a much larger impact of the concept “paradigm” than of Kuhn’s other concepts, e.g., “scientific revolution”. The paradigm concept accounts for about 40% of the concept-related citations to Kuhn’s work, and its impact is resilient across all disciplines and over time. With respect to Popper, “falsification” is the most used concept derived from his books. Falsification is the cornerstone of Popper’s critical rationalism.
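The counting step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how citation contexts are delimited or how concept mentions are matched, so the function name `count_concepts`, the plain case-insensitive substring matching, and the sample sentences below are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def count_concepts(citation_contexts, concepts):
    """Count how often each concept string occurs across citation contexts.

    Assumption: a mention is any case-insensitive substring match, so
    "paradigms" also counts as a mention of "paradigm".
    """
    counts = Counter({concept: 0 for concept in concepts})
    for context in citation_contexts:
        text = context.lower()
        for concept in concepts:
            counts[concept] += text.count(concept.lower())
    return counts


# Hypothetical citation contexts, invented for this example.
contexts = [
    "Following Kuhn (1962), we treat this shift as a change of paradigm.",
    "A scientific revolution replaces the old paradigm with a new one.",
]
concepts = ["paradigm", "scientific revolution"]
result = count_concepts(contexts, concepts)
```

Dividing each count by the total over all concepts would give a share comparable in spirit to the concept-related percentages reported in the abstract, again under these matching assumptions.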
|
19
|
Daud A, Amjad T, Siddiqui MA, Aljohani NR, Abbasi RA, Aslam MA. Correlational analysis of topic specificity and citations count of publication venues. Library Hi Tech 2019. [DOI: 10.1108/lht-03-2018-0042] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/17/2022]
Abstract
Purpose
Citation analysis is an important measure for assessing the quality and impact of academic entities (authors, papers and publication venues) and is used for ranking research articles, authors and publication venues. It is a common observation that high-level publication venues, with few exceptions (Nature, Science and PLOS ONE), are usually topic specific. The purpose of this paper is to investigate this claim through a correlation analysis between topic specificity and citation count for different types of publication venues (journals, conferences and workshops).
Design/methodology/approach
The topic specificity was calculated using the information theoretic measure of entropy (which tells us about the disorder of the system). The authors computed the entropy of the titles of the papers published in each venue type to investigate their topic specificity.
Findings
It was observed that venues with higher citation counts (high-level publication venues) usually have low entropy, and venues with fewer citations (not-high-level publication venues) have high entropy. Low entropy means less disorder and therefore higher topic specificity, and vice versa. The input data were the DBLP-V7 data set for the last 10 years. Experimental analysis shows that the entropy and citation count of publication venues are negatively correlated; that is, more topic-specific venues tend to be more highly cited.
Originality/value
This paper is the first attempt to discover a correlation between topic specificity and citation counts of publication venues. It also uses topic specificity as a feature to rank academic entities.
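The entropy measure mentioned under Design/methodology/approach can be sketched as follows. The abstract does not specify tokenization or the unit over which probabilities are estimated, so this example assumes a simple whitespace-split, lowercased word distribution over all titles of a venue; the function name `title_entropy` and the sample titles are hypothetical.

```python
import math
from collections import Counter


def title_entropy(titles):
    """Shannon entropy (in bits) of the word distribution over paper titles.

    Assumption: words are lowercased and split on whitespace; the paper's
    actual preprocessing is not stated in the abstract.
    """
    words = [word for title in titles for word in title.lower().split()]
    total = len(words)
    if total == 0:
        return 0.0
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Invented titles: a topically focused venue reuses vocabulary,
# a broad venue does not.
focused_venue = ["graph neural networks", "graph neural networks for graphs"]
broad_venue = ["protein folding dynamics", "medieval trade routes"]

focused_entropy = title_entropy(focused_venue)
broad_entropy = title_entropy(broad_venue)
```

Under these assumptions the focused venue yields the lower entropy, which matches the abstract's reading that low entropy corresponds to higher topic specificity.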
|