1
|
Maleki A, Abbaspour J, Jowkar A, Sotudeh H. Role of citation and non-citation metrics in predicting the educational impact of textbooks. LIBRARY HI TECH 2023. [DOI: 10.1108/lht-06-2022-0297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Purpose
The main objective of the present study is to determine the role of citation-based metrics (PageRank and HITS’ authority and hub scores) and non-citation metrics (Goodreads readers, reviews and ratings, textbook edition counts) in predicting educational ranks of textbooks.
Design/methodology/approach
The rankings of 1869 academic textbooks of various disciplines indexed in Scopus were extracted from the Open Syllabus Project (OSP) and compared with normalized counts of Scopus citations, scores of PageRank, authority and hub (HITS) in Scopus book-to-book citation network, Goodreads ratings and reviews, review sentiment scores and WorldCat book editions.
Findings
Prediction of the educational rank of scholarly syllabus books ranged from 32% in technology to 68% in philosophy, psychology and religion. WorldCat editions in social sciences, medicine and technology, Goodreads ratings in humanities, and book-citation-network authority scores in law and political science accounted for the strongest predictions of the educational score. Thus, each indicator of editions, Goodreads ratings, and book citation authority score alone can be used to show the rank of the academic textbooks, and if used in combination, they will help explain the educational uptake of books even better.
Originality/value
This is the first study examining the role of citation indicators, Goodreads readers, reviews and ratings in predicting the OSP rank of academic books.
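The abstract above relies on HITS authority and hub scores computed on a book-to-book citation network. As a rough illustration only, the following minimal sketch runs the textbook HITS iteration on a tiny made-up graph; the function name `hits`, the edge list and the iteration count are assumptions for illustration, not the authors' data or code.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a directed citation graph.

    edges: iterable of (citing, cited) pairs. Hypothetical toy input.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    authority = {node: 1.0 for node in nodes}
    hub = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority of a book: sum of the hub scores of the books citing it.
        new_authority = {node: 0.0 for node in nodes}
        for citing, cited in edges:
            new_authority[cited] += hub[citing]
        # Hub score of a book: sum of the authority scores of the books it cites.
        new_hub = {node: 0.0 for node in nodes}
        for citing, cited in edges:
            new_hub[citing] += new_authority[cited]
        # Normalise both vectors to unit Euclidean length so scores stay bounded.
        a_norm = sqrt(sum(v * v for v in new_authority.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        authority = {node: v / a_norm for node, v in new_authority.items()}
        hub = {node: v / h_norm for node, v in new_hub.items()}
    return authority, hub


# Toy network: A and B both cite C, and B also cites D.
toy_edges = [("A", "C"), ("B", "C"), ("B", "D")]
authority, hub = hits(toy_edges)
```

In this toy network C, which is cited twice, ends up with the highest authority score, and B, which cites both cited books, ends up with the highest hub score.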
|
2
|
Lyu X, Costas R. Studying the cognitive relatedness between topics in the global science landscape: The case of Big Data research. J Inf Sci 2022. [DOI: 10.1177/01655515221121970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Taking Big Data research as a case study, this article intends to investigate the cognitive relatedness of research topics across the global science landscape to a focal topic. Several levels of cognitive relatedness are established depending on the citation distance between the citing publications and a core set of publications. The concept of citation generation is adopted for identifying and classifying other publications with different levels of relatedness to the core set. The micro publication-level classification system of the Centre for Science and Technology Studies (CWTS) is applied for determining clusters of publication sets at the topic level. The overall cognitive relatedness of micro clusters to Big Data core publications is measured based on the mean citation generation of all the publications in the corresponding clusters. In addition to the given clusters, this study also explores the ‘topics’ relatedness from a semantic point of view, by extracting high-frequency title terms of publications in each generation. Results show that data analysis methods and technologies are the topics with the strongest cognitive relatedness to Big Data research, while topics on physics and astronomy studies present the weakest relatedness. This approach allows assessment of relatedness between research topics by considering the distribution of citations across multiple citation generations, and can provide useful insights to study and characterise topics that have fuzzy boundaries or are difficult to delineate, thus representing a novel toolset relevant in the context of studying interdisciplinary research.
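The notion of citation generation described above can be read as a shortest citation distance from a core set. The sketch below is one possible reading, under the assumption that core publications are generation 0 and that a publication's generation is one more than the lowest generation among the publications it cites; the names `citation_generations`, `mean_generation_by_cluster` and the toy data are hypothetical and are not taken from the article.

```python
from collections import defaultdict, deque


def citation_generations(references, core):
    """Breadth-first search from the core set along 'is cited by' links.

    references: dict mapping a publication to the publications it cites.
    core: iterable of core publication ids (assumed to be generation 0).
    """
    cited_by = defaultdict(set)
    for citing, refs in references.items():
        for cited in refs:
            cited_by[cited].add(citing)
    generation = {pub: 0 for pub in core}
    queue = deque(generation)
    while queue:
        current = queue.popleft()
        for citing in cited_by[current]:
            if citing not in generation:
                generation[citing] = generation[current] + 1
                queue.append(citing)
    return generation


def mean_generation_by_cluster(generation, cluster_of):
    """Average citation generation of the publications in each cluster."""
    members = defaultdict(list)
    for pub, gen in generation.items():
        if pub in cluster_of:
            members[cluster_of[pub]].append(gen)
    return {cluster: sum(gens) / len(gens) for cluster, gens in members.items()}


# Toy data: p1 and p3 cite the core directly, p2 only cites p1.
toy_references = {"p1": ["core1"], "p2": ["p1"], "p3": ["p2", "core1"]}
toy_generation = citation_generations(toy_references, ["core1"])
toy_means = mean_generation_by_cluster(
    toy_generation, {"p1": "x", "p3": "x", "p2": "y"}
)
```

A lower mean generation would then indicate a cluster that sits closer to the core set in citation terms.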
Affiliation(s)
- Xiaozan Lyu
  - Department of Administrative Management, School of Law, Zhejiang University City College, China
- Rodrigo Costas
  - Centre for Science and Technology Studies (CWTS), Leiden University, The Netherlands; Centre for Research on Evaluation, Science and Technology (CREST), Stellenbosch University, South Africa
|
3
|
Wedell E, Park M, Korobskiy D, Warnow T, Chacko G. Center-Periphery Structure in Research Communities. QUANTITATIVE SCIENCE STUDIES 2022. [DOI: 10.1162/qss_a_00184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
Clustering and community detection in networks are of broad interest and have been the subject of extensive research that spans several fields. We are interested in the relatively narrow question of detecting communities of scientific publications that are linked by citations. These publication communities can be used to identify scientists with shared interests who form communities of researchers. Building on the well-known k-core algorithm, we have developed a modular pipeline to find publication communities with center-periphery structure. Using a quantitative and qualitative approach, we evaluate community finding results on a citation network consisting of over 14 million publications relevant to the field of extracellular vesicles. We compare our approach to communities discovered by the widely used Leiden algorithm for community finding.
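The pipeline described above builds on the k-core algorithm. The sketch below shows only the standard k-core decomposition by repeated removal of a minimum-degree node, on a hypothetical toy graph; it is not the authors' modular pipeline, and the function name `core_numbers` is an assumption.

```python
def core_numbers(adjacency):
    """Compute the core number of every node in an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
    (assumed symmetric, without self-loops).
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off a node of minimum remaining degree.
        node = min(remaining, key=degree.get)
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Toy graph: a triangle (a, b, c) with one pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
toy_cores = core_numbers(toy_graph)
```

Here the triangle forms the 2-core (a possible "center"), while the pendant node d only belongs to the 1-core (a possible "periphery").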
Affiliation(s)
- Eleanor Wedell
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801
- Minhyuk Park
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801
- George Chacko
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801
  - Office of Research, Grainger College of Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801
|
4
|
Eykens J, Guns R, Engels TCE. Fine-grained classification of social science journal articles using textual data: A comparison of supervised machine learning approaches. QUANTITATIVE SCIENCE STUDIES 2021. [DOI: 10.1162/qss_a_00106] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022] Open
Abstract
We compare two supervised machine learning algorithms—Multinomial Naïve Bayes and Gradient Boosting—to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting data set consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multilabel data set is used to train the machine learning algorithms in different configurations. We deploy a multilabel classifier chaining model, allowing for an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data. It can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.
Affiliation(s)
- Joshua Eykens
  - Centre for R&D Monitoring (ECOOM), Faculty of Social Sciences, University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium
- Raf Guns
  - Centre for R&D Monitoring (ECOOM), Faculty of Social Sciences, University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium
- Tim C. E. Engels
  - Centre for R&D Monitoring (ECOOM), Faculty of Social Sciences, University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium
|
5
|
Sjögårde P, Ahlgren P, Waltman L. Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches. J Assoc Inf Sci Technol 2021. [DOI: 10.1002/asi.24452] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Peter Sjögårde
  - Health Informatics Centre, Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden
  - University Library, Karolinska Institutet, Stockholm, Sweden
- Per Ahlgren
  - Department of Statistics, Uppsala University, Uppsala, Sweden
- Ludo Waltman
  - Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
|
6
|
Does faculty disciplinary background play a role in the publication pattern of an interdisciplinary research area? The case of science education in Brazil. Scientometrics 2020. [DOI: 10.1007/s11192-020-03593-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
|
7
|
Waltman L, Boyack KW, Colavizza G, van Eck NJ. A principled methodology for comparing relatedness measures for clustering publications. QUANTITATIVE SCIENCE STUDIES 2020. [DOI: 10.1162/qss_a_00035] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/04/2022] Open
Abstract
There are many different relatedness measures, based for instance on citation relations or textual similarity, that can be used to cluster scientific publications. We propose a principled methodology for evaluating the accuracy of clustering solutions obtained using these relatedness measures. We formally show that the proposed methodology has an important consistency property. The empirical analyses that we present are based on publications in the fields of cell biology, condensed matter physics, and economics. Using the BM25 text-based relatedness measure as the evaluation criterion, we find that bibliographic coupling relations yield more accurate clustering solutions than direct citation relations and cocitation relations. The so-called extended direct citation approach performs similarly to or slightly better than bibliographic coupling in terms of the accuracy of the resulting clustering solutions. The other way around, using a citation-based relatedness measure as evaluation criterion, BM25 turns out to yield more accurate clustering solutions than other text-based relatedness measures.
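Bibliographic coupling, one of the citation-based relatedness measures compared above, links two publications by the references they share. The following is a minimal sketch that uses the raw count of shared references as the coupling strength; any normalisation used in the article is not reproduced, and the function name and toy reference lists are hypothetical.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Count shared references for every pair of publications.

    references: dict mapping a publication to the ids it cites.
    Returns a dict keyed by sorted (pub_a, pub_b) pairs with a non-zero count.
    """
    reference_sets = {pub: set(refs) for pub, refs in references.items()}
    strength = {}
    for pub_a, pub_b in combinations(sorted(reference_sets), 2):
        shared = len(reference_sets[pub_a] & reference_sets[pub_b])
        if shared:
            strength[(pub_a, pub_b)] = shared
    return strength


# Toy data: p1 and p2 share two references, p3 shares none.
toy_reference_lists = {
    "p1": ["r1", "r2", "r3"],
    "p2": ["r2", "r3"],
    "p3": ["r4"],
}
toy_coupling = bibliographic_coupling(toy_reference_lists)
```

The pairwise loop is quadratic in the number of publications, which is fine for a toy example but would need an inverted index over references for networks of realistic size.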
Affiliation(s)
- Ludo Waltman
  - Centre for Science and Technology Studies, Leiden University, The Netherlands
- Nees Jan van Eck
  - Centre for Science and Technology Studies, Leiden University, The Netherlands
|
8
|
Ahlgren P, Chen Y, Colliander C, van Eck NJ. Enhancing direct citations: A comparison of relatedness measures for community detection in a large set of PubMed publications. QUANTITATIVE SCIENCE STUDIES 2020. [DOI: 10.1162/qss_a_00027] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/04/2022] Open
Abstract
The effects of enhancing direct citations, with respect to publication–publication relatedness measurement, by indirect citation relations (bibliographic coupling, cocitation, and extended direct citations) and text relations on clustering solution accuracy are analyzed. For comparison, we include each approach that is involved in the enhancement of direct citations. In total, we investigate the relative performance of seven approaches. To evaluate the approaches we use a methodology proposed by earlier research. However, the evaluation criterion used is based on MeSH, one of the most sophisticated publication-level classification schemes available. We also introduce an approach, based on interpolated accuracy values, by which overall relative clustering solution accuracy can be studied. The results show that the cocitation approach has the worst performance, and that the direct citations approach is outperformed by the other five investigated approaches. The extended direct citations approach has the best performance, followed by an approach in which direct citations are enhanced by the BM25 textual relatedness measure. An approach that combines direct citations with bibliographic coupling and cocitation performs slightly better than the bibliographic coupling approach, which in turn has a better performance than the BM25 approach.
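Cocitation, one of the indirect citation relations named above, relates two publications by how often they appear together in the same reference list. The sketch below counts raw cocitations on made-up data; how the article weights or combines this with direct citations is not specified in the abstract and is not attempted here, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation(references):
    """Count how often each pair of cited publications is cited together.

    references: dict mapping a citing publication to the ids it cites.
    Returns a dict keyed by sorted (cited_a, cited_b) pairs.
    """
    counts = Counter()
    for cited in references.values():
        # Deduplicate and sort so each unordered pair gets a single key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return dict(counts)


# Toy data: r1 and r2 are cited together by both citing publications.
toy_citing = {"p1": ["r1", "r2"], "p2": ["r1", "r2", "r3"]}
toy_cocitation = cocitation(toy_citing)
```

Unlike bibliographic coupling, these counts grow over time as new publications cite the pair, so the relation is only informative for publications that have already been cited.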
Affiliation(s)
- Per Ahlgren
  - Department of Statistics, Uppsala University, Uppsala (Sweden)
- Yunwei Chen
  - Scientometrics & Evaluation Research Center (SERC), Chengdu Library and Information Center of Chinese Academy of Sciences, Chengdu, 610041 (China)
- Cristian Colliander
  - Department of Sociology, Inforsk, Umeå University, Umeå (Sweden)
  - University Library, Umeå University, Umeå (Sweden)
- Nees Jan van Eck
  - Centre for Science and Technology Studies, Leiden University (The Netherlands)
|
9
|
Maciel RF, Bayerl PS, Kerr Pinheiro MM. Technical research innovations of the US national security system. Scientometrics 2019. [DOI: 10.1007/s11192-019-03148-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
10
|
Node2vec Representation for Clustering Journals and as A Possible Measure of Diversity. JOURNAL OF DATA AND INFORMATION SCIENCE 2019. [DOI: 10.2478/jdis-2019-0010] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/20/2022] Open
Abstract
Purpose
To investigate the effectiveness of using node2vec on journal citation networks to represent journals as vectors for tasks such as clustering, science mapping, and journal diversity measure.
Design/methodology/approach
Node2vec is used in a journal citation network to generate journal vector representations.
Findings
1. Journals are clustered based on the node2vec trained vectors to form a science map. 2. The norm of the vector can be seen as an indicator of the diversity of journals. 3. Using node2vec trained journal vectors to determine the Rao-Stirling diversity measure leads to a better measure of diversity than that of direct citation vectors.
Research limitations
All analyses use citation data and only focus on the journal level.
Practical implications
Node2vec trained journal vectors embed rich information about journals, can be used to form a science map and may generate better values of journal diversity measures.
Originality/value
The effectiveness of node2vec in scientometric analysis is tested. Possible indicators for journal diversity measure are presented.
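The Rao-Stirling diversity measure mentioned in the findings is commonly written as the sum of p_i * p_j * d_ij over all pairs of distinct categories, where p_i is the share of category i and d_ij a distance between categories. The sketch below assumes the sum runs over ordered pairs and that d_ij is the cosine distance between two vectors; the vectors here are made-up stand-ins for node2vec embeddings, and the function names are hypothetical.

```python
from math import sqrt


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def rao_stirling(proportions, vectors):
    """Sum p_i * p_j * d_ij over all ordered pairs of distinct categories.

    proportions: dict mapping a category to its share (shares sum to 1).
    vectors: dict mapping the same categories to their vectors.
    """
    total = 0.0
    for i in proportions:
        for j in proportions:
            if i != j:
                distance = cosine_distance(vectors[i], vectors[j])
                total += proportions[i] * proportions[j] * distance
    return total


# Toy case: two equally weighted categories with orthogonal vectors.
toy_shares = {"A": 0.5, "B": 0.5}
toy_vectors = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
toy_diversity = rao_stirling(toy_shares, toy_vectors)
```

Some formulations sum over unordered pairs instead, which halves the value; the ordering of journals by diversity is unaffected by that choice.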
|
11
|
|
12
|
A comparison of cognitive and organizational classification of publications in the social sciences and humanities. Scientometrics 2018. [DOI: 10.1007/s11192-018-2775-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
|
13
|
Sjögårde P, Ahlgren P. Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics. J Informetr 2018. [DOI: 10.1016/j.joi.2017.12.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 10/18/2022]
|
14
|
Parish AJ, Boyack KW, Ioannidis JPA. Dynamics of co-authorship and productivity across different fields of scientific research. PLoS One 2018; 13:e0189742. [PMID: 29320509 PMCID: PMC5761855 DOI: 10.1371/journal.pone.0189742] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 07/11/2017] [Accepted: 11/30/2017] [Indexed: 11/18/2022] Open
Abstract
We aimed to assess which factors correlate with collaborative behavior and whether such behavior associates with scientific impact (citations and becoming a principal investigator). We used the R index which is defined for each author as log(Np)/log(I1), where I1 is the number of co-authors who appear in at least I1 papers written by that author and Np are his/her total papers. Higher R means lower collaborative behavior, i.e. not working much with others, or not collaborating repeatedly with the same co-authors. Across 249,054 researchers who had published ≥30 papers in 2000–2015 but had not published anything before 2000, R varied across scientific fields. Lower values of R (more collaboration) were seen in physics, medicine, infectious disease and brain sciences and higher values of R were seen for social science, computer science and engineering. Among the 9,314 most productive researchers already reaching Np ≥ 30 and I1 ≥ 4 by the end of 2006, R mostly remained stable for most fields from 2006 to 2015 with small increases seen in physics, chemistry, and medicine. Both US-based authorship and male gender were associated with higher values of R (lower collaboration), although the effect was small. Lower values of R (more collaboration) were associated with higher citation impact (h-index), and the effect was stronger in certain fields (physics, medicine, engineering, health sciences) than in others (brain sciences, computer science, infectious disease, chemistry). Finally, for a subset of 400 U.S. researchers in medicine, infectious disease and brain sciences, higher R (lower collaboration) was associated with a higher chance of being a principal investigator by 2016. Our analysis maps the patterns and evolution of collaborative behavior across scientific disciplines.
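The R index defined above is log(Np)/log(I1), where I1 behaves like an h-index over co-authors: the largest number such that I1 co-authors each appear in at least I1 of the author's papers. The sketch below follows that wording on hypothetical data; the function name `r_index` and the choice to return `None` when I1 is below 2 (where log(I1) is zero or undefined) are assumptions, not part of the article.

```python
from collections import Counter
from math import log


def r_index(papers):
    """Compute R = log(Np) / log(I1) for one author.

    papers: list with one set of co-author names per paper
    (the focal author is excluded from each set).
    """
    n_papers = len(papers)
    # Number of the author's papers on which each co-author appears.
    shared = Counter(name for coauthors in papers for name in set(coauthors))
    ranked = sorted(shared.values(), reverse=True)
    i1 = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            i1 = rank
        else:
            break
    if i1 < 2:
        return None  # assumption: R is left undefined when log(I1) <= 0
    return log(n_papers) / log(i1)


# Toy author with four papers: A co-authors three, B two, C one.
toy_papers = [{"A", "B"}, {"A", "B"}, {"A", "C"}, set()]
toy_r = r_index(toy_papers)  # I1 = 2 and Np = 4, so R is approximately 2.0
```

In line with the abstract, an author who repeatedly works with many of the same co-authors has a larger I1 and therefore a lower R.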
Affiliation(s)
- Austin J. Parish
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America
- Kevin W. Boyack
  - SciTech Strategies, Inc., Albuquerque, New Mexico, United States of America
- John P. A. Ioannidis
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America
  - Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Stanford, California, United States of America
  - Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California, United States of America
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California
  - Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, United States of America
|
15
|
Trevisani M, Tuzzi A. Chronological corpora curve clustering: From scientific corpora construction to knowledge dynamics discovery through word life-cycles clustering. MethodsX 2018; 5:1576-1587. [PMID: 30568881 PMCID: PMC6287063 DOI: 10.1016/j.mex.2018.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2018] [Accepted: 11/10/2018] [Indexed: 11/29/2022] Open
Abstract
The aim of this procedural method is to construct well-founded corpora of scientific literature and, hence, to track the evolution of knowledge fields through the reconstruction and clustering of words’ life-cycles. The method comprises: an original selection process for relevant keywords, which identifies relevant stems and stem n-grams by matching them against the item lists of relevant glossaries; several types of normalization of the temporal trajectories of raw word frequencies; and a properly customized clustering of word life-cycles, with an extensive graphical investigation of the best candidates for the number of clusters, to unveil the important dynamics and decipher the history of a scientific field.
|
16
|
|
17
|
|
18
|
van Eck NJ, Waltman L. Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics 2017; 111:1053-1070. [PMID: 28490825 PMCID: PMC5400793 DOI: 10.1007/s11192-017-2300-7] [Citation(s) in RCA: 477] [Impact Index Per Article: 68.1] [Received: 06/06/2016] [Indexed: 12/18/2022]
Abstract
Clustering scientific publications is an important problem in bibliometric research. We demonstrate how two software tools, CitNetExplorer and VOSviewer, can be used to cluster publications and to analyze the resulting clustering solutions. CitNetExplorer is used to cluster a large set of publications in the field of astronomy and astrophysics. The publications are clustered based on direct citation relations. CitNetExplorer and VOSviewer are used together to analyze the resulting clustering solutions. Both tools use visualizations to support the analysis of the clustering solutions, with CitNetExplorer focusing on the analysis at the level of individual publications and VOSviewer focusing on the analysis at an aggregate level. The demonstration provided in this paper shows how a clustering of publications can be created and analyzed using freely available software tools. Using the approach presented in this paper, bibliometricians are able to carry out sophisticated cluster analyses without the need to have a deep knowledge of clustering techniques and without requiring advanced computer skills.
Affiliation(s)
- Nees Jan van Eck
  - Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
- Ludo Waltman
  - Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
|
19
|
|
20
|
|
21
|
Klavans R, Boyack KW. Which Type of Citation Analysis Generates the Most Accurate Taxonomy of Scientific and Technical Knowledge? J Assoc Inf Sci Technol 2016. [DOI: 10.1002/asi.23734] [Citation(s) in RCA: 155] [Impact Index Per Article: 19.4] [Indexed: 11/07/2022]
|
22
|
Sun X, Ding K, Lin Y. Mapping the evolution of scientific fields based on cross-field authors. J Informetr 2016. [DOI: 10.1016/j.joi.2016.04.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 10/21/2022]
|
23
|
Multiple Citation Indicators and Their Composite across Scientific Disciplines. PLoS Biol 2016; 14:e1002501. [PMID: 27367269 PMCID: PMC4930269 DOI: 10.1371/journal.pbio.1002501] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 02/23/2016] [Accepted: 06/02/2016] [Indexed: 11/19/2022] Open
Abstract
Many fields face an increasing prevalence of multi-authorship, and this poses challenges in assessing citation metrics. Here, we explore multiple citation indicators that address total impact (number of citations, Hirsch H index [H]), co-authorship adjustment (Schreiber Hm index [Hm]), and author order (total citations to papers as single; single or first; or single, first, or last author). We demonstrate the correlation patterns between these indicators across 84,116 scientists (those among the top 30,000 for impact in a single year [2013] in at least one of these indicators) and separately across 12 scientific fields. Correlation patterns vary across these 12 fields. In physics, total citations are highly negatively correlated with indicators of co-authorship adjustment and of author order, while in other sciences the negative correlation is seen only for total citation impact and citations to papers as single author. We propose a composite score that sums standardized values of these six log-transformed indicators. Of the 1,000 top-ranked scientists with the composite score, only 322 are in the top 1,000 based on total citations. Many Nobel laureates and other extremely influential scientists rank among the top-1,000 with the composite indicator, but would rank much lower based on total citations. Conversely, many of the top 1,000 authors on total citations have had no single/first/last-authored cited paper. More Nobel laureates of 2011–2015 are among the top authors when authors are ranked by the composite score than by total citations, H index, or Hm index; 40/47 of these laureates are among the top 30,000 by at least one of the six indicators. We also explore the sensitivity of indicators to self-citation and alphabetic ordering of authors in papers across different scientific fields. 
Multiple indicators and their composite may give a more comprehensive picture of impact, although no citation indicator, single or composite, can be expected to select all the best scientists. Citation indicators addressing total impact, co-authorship, and author positions offer complementary insights about impact. This article shows that a composite score including six citation indicators identifies extremely influential scientists better than single indicators. Multiple citation indicators are used in science and scientific evaluation. With an increasing proportion of papers co-authored by many researchers, it is important to account for the relative contributions of different co-authors. We explored multiple citation indicators that address total impact, co-authorship adjustment, and author order (in particular, single, first, or last position authorships, since these positions suggest pivotal contributions to the work). We evaluated the top 30,000 scientists in 2013 based on each of six citation indicators (84,116 total scientists assessed) and also developed a composite score that combines the six indicators. Different scientists populated the top ranks when different indicators were used. Many Nobel laureates and other influential scientists rank among the top 1,000 with the composite indicator, but rank much lower based on total citations. Conversely, many of the top 1,000 authors on total citations had no single/first/last-authored cited paper. More Nobel laureates are among the top authors when authors are ranked by the composite score than by single indicators. Multiple indicators and their composite give a more comprehensive picture of impact, although no method can pick all the best scientists.
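The composite score described above sums standardized values of six log-transformed indicators. The abstract does not say how the standardization is done, so the sketch below makes two loudly flagged assumptions: each indicator is transformed as log(1 + x), and it is standardized by dividing by the maximum transformed value across all scientists. The function name and the toy indicator values are hypothetical.

```python
from math import log


def composite_scores(indicators):
    """Sum standardized log-transformed indicators per scientist.

    indicators: dict mapping a scientist to a list of non-negative
    indicator values, in the same order for every scientist.
    Assumptions: log(1 + x) transform, division by the column maximum.
    """
    width = len(next(iter(indicators.values())))
    logged = {
        name: [log(1 + value) for value in values]
        for name, values in indicators.items()
    }
    # Column maxima; fall back to 1.0 when an indicator is zero for everyone.
    maxima = [
        max(values[k] for values in logged.values()) or 1.0 for k in range(width)
    ]
    return {
        name: sum(values[k] / maxima[k] for k in range(width))
        for name, values in logged.items()
    }


# Toy data: X and Y differ only on the fourth of six indicators.
toy_indicators = {
    "X": [100, 10, 5, 0, 20, 50],
    "Y": [100, 10, 5, 30, 20, 50],
}
toy_scores = composite_scores(toy_indicators)
```

Under these assumptions each indicator contributes at most 1, so the composite ranges from 0 to 6 and no single indicator with a large raw scale can dominate.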
|
24
|
Šubelj L, van Eck NJ, Waltman L. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods. PLoS One 2016; 11:e0154404. [PMID: 27124610 PMCID: PMC4849655 DOI: 10.1371/journal.pone.0154404] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 12/30/2015] [Accepted: 04/13/2016] [Indexed: 11/19/2022] Open
Abstract
Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.
Affiliation(s)
- Lovro Šubelj
  - University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
- Nees Jan van Eck
  - Leiden University, Centre for Science and Technology Studies, Leiden, Netherlands
- Ludo Waltman
  - Leiden University, Centre for Science and Technology Studies, Leiden, Netherlands
|