1
Shi D, Liu W, Wang Y. Has China's Young Thousand Talents program been successful in recruiting and nurturing top-caliber scientists? Science 2023; 379:62-65. [PMID: 36603081 DOI: 10.1126/science.abq1218] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
In this study, we examined China's Young Thousand Talents (YTT) program and evaluated its effectiveness in recruiting elite expatriate scientists and in nurturing the returnee scientists' productivity. We find that YTT scientists are generally of high caliber in research but, as a group, fall below the top category in pre-return productivity. We further find that YTT scientists are associated with a post-return publication gain across journal-quality tiers. However, this gain mainly takes place in last-authored publications and for high-caliber (albeit not top-caliber) recruits and can be explained by YTT scientists' access to greater funding and larger research teams. This paper has policy implications for the mobility of scientific talent, especially as early-career scientists face growing challenges in accessing research funding in the United States and European Union.
Affiliation(s)
- Dongbo Shi
  - School of International and Public Affairs, Shanghai Jiao Tong University, Shanghai, China
- Weichen Liu
  - School of Public Policy and Management, Tsinghua University, Beijing, China
- Yanbo Wang
  - Faculty of Business and Economics, The University of Hong Kong, Hong Kong
2
Online author name disambiguation in evolving digital library. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.07.104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
3
Waqas H, Qadir A. Completing features for author name disambiguation (AND): an empirical analysis. Scientometrics 2022. [DOI: 10.1007/s11192-021-04229-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
4
M. C. Cota JM, Laender AHF, Prates RO. Science Tree: a platform for exploring the Brazilian academic genealogy. Journal of the Brazilian Computer Society 2021. [DOI: 10.1186/s13173-021-00118-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Identifying and studying the formation of researchers over the years is a challenging task, since the current repositories of theses and dissertations are cataloged in a decentralized manner in different digital libraries, many of them with limited scope. In this article, we report our efforts towards building a large repository to record the Brazilian academic genealogy. For this, we collected data from the Lattes platform, an internationally recognized initiative that provides a repository of researchers’ curricula maintained by the Brazilian National Council for Scientific and Technological Development (CNPq), and developed a user-oriented platform, named Science Tree, to generate the academic genealogy trees of Brazilian researchers from them, also providing additional data resulting from a series of analyses regarding the main properties of such trees. In order to assess the facilities provided by the Science Tree platform, we conducted an experimental evaluation of it with two groups of users, the first one consisting of 286 researchers who answered an evaluation questionnaire and the second one involving seven researchers with large academic experience who agreed to participate in a face-to-face assessment conducted through a personal interview, during which they performed some pre-defined tasks. The results of these two evaluations with typical users enabled us not only to validate the main features offered by the platform, but also to identify new ones that could be added to it in the future. Overall, our effort has allowed us to identify interesting aspects related to the academic career of the Brazilian researchers, thus highlighting the importance of generating and cataloging their academic genealogy trees.
5
6
Rehs A. A supervised machine learning approach to author disambiguation in the Web of Science. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
7
Kim J, Kim J, Owen‐Smith J. Ethnicity-based name partitioning for author name disambiguation using supervised machine learning. J Assoc Inf Sci Technol 2021; 72:979-994. [PMID: 34414251 PMCID: PMC8359369 DOI: 10.1002/asi.24459] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/24/2020] [Revised: 01/04/2021] [Accepted: 01/22/2021] [Indexed: 11/07/2022]
Abstract
In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing the performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity-based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity-specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation.
Affiliation(s)
- Jinseok Kim
  - Institute for Research on Innovation & Science, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA
- Jenna Kim
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
- Jason Owen‐Smith
  - Department of Sociology, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA
8
Waqas H, Qadir MA. Multilayer heuristics based clustering framework (MHCF) for author name disambiguation. Scientometrics 2021. [DOI: 10.1007/s11192-021-04087-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]
9
Tekles A, Bornmann L. Author name disambiguation of bibliometric data: A comparison of several unsupervised approaches. Quantitative Science Studies 2020. [DOI: 10.1162/qss_a_00081] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/04/2022]
Abstract
Adequately disambiguating author names in bibliometric databases is a precondition for conducting reliable analyses at the author level. In the case of bibliometric studies that include many researchers, it is not possible to disambiguate each single researcher manually. Several approaches have been proposed for author name disambiguation, but there has not yet been a comparison of them under controlled conditions. In this study, we compare a set of unsupervised disambiguation approaches. Unsupervised approaches specify a model to assess the similarity of author mentions a priori instead of training a model with labeled data. To evaluate the approaches, we applied them to a set of author mentions annotated with a ResearcherID, this being an author identifier maintained by the researchers themselves. Apart from comparing the overall performance, we take a more detailed look at the role of the parametrization of the approaches and analyze the dependence of the results on the complexity of the disambiguation task. Furthermore, we examine which effects the differences in the set of metadata considered by the different approaches have on the disambiguation results. In the context of this study, the approach proposed by Caron and van Eck (2014) produced the best results.
Affiliation(s)
- Alexander Tekles
  - Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany
  - Ludwig-Maximilians-Universität Munich, Department of Sociology, Konradstr. 6, 80801 Munich, Germany
- Lutz Bornmann
  - Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany
10
11
Affiliation(s)
- Jinseok Kim
  - Institute for Research on Innovation & Science, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI
- Jenna Kim
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL
12
Kim J. A fast and integrative algorithm for clustering performance evaluation in author name disambiguation. Scientometrics 2019. [DOI: 10.1007/s11192-019-03143-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 10/26/2022]
13
KM P, Mondal S, Chandra J. A Graph Combination With Edge Pruning‐Based Approach for Author Name Disambiguation. J Assoc Inf Sci Technol 2019. [DOI: 10.1002/asi.24212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/11/2022]
Affiliation(s)
- Pooja KM
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India 801103
- Samrat Mondal
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India 801103
- Joydeep Chandra
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India 801103
14
Generating automatically labeled data for author name disambiguation: an iterative clustering method. Scientometrics 2018. [DOI: 10.1007/s11192-018-2968-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
15
Kim J, Kim J. The impact of imbalanced training data on machine learning for author name disambiguation. Scientometrics 2018. [DOI: 10.1007/s11192-018-2865-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 10/28/2022]
16
Abstract
Author name ambiguity degrades information retrieval, database integration, search results and, more importantly, correct attributions in bibliographic databases. Some unresolved issues include how to ascertain the actual number of authors, how to improve the performance and how to make the method more effective in terms of representative clustering metrics (average cluster purity, average author purity, K-metric, pairwise precision, pairwise recall, pairwise-F1, cluster precision, cluster recall and cluster-F1). It is a non-trivial task to disambiguate authors using only the implicit bibliographic information. An effective method, ‘DISC’, is proposed that uses a graph community detection algorithm, feature vectors and graph operations to disambiguate homonyms. The citation data set is pre-processed and ambiguous author blocks are formed. A co-author graph is constructed using authors and their co-author relationships. A graph structural clustering algorithm, ‘gSkeletonClu’, is applied to identify hubs, outliers and clusters of nodes in the co-author graph. Homonyms are resolved by splitting these clusters of nodes across the hub if their feature vector similarity is less than a predefined threshold. DISC utilises only co-authors and titles, which are available in almost all bibliographic databases. With minor modifications, DISC can also be used for entity disambiguation. To validate the DISC performance, experiments are performed on two Arnetminer data sets and compared with five previous unsupervised methods. Despite using limited bibliographic metadata, DISC achieves on average K-metric, pairwise-F1, and cluster-F1 of 92%, 84% and 74%, respectively, using Arnetminer-S and 86%, 80% and 57%, respectively, using Arnetminer-L. About 77.5% and 73.2% of clusters are within the range (ground truth clusters ± 3) in Arnetminer-S and Arnetminer-L, respectively.
Affiliation(s)
- Ijaz Hussain
  - Department of Computer Science, COMSATS Institute of Information Technology, Pakistan
- Sohail Asghar
  - Department of Computer Science, COMSATS Institute of Information Technology, Pakistan
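Several abstracts in this list report pairwise precision, pairwise recall and pairwise-F1 for disambiguation results. As a minimal sketch of how these metrics are commonly defined (not the implementation used in any of the listed papers), the Python below counts record pairs that share a cluster. The function names `same_cluster_pairs` and `pairwise_scores` and the toy labels are assumptions made for illustration only.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose records share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_scores(predicted, truth):
    """Compute pairwise precision, recall and F1 for one block of author mentions.

    predicted: predicted cluster label per record (hypothetical input format)
    truth:     ground-truth author label per record
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)

    # Guard against empty pair sets (e.g. every record in its own cluster).
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: five records, two true authors, one record misassigned.
truth = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]
print(pairwise_scores(predicted, truth))
```

This version enumerates all record pairs, which is quadratic in block size; it is meant to make the definitions concrete rather than to scale to large name blocks.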
17
Abstract
The content and quality of service of digital libraries are badly affected by the author name ambiguity problem in citations, which is considered one of the hardest problems faced by digital library researchers. Several techniques have been proposed in the literature for the author name ambiguity problem. In this paper, we review some recently presented author name disambiguation techniques and present some challenges and future research directions. We analyze the recent advancements in this field and classify these techniques into supervised, unsupervised, semi-supervised, graph-based and heuristic-based techniques according to the problem formulation that is mainly used for author name disambiguation. A few surveys have been conducted to review different techniques for author name disambiguation. These surveys highlighted only the methodology adopted for author name disambiguation but did not critically review their shortcomings. This survey provides a detailed review of author name disambiguation techniques available in the literature, makes a comparison of these techniques at an abstract level and discusses their limitations.
18
Carrasco RC, Serrano A, Castillo-Buergo R. A parser for authority control of author names in bibliographic records. Inf Process Manag 2016. [DOI: 10.1016/j.ipm.2016.02.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
19
Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses. Scientometrics 2016. [DOI: 10.1007/s11192-016-1892-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 10/22/2022]
20
On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method. International Journal on Digital Libraries 2015. [DOI: 10.1007/s00799-015-0158-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 10/23/2022]