1
Xu SB, Hu G. Rethinking the author name ambiguity problem and beyond: The case of the Chinese context. Account Res 2024:1-24. [PMID: 38704656 DOI: 10.1080/08989621.2024.2349115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/25/2024] [Indexed: 05/06/2024]
Abstract
The perennial problem of author name ambiguity has attracted increasing attention in the academic community. Drawing on the literature, this article first highlights the pervasiveness of the problem and discusses its adverse consequences. It then analyzes the behavioral causes of the problem in the Chinese context and attributes them to personal, cultural, and institutional factors. Informed by this analysis and recognizing ORCID as a promising solution, we propose an ORCID-based "Prevention plus Cure" campaign against author name ambiguity. The prevention objective relies on researchers' consistent use of ORCID, while the cure objective involves retrospectively integrating ORCIDs into backfile publications. We also outline the responsibilities of various stakeholders to ensure the success of the campaign. Furthermore, we argue that universal adoption of ORCID can help curb authorship-related misconduct, discern predatory journals and publishers, and track researchers' undesirable records of academic publishing. We then analyze the current status of ORCID adoption in China, identify potential challenges, propose tentative solutions to address them, and highlight ORCID as a tool that can be utilized to empower China's combat against research misconduct. In conclusion, we emphasize the importance of conducting empirical research to inform more effective promotion of ORCID adoption in China.
Affiliation(s)
- Shaoxiong Brian Xu
- School of Foreign Studies, Huanggang Normal University, Huanggang, Hubei, China
- Department of English and Communication, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
- Guangwei Hu
- Department of English and Communication, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
2
Spinellis D. Open reproducible scientometric research with Alexandria3k. PLoS One 2023; 18:e0294946. [PMID: 38032908 PMCID: PMC10688655 DOI: 10.1371/journal.pone.0294946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 11/11/2023] [Indexed: 12/02/2023] Open
Abstract
Considerable scientific work involves locating, analyzing, systematizing, and synthesizing other publications, often with the help of online scientific publication databases and search engines. However, use of online sources suffers from a lack of repeatability and transparency, as well as from technical restrictions. Alexandria3k is a Python software package and an associated command-line tool that can populate embedded relational databases with slices from the complete set of several open publication metadata sets. These can then be employed for reproducible processing and analysis through versatile and performant queries. We demonstrate the software's utility by visualizing the evolution of publications in diverse scientific fields and relationships among them, by outlining scientometric facts associated with COVID-19 research, and by replicating commonly-used bibliometric measures and findings regarding scientific productivity, impact, and disruption.
Affiliation(s)
- Diomidis Spinellis
- Department of Management Science and Technology, Athens University of Economics and Business, Athens, Greece
- Department of Software Technology, Delft University of Technology, Delft, The Netherlands
3
Heusse M, Cabanac G. ORCID growth and field‐wise dynamics of adoption: A case study of the Toulouse scientific area. Learned Publishing 2022. [DOI: 10.1002/leap.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
4
Exploring the relevance of ORCID as a source of study of data sharing activities at the individual-level: a methodological discussion. Scientometrics 2021. [DOI: 10.1007/s11192-021-04043-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
5
Boudry C. Availability of ORCIDs in publications archived in PubMed, MEDLINE, and Web of Science Core Collection. Scientometrics 2021. [DOI: 10.1007/s11192-020-03825-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
6
Kim J, Owen-Smith J. ORCID-linked labeled data for evaluating author name disambiguation at scale. Scientometrics 2021. [DOI: 10.1007/s11192-020-03826-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
How can we evaluate the performance of a disambiguation method implemented on big bibliographic data? This study suggests that the open researcher profile system, ORCID, can be used as an authority source to label name instances at scale. This study demonstrates the potential by evaluating the disambiguation performances of Author-ity2009 (which algorithmically disambiguates author names in MEDLINE) using 3 million name instances that are automatically labeled through linkage to 5 million ORCID researcher profiles. Results show that although ORCID-linked labeled data do not effectively represent the population of name instances in Author-ity2009, they do effectively capture the ‘high precision over high recall’ performances of Author-ity2009. In addition, ORCID-linked labeled data can provide nuanced details about the Author-ity2009’s performance when name instances are evaluated within and across ethnicity categories. As ORCID continues to be expanded to include more researchers, labeled data via ORCID-linkage can be improved in representing the population of a whole disambiguated dataset and updated on a regular basis. This can benefit author name disambiguation researchers and practitioners who need large-scale labeled data but lack resources for manual labeling or access to other authority sources for linkage-based labeling. The ORCID-linked labeled data for Author-ity2009 are publicly available for validation and reuse.
7
Boudry C, Durand-Barthez M. Use of author identifier services (ORCID, ResearcherID) and academic social networks (Academia.edu, ResearchGate) by the researchers of the University of Caen Normandy (France): A case study. PLoS One 2020; 15:e0238583. [PMID: 32877458 PMCID: PMC7467223 DOI: 10.1371/journal.pone.0238583] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/10/2020] [Accepted: 08/19/2020] [Indexed: 11/19/2022] Open
Abstract
The purpose of this paper was to assess the presence of researchers on two author identifier services (ORCID and ResearcherID) and to compare the results with two academic social networks (Academia.edu and ResearchGate) using the categories of discipline, career advancement, and gender in a medium sized multidisciplinary university in France (University of Caen Normandy). Metrics such as number of publications per researcher, h-indexes, and average number of citations were also assessed. Of the 1,047 researchers studied, 673 (64.3%) had at least one profile on the four sites, and the number of researchers having multiple profiles decreased as more sites were studied. Researchers with only one profile numbered 385 (36.8%), while 204 (19.5%) had two, 68 (6.5%) had three, and only 16 (1.5%) had four. ResearchGate had by far the highest number of researchers present, with 569 (54.3%), whereas presence on the other sites was about 15%. We found that, apart from Academia.edu, researchers in Sciences, Technology, and Medicine (STM) were over-represented. Overall, experienced male researchers were over-represented on the sites studied. Our results show that, because of the numerous profiles lacking publication references (particularly on ORCID) and a low presence of researchers on the four sites studied (except for ResearchGate), assessing the number of publications, h-indexes, or average number of citations per article of individuals or institutions remains challenging. Finally, our data showed that French researchers have not adopted the use of the two author identifier sites (i.e. ORCID and ResearcherID). As long as French researchers remain reticent, these sites will not be able to provide the services for which they were created: addressing the problem of author misidentification, consequently providing exhaustive access to scientific production and bibliometric indicators of individual researchers and their institutions.
Affiliation(s)
- Christophe Boudry
- Normandie Univ, UNICAEN, Média Normandie, Caen, France
- Unité régionale de formation à l’information scientifique et technique (URFIST), Ecole Nationale des Chartes, PSL Research University, Paris, France
- Manuel Durand-Barthez
- Unité régionale de formation à l’information scientifique et technique (URFIST), Ecole Nationale des Chartes, PSL Research University, Paris, France
- Laboratoire “Dispositifs d’Information et de Communication à l’Ère Numérique” (DICEN), EA7339, Conservatoire National des Arts et Métiers, Paris, France
8
Gomez CJ, Herman AC, Parigi P. Moving more, but closer: Mapping the growing regionalization of global scientific mobility using ORCID. J Informetr 2020. [DOI: 10.1016/j.joi.2020.101044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
9
Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation. Scientometrics 2020. [DOI: 10.1007/s11192-020-03410-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 10/24/2022]
10
Karaulova M, Gök A, Shapira P. Identifying author heritage using surname data: An application for Russian surnames. J Assoc Inf Sci Technol 2019; 70:488-498. [PMID: 31763359 PMCID: PMC6853192 DOI: 10.1002/asi.24104] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/21/2017] [Revised: 05/02/2018] [Accepted: 06/20/2018] [Indexed: 11/07/2022]
Abstract
This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary‐based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname‐based identification method and applied it to infer Russian heritage from suffix‐based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first‐name data to eliminate false‐positive results. The method achieved 98% precision and 94% recall rates—superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long‐standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.
Affiliation(s)
- Maria Karaulova
- Manchester Institute of Innovation Research, Alliance Manchester Business School, University of Manchester, Manchester, M13 9PL, UK
- Abdullah Gök
- Hunter Centre for Entrepreneurship, Strathclyde Business School, University of Strathclyde, 199 Cathedral Street, Glasgow, G4 0QU, UK
- Philip Shapira
- Manchester Institute of Innovation Research, Alliance Manchester Business School, University of Manchester, Manchester, M13 9PL, UK
- School of Public Policy, Georgia Institute of Technology, Atlanta, GA 30332-0345, USA
11
Powell J, Hoover C, Gordon A, Mittrach M. Bridging identity challenges: why and how one library plugged ORCiD into their enterprise. Library Hi Tech 2019. [DOI: 10.1108/lht-04-2018-0046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
Purpose
The purpose of this paper is to describe the implementation and impact of a locally customized Open Researcher and Contributor ID (ORCiD) profile wizard. It also provides a broader context for adopting ORCiD as an identity and single sign-on solution.
Design/methodology/approach
A custom web application was designed by a library team and implemented using a combination of the OAuth protocol and the ORCiD web services API. The tool leveraged a rich, curated set of local publication data, and exposed integration hooks that allowed other enterprise systems to connect ORCiD IDs with an internal employee identifier.
Findings
Initially the tool saw only modest use. Ultimately its success depended upon integration with other enterprise systems and the requirement of an ORCiD ID for internal funding requests, rather than exclusively on the merits of the tool. Since introduction, it has been used to generate over 1,660 ORCiDs from a population of 4,000 actively publishing researchers.
Practical implications
Organizations that desire to track publications by many affiliated authors would likely benefit from some sort of integration with ORCiD web services. This is particularly true for organizations that have many publishing researchers and/or track publications spanning many decades. Enterprise integration is crucial to the success of such a project.
Originality/value
Research inputs and research products are now primarily digital objects. Having a reliable system for associating researchers with their output is therefore a big challenge that, if solved, could increase researcher impact and enhance digital scholarship. ORCiD IDs are a potential glue for many aspects of this problem. The design and implementation of the wizard eased and quickened adoption of ORCiD IDs by local researchers, due in part to the ease with which a researcher can push publication information already held by the library to their profile. Subsequent integration of researcher ORCiD IDs with local enterprise systems has enabled real-time propagation of ORCiD IDs across research proposal workflow, publication review, and content discovery systems.
12
Marín-Arraiza P. ORCID in the Open Science scenario: opportunities for academic libraries. Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare 2019. [DOI: 10.31263/voebm.v72i2.2811] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022] Open
Abstract
The persistent identification of authors and contributors plays a decisive role within the Open Science landscape. The increasing number of published research products and new open publishing models and infrastructures requires author identification which goes beyond fields or infrastructures and guarantees interoperability. ORCID iD is presented as a persistent identifier for researchers in this context. As information managers and organisers, many academic libraries have taken the lead in offering ORCID-related services and implementing it in their systems. This paper scans the implementation models across Europe and the actions carried out by libraries. Finally, it also depicts perspectives for integration in the Austrian library and research context.
13
Abstract
Purpose
The purpose of this paper is to report on the development and analysis of an internal bibliometric services workshop for subject librarians. Primary goals of the workshop were to create an opportunity for collegial knowledge and skill sharing, and to identify discipline specific gaps and future support requirements.
Design/methodology/approach
Two campus librarians who typically offer bibliometric support services used pre- and post-surveys to plan and assess the workshop for subject liaison librarians.
Findings
Subject librarians from across the university expressed interest in developing bibliometric support services. The 12 workshop participants (30 percent of subject librarians) support diverse areas including the humanities, social sciences, life sciences, education and outreach, and the school of business. Post-workshop survey respondents highlighted the contextualization of available measures and the appropriate application of metrics in different disciplines to be the most helpful topics covered. Finally, while the institution subscribes to several citation analysis databases, more familiarity with Google Scholar citations was requested to address user needs and preferences across the various disciplines. Most participants expressed interest in attending additional workshops.
Originality/value
This study showcases the experience of campus librarians working together across academic schools and disciplines to respond to the increasing demand for bibliometric and scholarly impact support services. While services such as citation analysis have typically been siloed in specific job descriptions or subject areas within the library, these are service areas that can benefit from internal library-collaboration opportunities and knowledge sharing.
14
Abstract
Purpose
The ability to identify the scholarship of individual authors is essential for performance evaluation. A number of factors hinder this endeavor. Common and similarly spelled surnames make it difficult to isolate the scholarship of individual authors indexed on large databases. Variations in name spelling of individual scholars further complicate matters. Common family names in scientific powerhouses like China make it problematic to distinguish between authors possessing ubiquitous and/or anglicized surnames (as well as the same or similar first names). The assignment of unique author identifiers provides a major step toward resolving these difficulties. We maintain, however, that in and of themselves, author identifiers are not sufficient to fully address the author uncertainty problem. In this study we build on the author identifier approach by considering commonalities in fielded data between authors sharing the same surname and first initial of their first name. We illustrate our approach using three case studies.
Design/methodology/approach
The approach we advance in this study is based on commonalities among fielded data in search results. We cast a broad initial net—i.e., a Web of Science (WOS) search for a given author’s last name, followed by a comma, followed by the first initial of his or her first name (e.g., a search for ‘John Doe’ would assume the form: ‘Doe, J’). Results for this search typically contain all of the scholarship legitimately belonging to this author in the given database (i.e., all of his or her true positives), along with a large amount of noise, or scholarship not belonging to this author (i.e., a large number of false positives). From this corpus we proceed to iteratively weed out false positives and retain true positives. Author identifiers provide a good starting point—e.g., if ‘Doe, J’ and ‘Doe, John’ share the same author identifier, this would be sufficient for us to conclude these are one and the same individual. We find email addresses similarly adequate—e.g., if two author names which share the same surname and same first initial have an email address in common, we conclude these authors are the same person. Author identifier and email address data is not always available, however. When this occurs, other fields are used to address the author uncertainty problem.
Commonalities among author data other than unique identifiers and email addresses are less conclusive for name consolidation purposes. For example, if ‘Doe, John’ and ‘Doe, J’ have an affiliation in common, do we conclude that these names belong to the same person? They may or may not; institutions have employed two or more faculty members sharing the same last name and first initial. Similarly, it’s conceivable that two individuals with the same last name and first initial publish in the same journal, publish with the same co-authors, and/or cite the same references. Should we then ignore commonalities among these fields and conclude they’re too imprecise for name consolidation purposes? It is our position that such commonalities are indeed valuable for addressing the author uncertainty problem, but more so when used in combination.
Our approach makes use of automation as well as manual inspection, relying initially on author identifiers, then commonalities among fielded data other than author identifiers, and finally manual verification. To achieve name consolidation independent of author identifier matches, we have developed a procedure that is used with bibliometric software called VantagePoint (see www.thevantagepoint.com). While the application of our technique does not exclusively depend on VantagePoint, it is the software we find most efficient in this study. The script we developed implements our name disambiguation procedure in a way that significantly reduces manual effort on the user’s part. Those who seek to replicate our procedure independent of VantagePoint can do so by manually following the method we outline, but we note that the manual application of our procedure takes a significant amount of time and effort, especially when working with larger datasets.
Our script begins by prompting the user for a surname and a first initial (for any author of interest). It then prompts the user to select a WOS field on which to consolidate author names. After this the user is prompted to point to the name of the authors field, and finally asked to identify a specific author name (referred to by the script as the primary author) within this field whom the user knows to be a true positive (a suggested approach is to point to an author name associated with one of the records that has the author’s ORCID iD or email address attached to it).
The script proceeds to identify and combine all author names sharing the primary author’s surname and first initial of his or her first name who share commonalities in the WOS field on which the user was prompted to consolidate author names. This typically results in significant reduction in the initial dataset size. After the procedure completes the user is usually left with a much smaller (and more manageable) dataset to manually inspect (and/or apply additional name disambiguation techniques to).
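The consolidation step described above can be illustrated with a minimal sketch. This is not the authors' VantagePoint script; it assumes a hypothetical in-memory record layout (dictionaries with an "author" key and list-valued match fields such as "email"), hypothetical function names, and a single, non-transitive matching pass against the primary author.

```python
from collections import defaultdict


def same_surname_initial(name, surname, initial):
    """True if a 'Surname, Given' style name matches the surname and first initial."""
    last, _, given = name.partition(",")
    given = given.strip()
    return (last.strip().lower() == surname.lower()
            and given[:1].lower() == initial.lower())


def consolidate(records, surname, initial, field, primary):
    """Return the author names merged with `primary` via shared `field` values.

    Hypothetical helper: only names with the given surname and first initial
    are considered, and a name is merged when it shares at least one value
    of the chosen match field with the primary author (single pass).
    """
    values_by_name = defaultdict(set)
    for rec in records:
        name = rec["author"]
        if same_surname_initial(name, surname, initial):
            values_by_name[name].update(rec.get(field, []))
    primary_values = values_by_name[primary]
    merged = {primary}
    for name, values in values_by_name.items():
        if values & primary_values:
            merged.add(name)
    return merged


# Made-up example records; the field names are assumptions for illustration.
records = [
    {"author": "Doe, John", "email": ["jdoe@example.org"]},
    {"author": "Doe, J", "email": ["jdoe@example.org"]},
    {"author": "Doe, Jane", "email": ["jane.doe@example.com"]},
    {"author": "Dole, J", "email": ["jdoe@example.org"]},
]
merged = consolidate(records, "Doe", "J", "email", "Doe, John")
print(sorted(merged))
```

In this sketch, names left outside the merged set would remain in the reduced dataset for manual inspection or for a further pass on another match field, such as co-authors or source titles.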
Research limitations
Match field coverage can be an issue. When field coverage is paltry dataset reduction is not as significant, which results in more manual inspection on the user’s part. Our procedure doesn’t lend itself to scholars who have had a legal family name change (after marriage, for example). Moreover, the technique we advance is (sometimes, but not always) likely to have a difficult time dealing with scholars who have changed careers or fields dramatically, as well as scholars whose work is highly interdisciplinary.
Practical implications
The procedure we advance has the ability to save a significant amount of time and effort for individuals engaged in name disambiguation research, especially when the name under consideration is a more common family name. It is more effective when match field coverage is high and a number of match fields exist.
Originality/value
Once again, the procedure we advance has the ability to save a significant amount of time and effort for individuals engaged in name disambiguation research. It combines preexisting with more recent approaches, harnessing the benefits of both.
Findings
Our study applies the name disambiguation procedure we advance to three case studies. Ideal match fields are not the same for each of our case studies. We find that match field effectiveness is in large part a function of field coverage. Comparing original dataset size, the timeframe analyzed for each case study is not the same, nor are the subject areas in which they publish. Our procedure is more effective when applied to our third case study, both in terms of list reduction and 100% retention of true positives. We attribute this to excellent match field coverage, and especially in more specific match fields, as well as having a more modest/manageable number of publications.
While machine learning is considered authoritative by many, we do not see it as practical or replicable. The procedure advanced herein is practical, replicable, and relatively user-friendly. It might be categorized into a space between ORCID and machine learning. Machine learning approaches typically look for commonalities among citation data, which are not always available, structured, or easy to work with. The procedure we advance is intended to be applied across numerous fields in a dataset of interest (e.g. emails, coauthors, affiliations, etc.), resulting in multiple rounds of reduction. Results indicate that effective match fields include author identifiers, emails, source titles, co-authors, and ISSNs. While the script we present is not likely to result in a dataset consisting solely of true positives (at least for more common surnames), it does significantly reduce manual effort on the user’s part. Dataset reduction (after our procedure is applied) is in large part a function of (a) field availability and (b) field coverage.
15
Haak LL, Meadows A, Brown J. Using ORCID, DOI, and Other Open Identifiers in Research Evaluation. Front Res Metr Anal 2018. [DOI: 10.3389/frma.2018.00028] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open