1
Dexter F, Epstein RH. Quantifying Cooperation among Investigators with Substantial Production in Operating Room Management [Letter]. J Multidiscip Healthc 2024; 17:3993-3994. [PMID: 39161539 PMCID: PMC11332419 DOI: 10.2147/jmdh.s489745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 08/09/2024] [Indexed: 08/21/2024] Open
Affiliation(s)
- Franklin Dexter
  - Division of Management Consulting, Department of Anesthesia, University of Iowa, Iowa City, IA, USA
- Richard H Epstein
  - Department of Anesthesiology, Perioperative Medicine & Pain Management, University of Miami, Miller School of Medicine, Miami, FL, USA
2
Zhang L, Song N, Gui S, Wu K, Lu W. Bridging the gap in author names: building an enhanced author name dataset for biomedical literature system. J Am Med Inform Assoc 2024; 31:1648-1656. [PMID: 38916911 PMCID: PMC11258411 DOI: 10.1093/jamia/ocae127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 05/07/2024] [Accepted: 05/16/2024] [Indexed: 06/26/2024] Open
Abstract
OBJECTIVE: Author name incompleteness, referring to only the first initial being available instead of the full first name, is a long-standing problem in MEDLINE and has a negative impact on biomedical literature systems. The purpose of this study is to create an Enhanced Author Names (EAN) dataset for MEDLINE that maximizes the number of complete author names.
MATERIALS AND METHODS: The EAN dataset is built based on a large-scale name comparison and restoration with author names collected from multiple literature databases such as MEDLINE, Microsoft Academic Graph, and Semantic Scholar. We assess the impact of EAN on biomedical literature systems by conducting comparative and statistical analyses between EAN and MEDLINE's author names dataset (MAN) on 2 important tasks, author name search and author name disambiguation.
RESULTS: Evaluation results show that EAN improves the number of full author names in MEDLINE from 69.73 million to 110.9 million. EAN not only restores a substantial number of abbreviated names prior to the year 2002, when the NLM changed its author name indexing policy, but also improves the availability of full author names in articles published afterward. The evaluation of the author name search and author name disambiguation tasks reveals that EAN is able to significantly enhance both tasks compared to MAN.
CONCLUSION: The extensive coverage of full names in EAN suggests that the name incompleteness issue can be largely mitigated. This has significant implications for the development of an improved biomedical literature system. EAN is available at https://zenodo.org/record/10251358, and an updated version is available at https://zenodo.org/records/10663234.
Affiliation(s)
- Li Zhang
  - Laboratory of Data Intelligence and Interdisciplinary Innovation of Nanjing University, Nanjing, Jiangsu, 210023, China
  - School of Information Management, Nanjing University, Nanjing, Jiangsu, 210023, China
- Ningyuan Song
  - Laboratory of Data Intelligence and Interdisciplinary Innovation of Nanjing University, Nanjing, Jiangsu, 210023, China
  - School of Information Management, Nanjing University, Nanjing, Jiangsu, 210023, China
- Sisi Gui
  - School of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu, 210023, China
- Keye Wu
  - Laboratory of Data Intelligence and Interdisciplinary Innovation of Nanjing University, Nanjing, Jiangsu, 210023, China
  - School of Information Management, Nanjing University, Nanjing, Jiangsu, 210023, China
- Wei Lu
  - School of Information Management, Wuhan University, Wuhan, Hubei, 430072, China
3
Akbaritabar A, Theile T, Zagheni E. Bilateral flows and rates of international migration of scholars for 210 countries for the period 1998-2020. Sci Data 2024; 11:816. [PMID: 39048586 PMCID: PMC11269605 DOI: 10.1038/s41597-024-03655-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 07/16/2024] [Indexed: 07/27/2024] Open
Abstract
A lack of comprehensive migration data is a major barrier for understanding the causes and consequences of migration processes, including for specific groups like high-skilled migrants. We leverage large-scale bibliometric data from Scopus and OpenAlex to trace the global movements of scholars. Based on our empirical validations, we develop pre-processing steps and offer best practices for the measurement and identification of migration events. We have prepared a publicly accessible dataset that shows a high level of correlation between the counts of scholars in Scopus and OpenAlex for most countries. Although OpenAlex has more extensive coverage of non-Western countries, the highest correlations with Scopus are observed in Western countries. We share aggregated yearly estimates of international migration rates and of bilateral flows for 210 countries and areas worldwide for the period 1998-2020 and describe the data structure and usage notes. We expect that the publicly shared dataset will enable researchers to further study the causes and the consequences of migration of scholars to forecast the future mobility of academic talent worldwide.
Affiliation(s)
- Aliakbar Akbaritabar
  - Department of Digital and Computational Demography, Max Planck Institute for Demographic Research, Rostock, 18057, Germany
- Tom Theile
  - Department of Digital and Computational Demography, Max Planck Institute for Demographic Research, Rostock, 18057, Germany
- Emilio Zagheni
  - Department of Digital and Computational Demography, Max Planck Institute for Demographic Research, Rostock, 18057, Germany
4
Lin Z, Yin Y, Liu L, Wang D. SciSciNet: A large-scale open data lake for the science of science research. Sci Data 2023; 10:315. [PMID: 37264014 DOI: 10.1038/s41597-023-02198-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 05/02/2023] [Indexed: 06/03/2023] Open
Abstract
The science of science has attracted growing research interests, partly due to the increasing availability of large-scale datasets capturing the innerworkings of science. These datasets, and the numerous linkages among them, enable researchers to ask a range of fascinating questions about how science works and where innovation occurs. Yet as datasets grow, it becomes increasingly difficult to track available sources and linkages across datasets. Here we present SciSciNet, a large-scale open data lake for the science of science research, covering over 134M scientific publications and millions of external linkages to funding and public uses. We offer detailed documentation of pre-processing steps and analytical choices in constructing the data lake. We further supplement the data lake by computing frequently used measures in the literature, illustrating how researchers may contribute collectively to enriching the data lake. Overall, this data lake serves as an initial but useful resource for the field, by lowering the barrier to entry, reducing duplication of efforts in data processing and measurements, improving the robustness and replicability of empirical claims, and broadening the diversity and representation of ideas in the field.
Affiliation(s)
- Zihang Lin
  - Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
  - Kellogg School of Management, Northwestern University, Evanston, IL, USA
  - School of Computer Science, Fudan University, Shanghai, China
- Yian Yin
  - Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
  - Kellogg School of Management, Northwestern University, Evanston, IL, USA
  - McCormick School of Engineering, Northwestern University, Evanston, IL, USA
- Lu Liu
  - Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
  - Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Dashun Wang
  - Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
  - Kellogg School of Management, Northwestern University, Evanston, IL, USA
  - McCormick School of Engineering, Northwestern University, Evanston, IL, USA
5
Wang YS, Lee CJ, West JD, Bergstrom CT, Erosheva EA. Gender-based homophily in collaborations across a heterogeneous scholarly landscape. PLoS One 2023; 18:e0283106. [PMID: 37018177 PMCID: PMC10075399 DOI: 10.1371/journal.pone.0283106] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/13/2022] [Accepted: 03/01/2023] [Indexed: 04/06/2023] Open
Abstract
In this article, we investigate the role of gender in collaboration patterns by analyzing gender-based homophily-the tendency for researchers to co-author with individuals of the same gender. We develop and apply novel methodology to the corpus of JSTOR articles, a broad scholarly landscape, which we analyze at various levels of granularity. Most notably, for a precise analysis of gender homophily, we develop methodology which explicitly accounts for the fact that the data comprises heterogeneous intellectual communities and that not all authorships are exchangeable. In particular, we distinguish three phenomena which may affect the distribution of observed gender homophily in collaborations: a structural component that is due to demographics and non-gendered authorship norms of a scholarly community, a compositional component which is driven by varying gender representation across sub-disciplines and time, and a behavioral component which we define as the remainder of observed gender homophily after its structural and compositional components have been taken into account. Using minimal modeling assumptions, the methodology we develop allows us to test for behavioral homophily. We find that statistically significant behavioral homophily can be detected across the JSTOR corpus and show that this finding is robust to missing gender indicators in our data. In a secondary analysis, we show that the proportion of women representation in a field is positively associated with the probability of finding statistically significant behavioral homophily.
Affiliation(s)
- Y. Samuel Wang
  - Department of Statistics and Data Science, Cornell University, Ithaca, NY, United States of America
- Carole J. Lee
  - Department of Philosophy, University of Washington, Seattle, WA, United States of America
- Jevin D. West
  - Information School, University of Washington, Seattle, WA, United States of America
- Carl T. Bergstrom
  - Department of Biology, University of Washington, Seattle, WA, United States of America
- Elena A. Erosheva
  - Department of Statistics, University of Washington, Seattle, WA, United States of America
6
A novel NIH research grant recommender using BERT. PLoS One 2023; 18:e0278636. [PMID: 36649346 PMCID: PMC9844873 DOI: 10.1371/journal.pone.0278636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 11/19/2022] [Indexed: 01/18/2023] Open
Abstract
Research grants are important for researchers to sustain a good position in academia. There are many grant opportunities available from different funding agencies. However, finding relevant grant announcements is challenging and time-consuming for researchers. To resolve the problem, we proposed a grant announcement recommendation system for National Institutes of Health (NIH) grants using researchers' publications. We formulated the recommendation as a classification problem and proposed a recommender using a state-of-the-art deep learning technique, Bidirectional Encoder Representations from Transformers (BERT), to capture intrinsic, non-linear relationships between researchers' publications and grant announcements. Internal and external evaluations were conducted to assess the system's usefulness. During internal evaluations, the grant citations were used to establish grant-publication ground truth, and results were evaluated against Recall@k, Precision@k, Mean Reciprocal Rank (MRR) and Area Under the Receiver Operating Characteristic curve (ROC-AUC). During external evaluations, researchers' publications were clustered using a Dirichlet Process Mixture Model (DPMM), grants recommended by our model were then aggregated per cluster through Recency Weight, and finally researchers were invited to rate the recommendations to calculate Precision@k. For comparison, baseline recommenders using Okapi Best Matching (BM25), Term Frequency-Inverse Document Frequency (TF-IDF), doc2vec, and Naïve Bayes (NB) were also developed. Both internal and external evaluations (all metrics) revealed favorable performance of our proposed BERT-based recommender.
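The ranking metrics named in this abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the grant identifiers, and the convention of dividing Precision@k by k even when fewer than k items are returned are assumptions made here for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k recommended items that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant items that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # assumption: recall is reported as 0 when nothing is relevant
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is recommended."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Average reciprocal rank over several researchers (queries)."""
    pairs = list(zip(rankings, relevants))
    if not pairs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in pairs) / len(pairs)


# Hypothetical grant identifiers, for illustration only.
ranked_grants = ["g3", "g1", "g7", "g2"]
cited_grants = {"g1", "g2"}
print(precision_at_k(ranked_grants, cited_grants, k=2))
print(recall_at_k(ranked_grants, cited_grants, k=2))
print(reciprocal_rank(ranked_grants, cited_grants))
```

In this sketch the "relevant" set plays the role of the grant-publication ground truth mentioned above; ROC-AUC is omitted because it needs scored negatives rather than a ranked list.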
7
Derrick GE, Chen PY, van Leeuwen T, Larivière V, Sugimoto CR. The relationship between parenting engagement and academic performance. Sci Rep 2022; 12:22300. [PMID: 36566309 PMCID: PMC9789521 DOI: 10.1038/s41598-022-26258-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Accepted: 12/13/2022] [Indexed: 12/25/2022] Open
Abstract
Gender differences in research productivity have been well documented. One frequent explanation of these differences is disproportionate child-related responsibilities for women. However, changing social dynamics around parenting has led to fathers taking an increasingly active role in parenting. This demands a more nuanced approach to understanding the relationship between parenting and productivity for both men and women. To gain insight into associations between parent roles, partner type, research productivity, and research impact, we conducted a global survey that targeted 1.5 million active scientists; we received viable responses from 10,445 parents (< 1% response rate), thus providing a basis for exploratory analyses that shed light on associations between parenting models and research outcomes, across men and women. Results suggest that the gendered effect observed in production may be related by differential engagement in parenting: men who serve in lead roles suffer similar penalties for parenting engagement, but women are more likely to serve in lead roles and to be more engaged across time and tasks, therefore suffering a higher penalty. Taking a period of parental leave is associated with higher levels of productivity; however, the productivity advantage dissipates after six months for the US-sample, and at 12-months for the non-US sample. These results suggest that parental engagement is a more powerful variable to explain gender differences in academic productivity than the mere existence of children, and that policies should factor these labor differentials into account.
Affiliation(s)
- Gemma E Derrick
  - Centre for Higher Education Transformations (CHET), School of Education, University of Bristol, Bristol, UK
- Pei-Ying Chen
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, USA
- Thed van Leeuwen
  - Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
- Vincent Larivière
  - École de bibliothéconomie et des sciences de l'information, Université de Montréal, Montréal, Canada
  - Observatoire des sciences et des technologies, Université du Québec à Montréal, Montréal, Canada
8
Ramos-Vielba I, Robinson-Garcia N, Woolley R. A value creation model from science-society interconnections: Archetypal analysis combining publications, survey and altmetric data. PLoS One 2022; 17:e0269004. [PMID: 35657967 PMCID: PMC9165788 DOI: 10.1371/journal.pone.0269004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 05/12/2022] [Indexed: 11/29/2022] Open
Abstract
The interplay between science and society takes place through a wide range of intertwined relationships and mutual influences that shape each other and facilitate continuous knowledge flows. Stylised consequentialist perspectives on valuable knowledge moving from public science to society in linear and recursive pathways, whilst informative, cannot fully capture the broad spectrum of value creation possibilities. As an alternative we experiment with an approach that gathers together diverse science-society interconnections and reciprocal research-related knowledge processes that can generate valorisation. Our approach to value creation attempts to incorporate multiple facets, directions and dynamics in which constellations of scientific and societal actors generate value from research. The paper develops a conceptual model based on a set of nine value components derived from four key research-related knowledge processes: production, translation, communication, and utilization. The paper conducts an exploratory empirical study to investigate whether a set of archetypes can be discerned among these components that structure science-society interconnections. We explore how such archetypes vary between major scientific fields. Each archetype is overlaid on a research topic map, with our results showing the distinctive topic areas that correspond to different archetypes. The paper finishes by discussing the significance and limitations of our results and the potential of both our model and our empirical approach for further research.
Affiliation(s)
- Irene Ramos-Vielba
  - Danish Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Aarhus, Denmark
- Nicolas Robinson-Garcia
  - EC3 Research Group, Information and Communication Studies Department, Universidad de Granada, Granada, Spain
- Richard Woolley
  - INGENIO (CSIC-UPV), Universitat Politècnica de València, Valencia, Spain
9
Predicting the future impact of Computer Science researchers: Is there a gender bias? Scientometrics 2022. [DOI: 10.1007/s11192-022-04337-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
The advent of large-scale bibliographic databases and powerful prediction algorithms led to calls for data-driven approaches for targeting scarce funds at researchers with high predicted future scientific impact. The potential side-effects and fairness implications of such approaches are unknown, however. Using a large-scale bibliographic data set of N = 111,156 Computer Science researchers active from 1993 to 2016, I build and evaluate a realistic scientific impact prediction model. Given the persistent under-representation of women in Computer Science, the model is audited for disparate impact based on gender. Random forests and Gradient Boosting Machines are used to predict researchers’ h-index in 2010 from their bibliographic profiles in 2005. Based on model predictions, it is determined whether the researcher will become a high-performer with an h-index in the top-25% of the discipline-specific h-index distribution. The models predict the future h-index with an accuracy of $R^2 = 0.875$ and correctly classify 91.0% of researchers as high-performers and low-performers. Overall accuracy does not vary strongly across researcher gender. Nevertheless, there is indication of disparate impact against women. The models under-estimate the true h-index of female researchers more strongly than the h-index of male researchers. Further, women are 8.6% less likely to be predicted to become high-performers than men. In practice, hiring, tenure, and funding decisions that are based on model predictions risk perpetuating the under-representation of women in Computer Science.
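For readers unfamiliar with the target variable, the following is a minimal sketch of how an h-index and a top-25% "high-performer" flag could be computed. It is not the paper's pipeline: the function names, the tie-handling rule (everyone at or above the cut-off is flagged), and the example numbers are assumptions for illustration.

```python
from typing import List, Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def high_performers(h_values: Sequence[int], top_share: float = 0.25) -> List[bool]:
    """Flag researchers whose h-index reaches the cut-off of the top share.

    Assumption: ties at the cut-off are all flagged, so slightly more than
    `top_share` of researchers can be marked as high-performers.
    """
    if not h_values:
        return []
    ranked = sorted(h_values, reverse=True)
    n_top = max(1, int(len(ranked) * top_share))
    cutoff = ranked[n_top - 1]
    return [h >= cutoff for h in h_values]


# Hypothetical per-paper citation counts for one researcher.
print(h_index([10, 8, 5, 4, 3]))
# Hypothetical h-indices for eight researchers in one discipline.
print(high_performers([1, 5, 3, 9, 2, 7, 4, 8]))
```

A prediction model such as the one described above would estimate the future h-index from earlier bibliographic features; this sketch only covers how the target and the binary label might be derived.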
10
Madsen EB, Nielsen MW, Bjørnholm J, Jagsi R, Andersen JP. Meta-Research: Individual-level researcher data confirm the widening gender gap in publishing rates during COVID-19. eLife 2022; 11:76559. [PMID: 35293860 PMCID: PMC8942470 DOI: 10.7554/elife.76559] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/21/2021] [Accepted: 03/15/2022] [Indexed: 12/02/2022] Open
Abstract
Publications are essential for a successful academic career, and there is evidence that the COVID-19 pandemic has amplified existing gender disparities in the publishing process. We used longitudinal publication data on 431,207 authors in four disciplines - basic medicine, biology, chemistry and clinical medicine - to quantify the differential impact of COVID-19 on the annual publishing rates of men and women. In a difference-in-differences analysis, we estimated that the average gender difference in publication productivity increased from –0.26 in 2019 to –0.35 in 2020; this corresponds to the output of women being 17% lower than the output of men in 2019, and 24% lower in 2020. An age-group comparison showed a widening gender gap for both early-career and mid-career scientists. The increasing gender gap was most pronounced among highly productive authors and in biology and clinical medicine. Our study demonstrates the importance of reinforcing institutional commitments to diversity through policies that support the inclusion and retention of women in research.
Affiliation(s)
- Emil Bargmann Madsen
  - Danish Centre for Studies in Research and Research Policy, Aarhus University, Aarhus, Denmark
- Josefine Bjørnholm
  - Danish Centre for Studies in Research and Research Policy, Aarhus University, Aarhus, Denmark
- Reshma Jagsi
  - Department of Radiation Oncology, University of Michigan, Ann Arbor, United States
- Jens Peter Andersen
  - Danish Centre for Studies in Research and Research Policy, Aarhus University, Aarhus, Denmark
11
Teixeira da Silva JA. Non-compliance with ethical rules caused by misuse of ORCID accounts: Implications for medical publications in the COVID-19 era. Ethics, Medicine, and Public Health 2021; 18:100692. [PMID: 36569745 PMCID: PMC9765410 DOI: 10.1016/j.jemep.2021.100692] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Accepted: 06/02/2021] [Indexed: 12/27/2022]
12
Cappelletti-Montano B, Columbu S, Montaldo S, Musio M. New perspectives in bibliometric indicators: Moving from citations to citing authors. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
13
Rehs A. A supervised machine learning approach to author disambiguation in the Web of Science. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
14
Exploring the relevance of ORCID as a source of study of data sharing activities at the individual-level: a methodological discussion. Scientometrics 2021. [DOI: 10.1007/s11192-021-04043-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]