1
Yu Y, Romero DM. Does the use of unusual combinations of datasets contribute to greater scientific impact? Proc Natl Acad Sci U S A 2024; 121:e2402802121. [PMID: 39356667 PMCID: PMC11474085 DOI: 10.1073/pnas.2402802121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 08/07/2024] [Indexed: 10/04/2024]
Abstract
Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement. In this study, we examine the hypothesis inspired by the recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. Focusing on social science, we investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 5,000 datasets curated within one of the largest social science databases, Interuniversity Consortium for Political and Social Research. This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, infrequently paired datasets maintain a strong association with citation even after controlling for the atypicality of dataset topics. In contrast, the atypicality of dataset topics has a much smaller positive impact on citation counts. Third, smaller and less experienced research teams tend to use atypical combinations of datasets in research more frequently than their larger and more experienced counterparts. Last, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an underutilized but powerful strategy correlated with the scientific impact and broader dissemination of scientific discoveries.
Affiliation(s)
- Yulin Yu
  - School of Information, University of Michigan, Ann Arbor, MI 48109
- Daniel M. Romero
  - School of Information, University of Michigan, Ann Arbor, MI 48109
  - Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109
2
Ronquillo JG, South B, Naik P, Singh R, De Jesus M, Watt SJ, Habtezion A. Informatics and Artificial Intelligence-Guided Assessment of the Regulatory and Translational Research Landscape of First-in-Class Oncology Drugs in the United States, 2018-2022. JCO Clin Cancer Inform 2024; 8:e2400087. [PMID: 39348666 DOI: 10.1200/cci.24.00087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 06/23/2024] [Accepted: 08/13/2024] [Indexed: 10/02/2024]
Abstract
PURPOSE Cancer drug development remains a critical but challenging process that affects millions of patients and their families. Using biomedical informatics and artificial intelligence (AI) approaches, we assessed the regulatory and translational research landscape defining successful first-in-class drugs for patients with cancer. METHODS This is a retrospective observational study of all novel first-in-class drugs approved by the US Food and Drug Administration (FDA) from 2018 to 2022, stratified by cancer versus noncancer drugs. A biomedical informatics pipeline leveraging interoperability standards and ChatGPT performed integration and analysis of public databases provided by the FDA, National Institutes of Health, and WHO. RESULTS Between 2018 and 2022, the FDA approved a total of 247 novel drugs, of which 107 (43.3%) were first-in-class drugs involving a new biologic target. Of these first-in-class drugs, 30 (28%) treatments were indicated for patients with cancer, including 19 (63.3%) for solid tumors and the remaining 11 (36.7%) for hematologic cancers. A median of 68 publications of basic, clinical, and other relevant translational science preceded successful FDA approval of first-in-class cancer drugs, with oncology-related treatments involving fewer median years of target-based research than therapies not related to cancer (33 v 43 years; P < .05). Overall, 94.4% of first-in-class drugs had at least 25 years of target-related research papers, while 85.5% of first-in-class drugs had at least 10 years of translational research publications. CONCLUSION Novel first-in-class cancer treatments are defined by diverse clinical indications, personalized molecular targets, dependence on expedited regulatory pathways, and translational research metrics reflecting this complex landscape. Biomedical informatics and AI provide scalable, data-driven ways to assess and even address important challenges in the drug development pipeline.
Affiliation(s)
- Jay G Ronquillo
  - Worldwide Medical and Safety, Pfizer Inc, New York, NY
  - Pfizer Research and Development, Pfizer Inc, New York, NY
- Brett South
  - Worldwide Medical and Safety, Pfizer Inc, New York, NY
  - Pfizer Research and Development, Pfizer Inc, New York, NY
- Prakash Naik
  - Pfizer Research and Development, Pfizer Inc, New York, NY
- Rominder Singh
  - Pfizer Research and Development, Pfizer Inc, New York, NY
- Magdia De Jesus
  - Worldwide Medical and Safety, Pfizer Inc, New York, NY
  - Pfizer Research and Development, Pfizer Inc, New York, NY
- Stephen J Watt
  - Worldwide Medical and Safety, Pfizer Inc, New York, NY
  - Pfizer Research and Development, Pfizer Inc, New York, NY
- Aida Habtezion
  - Worldwide Medical and Safety, Pfizer Inc, New York, NY
  - Pfizer Research and Development, Pfizer Inc, New York, NY
3
Berkes E, Marion M, Milojević S, Weinberg BA. Slow convergence: Career impediments to interdisciplinary biomedical research. Proc Natl Acad Sci U S A 2024; 121:e2402646121. [PMID: 39074264 PMCID: PMC11317606 DOI: 10.1073/pnas.2402646121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 05/29/2024] [Indexed: 07/31/2024]
Abstract
Despite the long-standing calls for increased levels of interdisciplinary research as a way to address society's grand challenges, most science is still disciplinary. To understand the slow rate of convergence to more interdisciplinary research, we examine 154,021 researchers who received a PhD in a biomedical field between 1970 and 2013, measuring the interdisciplinarity of their articles using the disciplinary composition of references. We provide a range of evidence that interdisciplinary research is impactful, but that those who conduct it face early career impediments. The researchers who are initially the most interdisciplinary tend to stop publishing earlier in their careers: it takes about 8 y for half of the researchers in the top percentile in terms of initial interdisciplinarity to stop publishing, compared to more than 20 y for moderately interdisciplinary researchers (10th to 75th percentiles). Moreover, perhaps in response to career challenges, initially interdisciplinary researchers on average decrease their interdisciplinarity over time. These forces reduce the stock of interdisciplinary researchers who can train future cohorts. Indeed, new graduates tend to be less interdisciplinary than the stock of active researchers. We show that interdisciplinarity does increase over time despite these dampening forces because initially disciplinary researchers become more interdisciplinary as their careers progress.
Affiliation(s)
- Enrico Berkes
  - Department of Economics, The University of Maryland Baltimore County, Baltimore, MD 21250
- Monica Marion
  - Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Department of Informatics, Indiana University, Bloomington, IN 47401
- Staša Milojević
  - Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Department of Informatics, Indiana University, Bloomington, IN 47401
- Bruce A. Weinberg
  - Department of Economics, The Ohio State University, Columbus, OH 43210
  - IZA Institute of Labor Economics, Bonn D-53113, Germany
  - National Bureau of Economic Research, Cambridge, MA 02138
4
Liao S, Lavender C, Zhai H. Factors influencing the research impact in cancer research: a collaboration and knowledge network analysis. Health Res Policy Syst 2024; 22:96. [PMID: 39107778 PMCID: PMC11304674 DOI: 10.1186/s12961-024-01205-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 07/30/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND Cancer is a major public health challenge globally. However, little is known about the evolution patterns of cancer research communities and the influencing factors of their research capacity and impact, which is affected not only by the social networks established through research collaboration but also by the knowledge networks in which the research projects are embedded. METHODS The focus of this study was narrowed to a specific topic - 'synthetic lethality' - in cancer research. This field has seen vibrant growth and multidisciplinary collaboration in the past decade. Multi-level collaboration and knowledge networks were established and analysed on the basis of bibliometric data from 'synthetic lethality'-related cancer research papers. Negative binomial regression analysis was further applied to explore how node attributes within these networks, along with other potential factors, affected paper citations, which are widely accepted as proxies for assessing research capacity and impact. RESULTS Our study revealed that the synthetic lethality-based cancer research field is characterized by a knowledge network with high integration, alongside a collaboration network exhibiting some clustering. We found significant correlations between certain factors and citation counts. Specifically, a leading status within the nation-level international collaboration network and industry involvement were both found to be significantly related to higher citations. In the individual-level collaboration networks, lead authors' degree centrality has an inverted U-shaped relationship with citations, while their structural holes exhibit a positive and significant effect. Within the knowledge network, however, only measures of structural holes have a positive and significant effect on the number of citations. CONCLUSIONS To enhance cancer research capacity and impact, non-leading countries should take measures to enhance their international collaboration status. 
For early career researchers, increasing the number of collaborators seems to be more effective. University-industry cooperation should also be encouraged, enhancing the integration of human resources, technology, funding, research platforms and medical resources. Insights gained through this study also provide recommendations to researchers or administrators in designing future research directions from a knowledge network perspective. Focusing on unique issues, especially in interdisciplinary fields, will improve the output and influence of their research work.
Affiliation(s)
- Shuang Liao
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, Guangdong, People's Republic of China
- Christopher Lavender
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, Guangdong, People's Republic of China
- Huiwen Zhai
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, Guangdong, People's Republic of China
5
Lewis M, Cahill A, Madnani N, Evans J. Local similarity and global variability characterize the semantic space of human languages. Proc Natl Acad Sci U S A 2023; 120:e2300986120. [PMID: 38079546 PMCID: PMC10743503 DOI: 10.1073/pnas.2300986120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2023] [Accepted: 11/06/2023] [Indexed: 12/18/2023]
Abstract
How does meaning vary across the world's languages? Scholars recognize the existence of substantial variability within specific domains, ranging from nature and color to kinship. The emergence of large language models enables a systems-level approach that directly characterizes this variability through comparison of word organization across semantic domains. Here, we show that meanings across languages manifest lower variability within semantic domains and greater variability between them, using models trained on both 1) large corpora of native language text comprising Wikipedia articles in 35 languages and also 2) Test of English as a Foreign Language (TOEFL) essays written by 38,500 speakers from the same native languages, which cluster into semantic domains. Concrete meanings vary less across languages than abstract meanings, but all vary with geographical, environmental, and cultural distance. By simultaneously examining local similarity and global difference, we harmonize these findings and provide a description of general principles that govern variability in semantic space across languages. In this way, the structure of a speaker's semantic space influences the comparisons cognitively salient to them, as shaped by their native language, and suggests that even successful bilingual communicators likely think with "semantic accents" driven by associations from their native language while writing English. These findings have dramatic implications for language education, cross-cultural communication, and literal translations, which are impossible not because the objects of reference are uncertain, but because associations, metaphors, and narratives interlink meanings in different, predictable ways from one language to another.
Affiliation(s)
- Molly Lewis
  - Psychology & Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA 15213
- James Evans
  - Sociology & Data Science, University of Chicago, Chicago, IL 60637
  - Santa Fe Institute, Santa Fe, NM 87501