1
Liu Z, Luo P, Tang X, Wang J, Nie L. Unfolding the downloads of datasets: A multifaceted exploration of influencing factors. Sci Data 2024; 11:760. [PMID: 38992048 PMCID: PMC11239836 DOI: 10.1038/s41597-024-03591-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 07/01/2024] [Indexed: 07/13/2024] Open
Abstract
Scientific data are essential to advancing scientific knowledge and are increasingly valued as scholarly output. Understanding what drives dataset downloads is crucial for their effective dissemination and reuse. Our study, analysing 55,473 datasets from 69 data repositories, identifies key factors driving dataset downloads, focusing on interpretability, reliability, and accessibility. We find that while lengthy descriptive texts can deter users due to complexity and time requirements, readability boosts a dataset's appeal. Reliability, evidenced by factors like institutional reputation and citation counts of related papers, also significantly increases a dataset's attractiveness and usage. Additionally, our research shows that open access to datasets increases their downloads and amplifies the importance of interpretability and reliability. This indicates that easy access enhances the overall attractiveness and usage of datasets in the scholarly community. By emphasizing interpretability, reliability, and accessibility, this study offers a comprehensive framework for future research and guides data management practices toward ensuring clarity, credibility, and open access to maximize the impact of scientific datasets.
Affiliation(s)
- Zhifeng Liu
  - Department of Information Management, Peking University, Beijing, 100871, China
- Xinglong Tang
  - Department of Information Management, Peking University, Beijing, 100871, China
- Jimin Wang
  - Department of Information Management, Peking University, Beijing, 100871, China
- Lei Nie
  - Country and Area Studies Academy, Beijing Foreign Studies University, Beijing, 100089, China
2
Reed CJ, Denise R, Hourihan J, Babor J, Jaroch M, Martinelli M, Hutinet G, de Crécy-Lagard V. Beyond blast: enabling microbiologists to better extract literature, taxonomic distributions and gene neighbourhood information for protein families. Microb Genom 2024; 10:001183. [PMID: 38323604 PMCID: PMC10926702 DOI: 10.1099/mgen.0.001183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 01/08/2024] [Indexed: 02/08/2024] Open
Abstract
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study focusing on specific members of that family. Using a previously gathered dataset of more than 280 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3) family, we evaluated the efficiency of different databases and search tools, and devised a workflow that experimentalists can use to capture the most information published on members of a protein family in the least amount of time. To complement this workflow, web-based platforms allowing for the exploration of protein family members across sequenced genomes or for the analysis of gene neighbourhood information were reviewed for their versatility and ease of use. Recommendations that can be used for experimentalist users, as well as educators, are provided and integrated within a customized, publicly accessible Wiki.
Affiliation(s)
- Colbie J. Reed
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Rémi Denise
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
  - APC Microbiome Ireland, University College Cork, Cork, Ireland
- Jacob Hourihan
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Jill Babor
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Marshall Jaroch
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Maria Martinelli
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, USA
- Valérie de Crécy-Lagard
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
  - Department of Biology, Haverford College, Haverford, PA, USA
  - UF Genetics Institute, University of Florida, Gainesville, FL, USA
3
Jiao C, Li K, Fang Z. How are exclusively data journals indexed in major scholarly databases? An examination of four databases. Sci Data 2023; 10:737. [PMID: 37880300 PMCID: PMC10600123 DOI: 10.1038/s41597-023-02625-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/09/2023] [Indexed: 10/27/2023] Open
Abstract
The data paper is becoming a popular way for researchers to publish their research data. The growing numbers of data papers and journals hosting them have made them an important data source for understanding how research data is published and reused. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases. This inconsistency creates major challenges for future efforts to study data papers quantitatively and should be addressed in the future.
Affiliation(s)
- Chenyue Jiao
  - School of Information Sciences, University of Illinois Urbana-Champaign, 501 E. Daniel St., Champaign, IL, 61820, USA
- Kai Li
  - School of Information Sciences, University of Tennessee, Knoxville, 451 Communications Building 1345 Circle Park Drive, Knoxville, TN, 37996, USA
- Zhichao Fang
  - School of Information Resource Management, Renmin University of China, Beijing, 100872, China
  - Centre for Science and Technology Studies (CWTS), Leiden University, Kolffpad 1, 2333 BN, Leiden, the Netherlands
4
Understanding the meanings of citations using sentiment, role, and citation function classifications. Scientometrics 2022. [DOI: 10.1007/s11192-022-04567-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Traditional citation analyses use quantitative methods only, even though there is meaning in the sentences containing citations within the text. This article analyzes three citation meanings: sentiment, role, and function. We compare citation meaning patterns between fields of science and propose an appropriate deep learning model to classify the three meanings automatically at once. The data comes from Indonesian journal articles covering five different areas of science: food, energy, health, computer, and social science. The sentences in the article text were classified manually and used as training data for an automatic classification model. Several classic models were compared with the proposed multi-output convolutional neural network model. The manual classification revealed similar patterns in citation meaning across the science fields: (1) not many authors exhibit polarity when citing, (2) citations are still rarely used, and (3) citations are used mostly for introductions and establishing relations instead of for comparisons with and utilizing previous research. The proposed model’s automatic classification metric achieved a macro F1 score of 0.80 for citation sentiment, 0.84 for citation role, and 0.88 for citation function. The model can classify minority classes well concerning the unbalanced dataset. A machine model that can classify several citation meanings automatically is essential for analyzing big data of journal citations.
5
Jiao C, Li K, Fang Z. Data sharing practices across knowledge domains: A dynamic examination of data availability statements in PLOS ONE publications. J Inf Sci 2022. [DOI: 10.1177/01655515221101830] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
As the importance of research data gradually grows in sciences, data sharing has come to be encouraged and even mandated by journals and funders in recent years. Following this trend, the data availability statement has been increasingly embraced by academic communities as a means of sharing research data as part of research articles. This article presents a quantitative study of which mechanisms and repositories are used to share research data in PLOS ONE articles. We offer a dynamic examination of this topic from the disciplinary and temporal perspectives based on all statements in English-language research articles published between 2014 and 2020 in the journal. We find a slow yet steady growth in the use of data repositories to share data over time, as opposed to sharing data in the article and/or supplementary materials; this indicates improved compliance with the journal’s data sharing policies. We also find that multidisciplinary data repositories have been increasingly used over time, whereas some disciplinary repositories show a decreasing trend. Our findings can help academic publishers and funders to improve their data sharing policies and serve as an important baseline dataset for future studies on data sharing activities.
Affiliation(s)
- Chenyue Jiao
  - School of Information Sciences, University of Illinois Urbana-Champaign, USA
- Kai Li
  - School of Information Resource Management, Renmin University of China, China
- Zhichao Fang
  - Centre for Science and Technology Studies, Leiden University, The Netherlands
6
Fan W, Jeng W, Tang M. Using data citation to define a knowledge domain: A case study of the Add‐Health dataset. J Assoc Inf Sci Technol 2022. [DOI: 10.1002/asi.24688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Wei‐Min Fan
  - Department of Library and Information Science, National Taiwan University, Taipei, Taiwan
- Wei Jeng
  - Department of Library and Information Science, National Taiwan University, Taipei, Taiwan
- Muh‐Chyun Tang
  - Department of Library and Information Science, National Taiwan University, Taipei, Taiwan
7
Credit distribution in relational scientific databases. INFORM SYST 2022. [DOI: 10.1016/j.is.2022.102060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
8
Hemphill L, Pienta A, Lafia S, Akmon D, Bleckley DA. How do properties of data, their curation, and their funding relate to reuse? J Assoc Inf Sci Technol 2022; 73:1432-1444. [PMID: 36246529 PMCID: PMC9542848 DOI: 10.1002/asi.24646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 02/11/2022] [Accepted: 03/09/2022] [Indexed: 11/30/2022]
Abstract
Despite large public investments in facilitating the secondary use of data, there is little information about the specific factors that predict data's reuse. Using data download logs from the Inter‐university Consortium for Political and Social Research (ICPSR), this study examines how data properties, curation decisions, and repository funding models relate to data reuse. We find that datasets deposited by institutions, subject to many curatorial tasks, and whose access and preservation is funded externally, are used more often. Our findings confirm that investments in data collection, curation, and preservation are associated with more data reuse.
Affiliation(s)
- Libby Hemphill
  - Inter‐university Consortium for Political and Social Research (ICPSR), University of Michigan, Ann Arbor, Michigan
  - School of Information (UMSI), University of Michigan, Ann Arbor, Michigan
- Amy Pienta
  - Inter‐university Consortium for Political and Social Research (ICPSR), University of Michigan, Ann Arbor, Michigan
- Sara Lafia
  - Inter‐university Consortium for Political and Social Research (ICPSR), University of Michigan, Ann Arbor, Michigan
- Dharma Akmon
  - Inter‐university Consortium for Political and Social Research (ICPSR), University of Michigan, Ann Arbor, Michigan
- David A. Bleckley
  - Inter‐university Consortium for Political and Social Research (ICPSR), University of Michigan, Ann Arbor, Michigan
9
Mining the evolutionary process of knowledge through multiple relationships between keywords. Scientometrics 2022. [DOI: 10.1007/s11192-022-04272-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
10
Buneman P, Dosso D, Lissandrini M, Silvello G. Data citation and the citation graph. QUANTITATIVE SCIENCE STUDIES 2022. [DOI: 10.1162/qss_a_00166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023] Open
Abstract
The citation graph is a computational artifact that is widely used to represent the domain of published literature. It represents connections between published works, such as citations and authorship. Among other things, the graph supports the computation of bibliometric measures such as h-indexes and impact factors. There is now an increasing demand that we should treat the publication of data in the same way that we treat conventional publications. In particular, we should cite data for the same reasons that we cite other publications. In this paper we discuss what is needed for the citation graph to represent data citation. We identify two challenges: to model the evolution of credit appropriately (through references) over time and to model data citation not only to a data set treated as a single object but also to parts of it. We describe an extension of the current citation graph model that addresses these challenges. It is built on two central concepts: citable units and reference subsumption. We discuss how this extension would enable data citation to be represented within the citation graph and how it allows for improvements in current practices for bibliometric computations, both for scientific publications and for data.
11
OLIVEIRA CCD, SILVA MCD, PAVÃO CMG, SILVA FCCD, MOURA AMMD, BARROS THB. A teoria da citação de dados: uma revisão da produção científica na América Latina [Data citation theory: a review of scientific production in Latin America]. TRANSINFORMACAO 2022. [DOI: 10.1590/2318-0889202234e210062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
This is a qualitative bibliographic study that sought to identify the state of the art regarding data citation theory in scientific production conducted in Latin America. To this end, search expressions on the topic were established in Portuguese, English, and Spanish and used to explore the following databases, repositories, and search engines: Biblioteca Digital Brasileira de Teses e Dissertações, OasisBR, La referencia, Redalyc, Networked Digital Library of Theses and Dissertations, Portal de Periódicos Capes, Google Scholar, SciELO, and Brapci (Base de Dados Referenciais de Artigos de Periódicos em Ciência da Informação). After analysis of the retrieved works, only those that discussed research data citation in depth, with the aim of contributing to reflection on a theory of data citation, were considered, totaling 19 works. The study concludes that there is a significant absence of work in Latin America concerning data citation theory. At the same time, it identified works that, although they do not refer to a theory as such, offer significant contributions to the topic of research data citation and can serve as a basis for developing work on data citation theory. It was also found that Brazil stood out in the production of work on research data citation: of the 19 works analyzed in this study, 17 were Brazilian.
12
Lange M, Alako BTF, Cochrane G, Ghaffar M, Mascher M, Habekost PK, Hillebrand U, Scholz U, Schorch F, Freitag J, Scholz AH. Quantitative monitoring of nucleotide sequence data from genetic resources in context of their citation in the scientific literature. Gigascience 2021; 10:giab084. [PMID: 34966925 PMCID: PMC8716361 DOI: 10.1093/gigascience/giab084] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/01/2021] [Revised: 08/04/2021] [Accepted: 11/29/2021] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level. FINDINGS: We extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics. CONCLUSIONS: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.
Affiliation(s)
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
- Blaise T F Alako
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Guy Cochrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Mehmood Ghaffar
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, 04103 Leipzig, Germany
- Pia-Katharina Habekost
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
  - The Harz University of Applied Science, Department of Automation and Computer Science, Friedrichstraße 57, 38855 Wernigerode, Germany
- Upneet Hillebrand
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Department Research - Microbial Ecology and Diversity, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
- Florian Schorch
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
  - The Harz University of Applied Science, Department of Automation and Computer Science, Friedrichstraße 57, 38855 Wernigerode, Germany
- Jens Freitag
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany
- Amber Hartman Scholz
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Department Research - Microbial Ecology and Diversity, Inhoffenstraße 7B, 38124 Braunschweig, Germany
13
Li K, Jiao C, Sugimoto CR, Larivière V. Versioning boundary objects: the citation profile of the Diagnostic and Statistical Manual for Mental Disorders (DSM). JOURNAL OF DOCUMENTATION 2021. [DOI: 10.1108/jd-06-2021-0117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose: Research objects, such as datasets and classification standards, are difficult to be incorporated into a document-centric framework of citations, which relies on unique citable works. The Diagnostic and Statistical Manual for Mental Disorder (DSM), a dominant classification scheme used for mental disorder diagnosis, however provides a unique lens on examining citations to a research object, given that it straddles the boundaries as a single research object with changing manifestations. Design/methodology/approach: Using over 180,000 citations received by the DSM, this paper analyzes how the citation history of DSM is represented by its various versions, and how it is cited in different knowledge domains as an important boundary object. Findings: It shows that all recent DSM versions exhibit a similar citation cascading pattern, which is characterized by a strong replacement effect between two successive versions. Moreover, the shift of the disciplinary contexts of DSM citations can be largely explained by different DSM versions as distinct epistemic objects. Practical implications: Based on these results, the authors argue that all DSM versions should be treated as a series of connected but distinct citable objects. The work closes with a discussion of the ways in which the existing scholarly infrastructure can be reconfigured to acknowledge and trace a broader array of research objects. Originality/value: This paper connects quantitative methods and an important sociological concept, i.e. boundary object, to offer deeper insights into the scholarly communication system. Moreover, this work also evaluates how versioning, as a significant yet overlooked attribute of information resources, influenced the citation patterns of citable objects, which will contribute to more material-oriented scientific infrastructures.
14
Thelwall M. Alternative medicines worth researching? Citation analyses of acupuncture, chiropractic, homeopathy, and osteopathy 1996-2017. Scientometrics 2021; 126:8731-8747. [PMID: 34493881 PMCID: PMC8414961 DOI: 10.1007/s11192-021-04145-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2021] [Accepted: 08/20/2021] [Indexed: 11/29/2022]
Abstract
Some complementary and alternative medicines (CAM) are frequently criticised for being based on faith rather than scientific evidence. Despite this, researchers, academic departments, and institutes teach and investigate them. This article assesses whether the scholarship produced by four CAMs is valued by the academic community in terms of citations, and whether the level of citations received might be detrimental to academic authors' careers. Based on an analysis of acupuncture, chiropractic, homeopathy, and osteopathy journal articles indexed in Scopus 1996-2020, the results show that the prevalence of the four areas varies substantially internationally, with acupuncture eclipsing the others in East Asia but homeopathy being more common in India and Brazil. The main broad fields publishing these specialties are Medicine, Nursing, Health Professions, Veterinary Science, and Neuroscience. Whilst the research tends to be cited at a below average rate in most broad fields (n = 27) and years (1996-2017), acupuncture, chiropractic, and homeopathy are exceptions in some broad fields, including some core areas. Thus, studying these alternative medicines may not always lead to research that tends to be ignored in academia, even if many scientists disparage it. As a corollary, citation analysis cannot be relied on to give low scores to widely disparaged areas of scholarship.
Affiliation(s)
- Mike Thelwall
  - Statistical Cybermetrics Research Group, University of Wolverhampton, Wolverhampton, UK
15
Xie Q, Wang J, Kim G, Lee S, Song M. A sensitivity analysis of factors influential to the popularity of shared data in data repositories. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
16
Agarwal DA, Damerow J, Varadharajan C, Christianson DS, Pastorello GZ, Cheah YW, Ramakrishnan L. Balancing the needs of consumers and producers for scientific data collections. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2021.101251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
17
Soeharjono S, Roche DG. Reported Individual Costs and Benefits of Sharing Open Data among Canadian Academic Faculty in Ecology and Evolution. Bioscience 2021. [DOI: 10.1093/biosci/biab024] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/13/2022] Open
Abstract
Open data facilitate reproducibility and accelerate scientific discovery but are hindered by perceptions that researchers bear costs and gain few benefits from publicly sharing their data, with limited empirical evidence to the contrary. We surveyed 140 faculty members working in ecology and evolution across Canada's top 20 ranked universities and found that more researchers report benefits (47.9%) and neutral outcomes (43.6%) than costs (21.4%) from openly sharing data. The benefits were independent of career stage and gender, but men and early career researchers were more likely to report costs. We outline mechanisms proposed by the study participants to reduce the individual costs and increase the benefits of open data for faculty members.
Affiliation(s)
- Sandrine Soeharjono
  - Département de Science Biologiques, Université de Montréal, Montréal, Canada
- Dominique G Roche
  - Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
  - Department of Biology, Carleton University, Ottawa, Canada
18
19
Dorta-González P, González-Betancor SM, Dorta-González MI. To what extent is researchers' data-sharing motivated by formal mechanisms of recognition and credit? Scientometrics 2021. [DOI: 10.1007/s11192-021-03869-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
20
Yoon J, Chung E, Schalk J, Kim J. Examination of data citation guidelines in style manuals and data repositories. LEARNED PUBLISHING 2020. [DOI: 10.1002/leap.1349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- JungWon Yoon
  - Department of Library and Information Science, Jeonbuk National University, Jeonju‐si, South Korea
- EunKyung Chung
  - Department of Library and Information Science, Ewha Womans University, Seoul, South Korea
- Janet Schalk
  - Pasco‐Hernando State College, Porter Campus at Wiregrass Ranch Library, Wesley Chapel, Florida, USA
- Jihyun Kim
  - Department of Library and Information Science, Ewha Womans University, Seoul, South Korea
21
22
Suhr B, Dungl J, Stocker A. Search, reuse and sharing of research data in materials science and engineering-A qualitative interview study. PLoS One 2020; 15:e0239216. [PMID: 32931508 PMCID: PMC7491734 DOI: 10.1371/journal.pone.0239216] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/27/2020] [Accepted: 09/01/2020] [Indexed: 12/05/2022] Open
Abstract
Open research data practices are a relatively new, and thus still evolving, part of scientific work, and their usage varies strongly across scientific domains. In the literature, the investigation of open research data practices ranges from big empirical studies covering multiple scientific domains to smaller, in-depth studies analysing a single field of research. Despite the richness of literature on this topic, there is still a lack of knowledge on (open) research data awareness and practices in materials science and engineering. While most current studies focus only on some aspects of open research data practices, we aim for a comprehensive understanding of all practices with respect to the considered scientific domain. Hence this study aims at 1) drawing the whole picture of search, reuse and sharing of research data 2) while focusing on materials science and engineering. The chosen approach makes it possible to explore the connections between different aspects of open research data practices, e.g. between data sharing and data search. In-depth interviews with 13 researchers in this field were conducted, transcribed verbatim, coded and analysed using content analysis. The main findings characterised research data in materials science and engineering as extremely diverse, often generated for a very specific research focus and needing a precise description of the data and the complete generation process for possible reuse. Results on research data search and reuse showed that the interviewees intended to reuse data but were mostly unfamiliar with (yet interested in) modern methods such as dataset search engines, data journals or searching public repositories. Current research data sharing is not open but bilateral, and is usually encouraged by supervisors or employers. Project funding affects data sharing in two ways: some researchers say they share their data openly because of their funding agency's policy, while others face legal restrictions on sharing because their projects are partly funded by industry. The time needed for a precise description of the data and their generation process is named as the biggest obstacle to data sharing. From these findings, a precise set of actions suitable to support Open Data is derived, involving training for researchers and introducing rewards for data sharing at the level of universities and funding bodies.
|
23
|
Walters WH. Data journals: incentivizing data access and documentation within the scholarly communication system. INSIGHTS THE UKSG JOURNAL 2020. [DOI: 10.1629/uksg.510] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022]
|
24
|
Zeng T, Wu L, Bratt S, Acuna DE. Assigning credit to scientific datasets using article citation networks. J Informetr 2020. [DOI: 10.1016/j.joi.2020.101013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
|
25
|
Identifying Data Sharing and Reuse with Scholix: Potentials and Limitations. PATTERNS 2020; 1:100007. [PMID: 33205084 PMCID: PMC7660440 DOI: 10.1016/j.patter.2020.100007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2019] [Revised: 01/27/2020] [Accepted: 02/24/2020] [Indexed: 11/21/2022]
Abstract
The Scholexplorer API, based on the Scholix (Scholarly Link eXchange) framework, aims to identify links between articles and supporting data. This quantitative case study demonstrates that the API vastly expanded the number of datasets previously known to be affiliated with University of Bath outputs, allowing improved monitoring of compliance with funder mandates by identifying peer-reviewed articles linked to at least one unique dataset. Availability of author names for research outputs increased from 2.4% to 89.2%, which enabled identification of ten articles reusing non-Bath-affiliated datasets published in external repositories in the first phase, giving valuable evidence of data reuse and impact for data producers. Of these, only three were formally cited in the references. Further enhancement of the Scholix schema and enrichment of Scholexplorer metadata using controlled vocabularies would be beneficial. The adoption of standardized data citations by journals will be critical to creating links in a more systematic manner.
|
26
|
|
27
|
Groth P, Cousijn H, Clark T, Goble C. FAIR Data Reuse – the Path through Data Citation. DATA INTELLIGENCE 2020. [DOI: 10.1162/dint_a_00030] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 11/04/2022] Open
Abstract
One of the key goals of the FAIR guiding principles is defined by its final principle – to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent machine readable metadata to describe their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail should be provided in the provenance metadata or figuring out what common shared vocabularies should be used. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata on over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular associated metadata, provide an important pathway for enabling FAIR data reuse.
Affiliation(s)
- Paul Groth
- Informatics Institute, University of Amsterdam, Amsterdam 1090 GH, The Netherlands
- Tim Clark
- Data Science Institute, University of Virginia, Charlottesville, VA 22903-1738, USA
- Carole Goble
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
|
28
|
|
29
|
Abstract
Purpose
Chemical databases have had a significant impact on the way scientists search for and use information. The purpose of this paper is to spark informed discussion and fuel debate on the issue of citations to chemical databases.
Design/methodology/approach
A citation analysis of four major chemical databases was undertaken to examine resource coverage and impact in the scientific literature. Two commercial databases (SciFinder and Reaxys) and two public databases (PubChem and ChemSpider) were analyzed using the “Cited Reference Search” in the Science Citation Index Expanded from the Web of Science (WoS) database. Citations to these databases between 2000 and 2016 (inclusive) were evaluated by document types and publication growth curves. A review of the distribution trends of chemical databases in peer-reviewed articles was conducted through a citation count analysis by country, organization, journal and WoS category.
Findings
In total, 862 scholarly articles containing a citation to one or more of the four databases were identified, with the number increasing steadily since 2000. The study determined that authors at academic institutions worldwide reference chemical databases in high-impact journals from notable publishers and mainly in the field of chemistry.
Originality/value
The research is a first attempt to evaluate the practice of citation to major chemical databases in the scientific literature. This paper proposes that citing chemical databases gives merit and recognition to the resources as well as credibility and validity to the scholarly communication process and also further discusses recommendations for citing and referencing databases.
|
30
|
Abstract
In earth observation and climatological sciences, data and their data services grow daily and over a large spatial extent, owing to the high coverage rate of satellite sensors, model calculations, and continuous meteorological in situ observations. To reuse such data, especially data fragments and their data services, in a collaborative and reproducible manner by citing the original source, data analysts, e.g., researchers or impact modelers, need a way to identify the exact version, precise time information, parameter, and name of the dataset used. A manual process would make the citation of data fragments as a subset of an entire dataset rather complex and imprecise. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. The citation of such evolving content requires the approach of “dynamic data citation”. The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus represent the information about the data history, the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of the art from existing citation and data management concepts and developed the scalable dynamic data citation methodology for evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented the given recommendations and has offered an operational service for dynamic data citation of climate scenario data since 2017. Aware that this topic depends heavily on bibliographic citation research that is still under discussion, the CCCA service on Dynamic Data Citation focused on issues specific to the climate domain, such as characteristics of data, formats, software environment, and usage behavior. Beyond sharing the experience gained, the current effort concerns the scalability of the implementation, e.g., towards the potential of an Open Data Cube solution.
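The idea of associating a subsetting query with a persistent identifier, as described in the abstract above, can be illustrated with a minimal Python sketch. This is not the CCCA or RDA implementation. The `SubsetQuery` fields, the `mint_query_pid` and `subset_metadata` helpers, the `hypothetical-pid` prefix, and the in-memory registry are all assumptions introduced for illustration. The sketch only shows that the same query against the same dataset version can map to the same identifier, and that the query parameters can be carried in the subset's metadata as provenance.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SubsetQuery:
    """Hypothetical subsetting parameters for a gridded climate dataset."""
    dataset_id: str
    dataset_version: str
    parameter: str
    bbox: tuple          # (min_lon, min_lat, max_lon, max_lat), assumed layout
    start_date: str      # ISO date string
    end_date: str        # ISO date string


def mint_query_pid(query: SubsetQuery, prefix: str = "hypothetical-pid") -> str:
    """Derive a stable identifier from a canonical form of the query.

    Hashing is an assumption of this sketch; a real service would register
    the query with a persistent identifier provider instead.
    """
    canonical = json.dumps(asdict(query), sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}/{digest[:16]}"


def subset_metadata(query: SubsetQuery, registry: dict) -> dict:
    """Store the query under its identifier and return provenance metadata."""
    pid = mint_query_pid(query)
    registry.setdefault(pid, query)  # re-issuing the same query reuses the entry
    return {
        "identifier": pid,
        "derived_from": f"{query.dataset_id}@{query.dataset_version}",
        "query": asdict(query),
    }


# Usage: two identical queries resolve to one registry entry.
registry = {}
q = SubsetQuery("example-scenario", "v1", "tas",
                (9.5, 46.3, 17.2, 49.1), "2021-01-01", "2050-12-31")
first = subset_metadata(q, registry)
second = subset_metadata(q, registry)
```

Because the identifier is derived from the dataset version together with the subsetting parameters, a later version of the same dataset would yield a different identifier for an otherwise identical query, which is the property that citing evolving data depends on.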
|
31
|
Koltay T. Accepted and Emerging Roles of Academic Libraries in Supporting Research 2.0. JOURNAL OF ACADEMIC LIBRARIANSHIP 2019. [DOI: 10.1016/j.acalib.2019.01.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 10/27/2022]
|
32
|
Mendes-Da-Silva W. Promoção de Transparência e Impacto da Pesquisa em Negócios. RAC: REVISTA DE ADMINISTRAÇÃO CONTEMPORÂNEA 2018. [DOI: 10.1590/1982-7849rac2018180210] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
|
33
|
da Silva JR, Ribeiro C, Lopes JC. Ranking Dublin Core descriptor lists from user interactions: a case study with Dublin Core Terms using the Dendro platform. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES 2018. [DOI: 10.1007/s00799-018-0238-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
|