1
Porter AL, Zhang Y, Newman NC. Tech mining: a revisit and navigation. Front Res Metr Anal 2024; 9:1364053. [PMID: 38741784 PMCID: PMC11089556 DOI: 10.3389/frma.2024.1364053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 04/11/2024] [Indexed: 05/16/2024]
Abstract
This mini-review arrays the pertinent tools and purposes of "Tech Mining", shorthand for empirical analyses of Science, Technology and Innovation (ST&I) data. The intent is to introduce the range of tools and show how they can complement each other. Tech Mining aims to generate powerful intelligence to help manage R&D and innovation processes. We offer a 5-part array to help relate the analytical elements. An overview of a case study of Hybrid and Electric Vehicles illustrates the complexities involved and the potential to generate valuable "intel."
Affiliation(s)
- Alan L. Porter
- Search Technology, Inc., Peachtree Corners, GA, United States
- Technology Policy and Assessment Center, Georgia Institute of Technology, Atlanta, GA, United States
- Yi Zhang
- Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, University of Technology Sydney, Ultimo, NSW, Australia
- Nils C. Newman
- Search Technology, Inc., Peachtree Corners, GA, United States
2
Jiang H, Fan S, Zhang N, Zhu B. Deep learning for predicting patent application outcome: The fusion of text and network embeddings. J Informetr 2023. [DOI: 10.1016/j.joi.2023.101402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
3
Chen X, Ye P, Huang L, Wang C, Cai Y, Deng L, Ren H. Exploring science-technology linkages: A deep learning-empowered solution. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
4
Rajagopal P, Aghris T, Fettah FE, Ravana SD. Clustering of Relevant Documents Based on Findability Effort in Information Retrieval. International Journal of Information Retrieval Research 2023. [DOI: 10.4018/ijirr.315764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
A user expresses their information need as a query to an information retrieval (IR) system, which retrieves a set of articles related to the query. The performance of the retrieval system is measured by the relevance of the retrieved content to the query, as judged by expert topic assessors who are trained to find this relevant information. However, real users do not always succeed in finding relevant information in the retrieved list because of the time and effort required. This paper aims 1) to use findability features to determine, with a machine learning approach, the amount of effort needed to find information in relevant documents, and 2) to demonstrate how IR systems' performance changes when effort is included in the evaluation. This study uses a natural language processing technique and an unsupervised clustering approach to group documents by the amount of effort needed. The results show that relevant documents can be clustered using the k-means clustering approach, and that retrieval system performance varies by 23% on average.
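A minimal sketch of the clustering step, assuming each relevant document has already been reduced to a numeric vector of findability features. The two features per document and their values are hypothetical (the abstract does not list the features used), and this is plain Lloyd's k-means in the Python standard library, not the authors' implementation.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: returns (centroids, labels) for a list of feature vectors."""
    rng = random.Random(seed)
    # Start from k distinct documents chosen at random as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical normalised findability features per document,
# e.g. (relative length, reading difficulty).
docs = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.85, 0.95)]
_, labels = kmeans(docs, k=2)
print(labels)  # low-effort and high-effort documents land in different clusters
```

Cluster labels are arbitrary integers; only the grouping matters when comparing effort levels.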
Affiliation(s)
- Taoufik Aghris
- EMINES-School of Industrial Management, Mohammed VI Polytechnic University, Morocco
5
Knisely BM, Pavliscsak HH. Research proposal content extraction using natural language processing and semi-supervised clustering: A demonstration and comparative analysis. Scientometrics 2023; 128:3197-3224. [PMID: 37101971 PMCID: PMC10083066 DOI: 10.1007/s11192-023-04689-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 03/07/2023] [Indexed: 04/28/2023]
Abstract
Funding institutions often solicit text-based research proposals to evaluate potential recipients. Leveraging the information contained in these documents could help institutions understand the supply of research within their domain. In this work, an end-to-end methodology for semi-supervised document clustering is introduced to partially automate classification of research proposals based on thematic areas of interest. The methodology consists of three stages: (1) manual annotation of a document sample; (2) semi-supervised clustering of documents; (3) evaluation of cluster results using quantitative metrics and qualitative ratings (coherence, relevance, distinctiveness) by experts. The methodology is described in detail to encourage replication and is demonstrated on a real-world data set. This demonstration sought to categorize proposals submitted to the US Army Telemedicine and Advanced Technology Research Center (TATRC) related to technological innovations in military medicine. A comparative analysis of method features was performed, including unsupervised vs. semi-supervised clustering, several document vectorization techniques, and several cluster result selection strategies. Outcomes suggest that pretrained Bidirectional Encoder Representations from Transformers (BERT) embeddings were better suited for the task than older text embedding techniques. When comparing expert ratings between algorithms, semi-supervised clustering produced coherence ratings ~25% better on average than standard unsupervised clustering, with negligible differences in cluster distinctiveness. Last, it was shown that a cluster result selection strategy that balances internal and external validity produced the best results. With further refinement, this methodological framework shows promise as a useful analytical tool for institutions to unlock hidden insights from untapped archives and similar administrative document repositories.
Supplementary Information The online version contains supplementary material available at 10.1007/s11192-023-04689-3.
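One common way to make clustering semi-supervised is to seed it with the manually annotated sample. The sketch below assumes a seeded, constrained variant of k-means: annotated documents initialize each theme's centroid and keep their labels, while the remaining documents go to the nearest centroid. The 2-D vectors stand in for document embeddings such as BERT vectors and are hypothetical; the paper's own semi-supervised algorithm may differ.

```python
import math


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def seeded_kmeans(points, seeds, iters=100):
    """Seeded, constrained k-means (an assumed variant, for illustration).

    points: document vectors; seeds: {point index: theme label 0..k-1}.
    Every theme label must have at least one annotated (seed) document.
    """
    k = max(seeds.values()) + 1
    # Initialise each theme's centroid from its manually annotated documents.
    centroids = [mean([points[i] for i, lab in seeds.items() if lab == j]) for j in range(k)]
    labels = None
    for _ in range(iters):
        # Seeds keep their annotated label; other documents go to the nearest centroid.
        new_labels = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # No cluster can empty out, because every theme keeps its seeds.
        centroids = [mean([p for p, lab in zip(points, labels) if lab == j]) for j in range(k)]
    return labels


# Hypothetical 2-D document embeddings; documents 0 and 2 were annotated by hand.
vectors = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.1), (0.1, 0.2), (1.1, 1.0)]
labels = seeded_kmeans(vectors, seeds={0: 0, 2: 1})
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Because the seeds fix the cluster identities, each output label maps directly onto an annotated theme, unlike the arbitrary labels of unsupervised k-means.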
Affiliation(s)
- Benjamin M. Knisely
- Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702 USA
- Holly H. Pavliscsak
- Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702 USA
6
Academic collaborations: a recommender framework spanning research interests and network topology. Scientometrics 2022. [DOI: 10.1007/s11192-022-04555-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
7
Chen L, Xu S, Zhu L, Zhang J, Yang G, Xu H. A deep learning based method benefiting from characteristics of patents for semantic relation classification. J Informetr 2022. [DOI: 10.1016/j.joi.2022.101312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
8
Huang L, Cai Y, Zhao E, Zhang S, Shu Y, Fan J. Measuring the interdisciplinarity of Information and Library Science interactions using citation analysis and semantic analysis. Scientometrics 2022. [DOI: 10.1007/s11192-022-04401-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
9
Reviewer recommendation method for scientific research proposals: a case for NSFC. Scientometrics 2022. [DOI: 10.1007/s11192-022-04389-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
10
Network dynamics in university-industry collaboration: a collaboration-knowledge dual-layer network perspective. Scientometrics 2022. [DOI: 10.1007/s11192-022-04330-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
11
12
Identification of topic evolution: network analytics with piecewise linear representation and word embedding. Scientometrics 2022. [DOI: 10.1007/s11192-022-04273-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
13
Jin Q, Chen H, Wang X, Ma T, Xiong F. Exploring funding patterns with word embedding-enhanced organization–topic networks: a case study on big data. Scientometrics 2022. [DOI: 10.1007/s11192-021-04253-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
14
How do people view COVID-19 vaccines: Analyses on tweets about COVID-19 vaccines using Natural Language Processing and Sentiment Analysis. Journal of Global Information Management 2022. [DOI: 10.4018/jgim.300817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The COVID-19 pandemic has been the most devastating public health crisis of the past decade, and vaccination is expected to be the means of ending it. People's views and feelings about COVID-19 vaccines determine the success of vaccination. This study set out to investigate sentiments and common topics about COVID-19 vaccines by applying machine learning sentiment and topic analyses, with natural language processing, to a large volume of tweets. Findings revealed that concern about COVID-19 vaccines grew with the introduction and start of vaccination programs. Overall, positive sentiments and emotions outweighed negative ones. Common topics include the progress of vaccine development, effectiveness, safety, availability, sharing of vaccines received, and updates on the pandemic and government policies. The outcomes describe the current public atmosphere and focus regarding COVID-19 vaccines, which can help the public health sector and policymakers make better decisions. The analytical methods were also evaluated.
15
Yoon B, Kim S, Kim S, Seol H. Doc2vec-based link prediction approach using SAO structures: application to patent network. Scientometrics 2021. [DOI: 10.1007/s11192-021-04187-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/01/2022]
16
Zhang Y, Wu M, Miao W, Huang L, Lu J. Bi-layer network analytics: A methodology for characterizing emerging general-purpose technologies. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101202] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
17
Xie Q, Zhang X, Song M. A network embedding-based scholar assessment indicator considering four facets: Research topic, author credit allocation, field-normalized journal impact, and published time. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101201] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/06/2023]
18
19
20
Zhao D, Strotmann A. Intellectual structure of information science 2011–2020: an author co-citation analysis. Journal of Documentation 2021. [DOI: 10.1108/jd-06-2021-0119] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Purpose
This study continues a long history of author co-citation analysis of the intellectual structure of information science into the time period of 2011–2020. It also examines changes in this structure from 2006–2010 through 2011–2015 to 2016–2020. Results will contribute to a better understanding of the information science research field.
Design/methodology/approach
The well-established procedures and techniques for author co-citation analysis were followed. Full records of research articles in core information science journals published during 2011–2020 were retrieved and downloaded from the Web of Science database. About 150 of the most highly cited authors in each of the two five-year time periods were selected from this dataset to represent the field, and their co-citation counts were calculated. Each co-citation matrix was input into SPSS for factor analysis, and the results were visualized in Pajek. Factors were interpreted as specialties and labeled upon an examination of articles written by authors who load primarily on each factor.
Findings
The two-camp structure of information science continued to be clearly present. Bibliometric indicators for research evaluation dominated the Knowledge Domain Analysis camp during both five-year time periods, whereas interactive information retrieval (IR) dominated the IR camp during 2011–2015 but shared dominance with information behavior during 2016–2020. Bridging between the two camps became increasingly weak and was provided only by the scholarly communication specialty during 2016–2020. The IR systems specialty drifted further away from the IR camp. The information behavior specialty experienced a deep slump during 2011–2020 in its evolution process. Altmetrics grew to dominate the Webometrics specialty and drove a sharp increase in it during 2016–2020.
Originality/value
Author co-citation analysis (ACA) is effective in revealing intellectual structures of research fields. Most related studies used term-based methods to identify individual research topics but did not examine the interrelationships between these topics or the overall structure of the field. The few studies that did discuss the overall structure paid little attention to the effect of changes to the source journals on the results. The present study does not have these problems and continues the long history of benchmark contributions to a better understanding of the information science field using ACA.
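The co-citation counts that feed the factor analysis can be computed as sketched below, assuming each citing article has been reduced to the list of author names it cites. Two selected authors are co-cited once for every citing article whose reference list contains both. The author names are hypothetical, the diagonal is simply left at zero (ACA studies handle it in different ways), and the SPSS factor analysis and Pajek visualization steps are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cocitation_matrix(reference_lists, authors):
    """Symmetric author co-citation matrix in the order given by `authors`.

    reference_lists: one list of cited author names per citing article.
    authors: the selected highly cited authors (rows and columns of the matrix).
    """
    selected = set(authors)
    counts = Counter()
    for refs in reference_lists:
        # A citing article co-cites a pair at most once, however often each is cited.
        cited = sorted(selected.intersection(refs))
        for a, b in combinations(cited, 2):
            counts[(a, b)] += 1
    index = {name: i for i, name in enumerate(authors)}
    matrix = [[0] * len(authors) for _ in authors]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = n
    return matrix  # diagonal left at 0; ACA studies treat it in different ways


# Hypothetical citing articles; "Park" is cited but not among the selected authors.
reference_lists = [["Smith", "Lee", "Chen"], ["Smith", "Lee"], ["Lee", "Chen", "Park"]]
matrix = cocitation_matrix(reference_lists, ["Smith", "Lee", "Chen"])
print(matrix)  # [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
```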
21
A Topic Detection Method Based on Word-attention Networks. Journal of Data and Information Science 2021. [DOI: 10.2478/jdis-2021-0032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Purpose
We proposed a method that represents scientific papers as complex networks, combining the approaches of neural networks and complex networks.
Design/methodology/approach
Its novelty is representing a paper by a word branch, which carries the sequential structure of words in sentences. The branches are generated by the attention mechanism in deep learning models. We connected those branches at the positions of their common words to generate networks, called word-attention networks, and then detected their communities, which we define as topics.
Findings
The detected topics can carry the sequential structure of words in sentences, represent the intra- and inter-sentential dependencies among words, and reveal, through network indexes, the roles that words play in them.
Research limitations
The parameter settings of our method may depend on the data at hand, so finding proper settings requires human experience.
Practical implications
Our method is applied to papers from PNAS, where the discipline designations provided by authors are used as the gold-standard labels of the papers’ topics.
Originality/value
This empirical study shows that the proposed method outperforms Latent Dirichlet Allocation (LDA) and is more stable.
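The network-building step can be sketched as follows, assuming the attention-derived word branches are already available as word sequences (the attention extraction itself is not shown). Consecutive words in a branch are linked, so branches that share a word are joined at that node. For community detection the sketch substitutes a simple deterministic label propagation; the abstract does not say which community detection algorithm the authors used, and the example branches are hypothetical.

```python
from collections import Counter, defaultdict


def build_word_network(branches):
    """Link consecutive words in each branch; branches sharing a word meet at that node."""
    graph = defaultdict(Counter)  # word -> {neighbour word: edge weight}
    for branch in branches:
        for a, b in zip(branch, branch[1:]):
            if a != b:
                graph[a][b] += 1
                graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Deterministic label propagation as a simple community (topic) detector.

    Nodes are visited in sorted order; each adopts the label with the largest
    total edge weight among its neighbours, breaking ties by the smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            weights = Counter()
            for nb, w in graph[node].items():
                weights[labels[nb]] += w
            best = max(weights.values())
            choice = min(lab for lab, w in weights.items() if w == best)
            if choice != labels[node]:
                labels[node] = choice
                changed = True
        if not changed:
            break
    topics = defaultdict(set)
    for node, lab in labels.items():
        topics[lab].add(node)
    return sorted(topics.values(), key=sorted)


# Hypothetical attention-derived word branches; the last one bridges two themes.
branches = [
    ["protein", "folding", "structure"],
    ["structure", "prediction", "protein"],
    ["galaxy", "cluster", "mass"],
    ["mass", "dark", "galaxy"],
    ["structure", "mass"],
]
topics = label_propagation(build_word_network(branches))
print(topics)
```

Although the bridging branch makes the network connected, the two densely linked groups of words still come out as separate topics.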
22
CSO Classifier 3.0: a scalable unsupervised method for classifying documents in terms of research topics. International Journal on Digital Libraries 2021. [DOI: 10.1007/s00799-021-00305-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Classifying scientific articles, patents, and other documents according to the relevant research topics is an important task, which enables a variety of functionalities, such as categorising documents in digital libraries, monitoring and predicting research trends, and recommending papers relevant to one or more topics. In this paper, we present the latest version of the CSO Classifier (v3.0), an unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive taxonomy of research areas in the field of Computer Science. The CSO Classifier takes as input the textual components of a research paper (usually title, abstract, and keywords) and returns a set of research topics drawn from the ontology. This new version includes a new component for discarding outlier topics and offers improved scalability. We evaluated the CSO Classifier on a gold standard of manually annotated articles, demonstrating a significant improvement over alternative methods. We also present an overview of applications adopting the CSO Classifier and describe how it can be adapted to other fields.
23
Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering. Journal of Data and Information Science 2021. [DOI: 10.2478/jdis-2021-0024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Purpose
Detecting research fields or topics and understanding their dynamics helps the scientific community make decisions about establishing scientific fields. It also supports better collaboration with governments and businesses. This study aims to investigate the development of research fields over time by translating it into a topic detection problem.
Design/methodology/approach
To achieve these objectives, we propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. Document embedding approaches are used to transform documents into vector-based representations. The proposed method is evaluated on a benchmark dataset against combinations of different embedding and clustering approaches and against classical topic modeling algorithms (i.e., LDA). A case study also explores the evolution of Artificial Intelligence (AI) by detecting the research topics or sub-fields in AI publications.
Findings
Evaluation with clustering performance indicators shows that the proposed method outperforms similar approaches on the benchmark dataset. Using the proposed method, we also show how the topics have evolved over the past 30 years, using a keyword extraction method for cluster tagging and labeling to convey the context of the topics.
Research limitations
We found that no single solution generalizes to all downstream tasks, so solutions must be fine-tuned or optimized for each task and even each dataset. In addition, interpretation of cluster labels can be subjective and vary with the reader. Labeling techniques are also very difficult to evaluate, which further limits the explanation of the clusters.
Practical implications
As demonstrated in the case study, we show with a real-world example how the proposed method enables researchers and reviewers of academic research to detect, summarize, analyze, and visualize research topics from decades of academic documents. This helps the scientific community and related organizations analyze fields quickly and effectively by establishing and explaining the topics.
Originality/value
In this study, we introduce a modified and tuned deep embedding clustering coupled with Doc2Vec representations for topic extraction. We also use a concept extraction method as a labeling approach in this study. The effectiveness of the method has been evaluated in a case study of AI publications, where we analyze the AI topics during the past three decades.
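The abstract does not spell out how the deep clustering method was modified, so the sketch below shows only the original Deep Embedded Clustering (DEC) objective (Xie et al., 2016) that such methods build on: a Student's t soft assignment Q of embedded documents to cluster centres, a sharpened target distribution P, and the KL(P || Q) loss. The 2-D points stand in for Doc2Vec document vectors and are hypothetical; the encoder and the gradient updates are omitted.

```python
import math


def soft_assign(z, centres, alpha=1.0):
    """DEC soft assignment Q: Student's t kernel between points and centres, row-normalised."""
    q = []
    for zi in z:
        row = [(1 + math.dist(zi, mu) ** 2 / alpha) ** (-(alpha + 1) / 2) for mu in centres]
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """DEC target P: square Q, divide by each cluster's soft frequency, row-normalise."""
    freq = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        w = [v * v / f for v, f in zip(row, freq)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p, q):
    """KL(P || Q), the clustering loss DEC minimises w.r.t. encoder weights and centres."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Hypothetical 2-D stand-ins for Doc2Vec vectors and two initial cluster centres.
z = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
centres = [(0.0, 0.0), (3.0, 3.0)]
q = soft_assign(z, centres)
p = target_distribution(q)
print(q[0], p[0])  # p sharpens confident assignments relative to q
print(kl_divergence(p, q))
```

In full DEC, this loss is backpropagated into the encoder and the centres, and P is periodically recomputed from the updated Q.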
24
A deep-learning based citation count prediction model with paper metadata semantic features. Scientometrics 2021. [DOI: 10.1007/s11192-021-04033-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023]
25
Zhang Y, Wu M, Tian GY, Zhang G, Lu J. Ethics and privacy of artificial intelligence: Understandings from bibliometrics. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106994] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022]
26
Zhang Y, Wu M, Hu Z, Ward R, Zhang X, Porter A. Profiling and predicting the problem-solving patterns in China’s research systems: A methodology of intelligent bibliometrics and empirical insights. Quantitative Science Studies 2021. [DOI: 10.1162/qss_a_00100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
Uncovering the driving forces, strategic landscapes, and evolutionary mechanisms of China’s research systems is attracting rising interest around the globe. One topic of interest is understanding the problem-solving patterns in China’s research systems now and in the future. Targeting a set of high-quality research articles published by Chinese researchers between 2009 and 2018, and indexed in the Essential Science Indicators database, we developed an intelligent bibliometrics-based methodology for identifying the problem-solving patterns from scientific documents. Specifically, science overlay maps incorporating link prediction were used to profile China’s disciplinary interactions and predict potential cross-disciplinary innovation at a macro level. We proposed a function incorporating word embedding techniques to represent the subjects, actions, and objects (SAO) retrieved from combined titles and abstracts as vectors, and constructed a tri-layer SAO network to visualize SAOs and their semantic relationships. Then, at a micro level, we developed network analytics for identifying problems and solutions from the SAO network, and recommending potential solutions for existing problems. Empirical insights derived from this study provide clues to understand China’s research strengths and the science policies underlying them, along with the key research problems and solutions that Chinese researchers are focusing on now and might pursue in the future.
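The abstract does not name the link-prediction method behind the science overlay maps. As an illustrative assumption, the sketch below ranks unlinked pairs of disciplines with the Adamic-Adar index, a standard neighbourhood-based predictor: two disciplines that share many neighbours, especially neighbours with few links of their own, score highly as candidates for future cross-disciplinary interaction. The discipline network is hypothetical.

```python
import math
from itertools import combinations


def adamic_adar_candidates(edges):
    """Rank currently unlinked node pairs by the Adamic-Adar index (highest first)."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    ranked = []
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            continue  # already linked, so there is nothing to predict
        # A shared neighbour w is linked to both u and v, so its degree is at least 2
        # and log(degree) > 0; neighbours with few links of their own weigh more.
        score = sum(1 / math.log(len(nbrs[w])) for w in nbrs[u] & nbrs[v])
        if score > 0:
            ranked.append((score, u, v))
    ranked.sort(key=lambda t: (-t[0], t[1], t[2]))
    return ranked


# Hypothetical discipline-interaction network (e.g. from co-assigned subject categories).
edges = [
    ("Computer Science", "Mathematics"),
    ("Computer Science", "Engineering"),
    ("Mathematics", "Physics"),
    ("Engineering", "Physics"),
    ("Biology", "Chemistry"),
    ("Chemistry", "Physics"),
]
ranked = adamic_adar_candidates(edges)
print(ranked[0])  # top candidate link: Computer Science and Physics
```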
Affiliation(s)
- Yi Zhang
- Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Mengjia Wu
- Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Zhengyin Hu
- Chengdu Library and Information Centre, Chinese Academy of Sciences, China
- Robert Ward
- Program in Science, Technology & Innovation Policy (STIP), Georgia Institute of Technology, USA
- Xue Zhang
- Chengdu Library and Information Centre, Chinese Academy of Sciences, China
- Alan Porter
- Program in Science, Technology & Innovation Policy (STIP), Georgia Institute of Technology, USA
- Search Technology, Inc., USA
27
Zhang Y, Cai X, Fry CV, Wu M, Wagner CS. Topic evolution, disruption and resilience in early COVID-19 research. Scientometrics 2021; 126:4225-4253. [PMID: 33776163 PMCID: PMC7980735 DOI: 10.1007/s11192-021-03946-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/16/2020] [Accepted: 03/05/2021] [Indexed: 11/25/2022]
Abstract
The COVID-19 pandemic presented a challenge to the global research community as scientists rushed to find solutions to the devastating crisis. Drawing expectations from resilience theory, this paper explores how the COVID-19 pandemic affected the trajectory of coronavirus research and the research community around it. The paper characterizes epistemic clusters and pathways of knowledge by extracting terms featured in articles in early COVID-19 research and combines this with evolutionary pathways and statistical analysis. The results reveal that the pandemic disrupted existing lines of coronavirus research to a large degree. While some communities of coronavirus research are similar before and during COVID-19, the topics themselves change significantly, and there is less cohesion among early COVID-19 research than before the pandemic. We find that some lines of research revert to basic research pursued almost a decade earlier, whilst others pursue brand new trajectories. The epidemiology topic is the most resilient among the many subjects related to COVID-19 research. Chinese researchers in particular appear to be driving more novel research approaches in the early months of the pandemic. The findings raise questions about whether these shifts are advantageous for global scientific progress, and whether the research community will return to the original equilibrium or reorganize into a different knowledge configuration.
Affiliation(s)
- Yi Zhang
- Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007 Australia
- Xiaojing Cai
- School of Public Affairs, Zhejiang University, Hangzhou, 310058 Zhejiang China
- John Glenn College of Public Affairs, The Ohio State University, Columbus, OH 43210 USA
- Caroline V. Fry
- University of Hawai’i at Manoa Shidler College of Business, Honolulu, USA
- Mengjia Wu
- Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007 Australia
- Caroline S. Wagner
- John Glenn College of Public Affairs, The Ohio State University, Columbus, OH 43210 USA
28
Chowdhury K. Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers. J Informetr 2021. [DOI: 10.1016/j.joi.2020.101112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
29
A deep learning framework to early identify emerging technologies in large-scale outlier patents: an empirical study of CNC machine tool. Scientometrics 2021. [DOI: 10.1007/s11192-020-03797-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
30
31
Huangfu C, Zeng Y, Wang Y. Creating Neuroscientific Knowledge Organization System Based on Word Representation and Agglomerative Clustering Algorithm. Front Neuroinform 2020; 14:38. [PMID: 33013345 PMCID: PMC7461893 DOI: 10.3389/fninf.2020.00038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/30/2019] [Accepted: 07/17/2020] [Indexed: 11/24/2022]
Abstract
The literature on neuroscience has grown rapidly in recent years with the emergence of new domains of research. In the context of this progress, creating a knowledge organization system (KOS) that can quickly incorporate terms of a given domain is an important aim in the area. In this article, we develop a systematic method based on word representation and the agglomerative clustering algorithm to semi-automatically build a hierarchical KOS. We collected 35,832 research keywords and 11,497 research methods from PubMed Central database, and organized them in a hierarchical structure according to semantic distance. We show that the proposed KOS can help find terms related to the given topics, analyze articles related to specific domains of research, and characterize the features of article clusters. The proposed method can significantly reduce the manual work required by experts to organize the KOS.
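A minimal sketch of building a term hierarchy by agglomerative clustering, assuming each term already has a word vector. It uses average linkage over cosine distance, which is an assumption (the abstract says only that terms were organized by semantic distance), and returns the merge history, the dendrogram from which a hierarchical KOS could be read off. The terms and their 2-D vectors are hypothetical; real word representations have many more dimensions and must be non-zero for cosine distance to be defined.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def agglomerate(vectors):
    """Average-linkage agglomerative clustering over a {term: vector} dict.

    Returns the merge history as (members_a, members_b, distance) tuples,
    from the closest pair up to the root of the hierarchy.
    """
    clusters = [frozenset([t]) for t in sorted(vectors)]

    def linkage(c1, c2):
        # Average pairwise cosine distance between the members of two clusters.
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        distance = linkage(clusters[i], clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), round(distance, 4)))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical 2-D term vectors; real word embeddings have far more dimensions.
terms = {
    "neuron": (1.0, 0.0),
    "synapse": (0.9, 0.1),
    "fMRI": (0.0, 1.0),
    "EEG": (0.3, 0.9),
}
for merge in agglomerate(terms):
    print(merge)
```

Cutting the merge history at a chosen distance threshold would yield the groups at one level of the hierarchy.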
Affiliation(s)
- Cunqing Huangfu
- Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yi Zeng
- Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Yuwei Wang
- Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
32
Identification of highly-cited papers using topic-model-based and bibliometric features: the consideration of keyword popularity. J Informetr 2020. [DOI: 10.1016/j.joi.2019.101004] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/20/2022]
33
Zhou Y, Dong F, Liu Y, Li Z, Du J, Zhang L. Forecasting emerging technologies using data augmentation and deep learning. Scientometrics 2020. [DOI: 10.1007/s11192-020-03351-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 11/30/2022]
Abstract
Deep learning can be used to forecast emerging technologies based on patent data. However, it requires a large amount of labeled patent data as a training set, which is difficult to obtain due to various constraints. This study proposes a novel approach that integrates data augmentation and deep learning methods, which overcomes the problem of lacking training samples when applying deep learning to forecast emerging technologies. First, a sample data set was constructed using Gartner’s hype cycle and multiple patent features. Second, a generative adversarial network was used to generate many synthetic samples (data augmentation) to expand the scale of the sample data set. Finally, a deep neural network classifier was trained with the augmented data set to forecast emerging technologies, and it could predict up to 77% of the emerging technologies in a given year with high precision. This approach was used to forecast emerging technologies in Gartner’s hype cycles for 2017 based on patent data from 2000 to 2016. Four out of six of the emerging technologies were forecasted correctly, showing the accuracy and precision of the proposed approach. This approach enables deep learning to forecast emerging technologies with limited training samples.
34
35
Abstract
Purpose
The purpose of this paper is to trace the knowledge diffusion patterns between the publications of top journals of computer science and physics to uncover the knowledge diffusion trends.
Design/methodology/approach
The degree of information flow between the disciplines is measured by entropy and received citations. Entropy gives the uncertainty in the citation distribution of a journal: the more a journal spreads information to, or is affected by, other journals, the higher its entropy. Citations from outside the category (COC) give an interdisciplinarity index, defined as the percentage of references made to papers in another discipline. In this study, topic-related diffusion across the computer science and physics scholarly communication network is studied to examine how the same research topic is studied and shared across disciplines.
Findings
Using three indicators, Shannon entropy, citations outside category (COC) and research keywords, a global view of information flow between both disciplines is obtained at the journal level. It is observed that computer science mostly cites knowledge published in physics journals, whereas physics journals mostly cite knowledge within their own field.
Originality/value
To the best of the authors’ knowledge, this is the first study that traces knowledge diffusion trends between computer science and physics publications at journal level using entropy, COC and research keywords.
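The two journal-level indicators described in this abstract can be sketched as follows. Entropy is computed here as Shannon entropy in bits over the share of a journal's references going to each cited journal, and COC as the percentage of references whose cited paper belongs to another discipline. The logarithm base, the journal labels, and the category labels are assumptions for illustration; the paper may normalize or aggregate differently.

```python
import math
from collections import Counter


def citation_entropy(cited_journals):
    """Shannon entropy (in bits) of how a journal's references spread over cited journals."""
    counts = Counter(cited_journals)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def coc_percentage(cited_categories, own_category):
    """Citations outside category: percentage of references to another discipline."""
    outside = sum(1 for c in cited_categories if c != own_category)
    return 100 * outside / len(cited_categories)


# Hypothetical reference list of one computer science journal.
cited_journals = ["J1", "J1", "J2", "J3"]
cited_categories = ["physics", "physics", "computer science", "physics"]
print(citation_entropy(cited_journals))  # 1.5
print(coc_percentage(cited_categories, "computer science"))  # 75.0
```

A journal that cites a single journal has entropy 0; spreading references evenly over more journals raises it, matching the intuition in the Design/methodology/approach paragraph.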