1. Review on knowledge extraction from text and scope in agriculture domain. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10239-9]
2. Facchinetti T, Benetti G, Giuffrida D, Nocera A. slr-kit: A semi-supervised machine learning framework for systematic literature reviews. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109266]
3. Li P, Lu W, Cheng Q. Generating a related work section for scientific papers: an optimized approach with adopting problem and method information. Scientometrics 2022. [DOI: 10.1007/s11192-022-04458-8]
4. Chen G, Peng J, Xu T, Xiao L. Extracting entity relations for "problem-solving" knowledge graph of scientific domains using word analogy. Aslib J Inform Manag 2022. [DOI: 10.1108/ajim-03-2022-0129]
Abstract
Purpose: "Problem-solving" is the crucial insight at the heart of scientific research. This study constructs a "problem-solving" knowledge graph of scientific domains by extracting four entity relation types: problem-solving, problem hierarchy, solution hierarchy and association.
Design/methodology/approach: The paper presents a low-cost method for identifying these relationships in scientific papers based on word analogy. The problem-solving and hierarchical relations are represented as offset vectors between the head and tail entities and then classified by reference to a small set of predefined entity relations.
Findings: In an experiment on artificial intelligence papers from the Web of Science, the method achieved good performance: the F1 scores for the relation types problem hierarchy, problem-solving and solution hierarchy were 0.823, 0.815 and 0.748, respectively. Computer vision is used as an example to demonstrate how the extracted relations can be applied to construct domain knowledge graphs and reveal historical research trends.
Originality/value: The approach is highly efficient and generalizes well. Instead of relying on a large-scale manually annotated corpus, it only requires a small set of entity relations that can be easily extracted from external knowledge resources.
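The word-analogy idea above can be illustrated in a few lines: represent a candidate relation as the offset between the head and tail entity embeddings, then assign the relation type whose reference offset is most similar. The 2-D vectors, relation names and cosine-similarity matching rule below are assumptions made for this sketch, not the paper's actual setup.

```python
import math

def offset(head_vec, tail_vec):
    """Represent a relation as the offset (difference) of head and tail embeddings."""
    return [h - t for h, t in zip(head_vec, tail_vec)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_relation(head_vec, tail_vec, reference_offsets):
    """Pick the relation type whose reference offset vector is most
    similar (by cosine) to the candidate pair's offset vector."""
    cand = offset(head_vec, tail_vec)
    return max(reference_offsets, key=lambda rel: cosine(cand, reference_offsets[rel]))

# Toy 2-D "embeddings"; each reference offset would be averaged from a
# few seed entity pairs of that relation type.
refs = {
    "problem-solving":    [1.0, -0.2],
    "problem hierarchy":  [-0.5, 0.9],
    "solution hierarchy": [0.1, -1.0],
}
print(classify_relation([0.9, 0.1], [0.0, 0.2], refs))
```

The appeal of this scheme is that the "training data" is just a handful of seed relation pairs rather than a large annotated corpus.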
5. Gil-Leiva I, Fujita MSL, Redigolo FM, Saran JF. Extracción de información de documentos PDF para su uso en la indización automática de e-books [Information extraction from PDF documents for use in automatic e-book indexing]. Transinformação 2022. [DOI: 10.1590/2318-0889202234e210069]
Abstract
The number of e-books entering libraries in PDF format grows daily, complicating, and in some cases making nearly unworkable, processes that librarians have traditionally carried out manually, such as subject assignment. In this context, it becomes necessary to design and develop applications that assist librarians. With this in mind, we present an evaluation of tools for extracting information from PDF books that could later serve as raw material for an automatic indexing system. To this end, we first evaluated five software packages (PDFMiner.six, PDFAct, PDF-extract, PDFExtract and Grobib) and then, since PDFAct performed best, carried out a second evaluation to assess its ability to identify and extract information from books, such as titles, tables of contents, sections, table and figure captions, and bibliographic references, all of which are relevant to any indexing system. We conclude that none of the evaluated tools adequately extracts the different parts of PDF books, although PDFAct outperformed the rest.
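An evaluation like the one described above reduces to comparing each tool's extracted items against a manually built gold list. A minimal sketch, assuming set-based exact matching of section titles (the titles and the scoring scheme are invented for the example):

```python
def precision_recall_f1(extracted, gold):
    """Set-based precision/recall/F1 for items (e.g. section titles) an
    extractor recovered from a PDF, against a manually built gold list."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {"Preface", "Chapter 1", "Chapter 2", "References"}
extracted = {"Preface", "Chapter 1", "Chaper 2"}  # one title garbled by the tool
p, r, f1 = precision_recall_f1(extracted, gold)
```

Exact matching is deliberately strict here; a real evaluation might allow fuzzy matches for OCR-style garbling.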
6. Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections. International Journal on Digital Libraries 2021. [DOI: 10.1007/s00799-021-00313-y]
7. Huang K, Xiao C, Glass LM, Critchlow CW, Gibson G, Sun J. Machine learning applications for therapeutic tasks with genomics data. Patterns (N Y) 2021; 2:100328. [PMID: 34693370] [PMCID: PMC8515011] [DOI: 10.1016/j.patter.2021.100328]
Abstract
Thanks to the increasing availability of genomics and other biomedical data, many machine learning algorithms have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records, cellular images, and clinical texts. We identify 22 machine-learning applications in genomics that span the whole therapeutics pipeline, from discovering novel targets and personalizing medicine to developing gene-editing tools and facilitating clinical trials and post-market studies. We also pinpoint seven key challenges in this field with potential for expansion and impact. This survey examines recent research at the intersection of machine learning, genomics, and therapeutic development.
Affiliation(s)
- Kexin Huang, Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Cao Xiao, Amplitude, San Francisco, CA 94105, USA
- Lucas M. Glass, Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Greg Gibson, Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Jimeng Sun, Computer Science Department and Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL 61820, USA
8. Timakum T, Lee S, Song M. Exploring the research landscape of data warehousing and mining based on DaWaK Conference full-text articles. Data Knowl Eng 2021. [DOI: 10.1016/j.datak.2021.101926]
9. Zhang H, Hu D, Duan H, Li S, Wu N, Lu X. A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging. BMC Med Inform Decis Mak 2021; 21:214. [PMID: 34330277] [PMCID: PMC8323233] [DOI: 10.1186/s12911-021-01575-x]
Abstract
BACKGROUND: Computed tomography (CT) reports record a large volume of valuable information about patients' conditions and radiologists' interpretations of radiology images, which can be used for clinical decision-making and further academic study. However, the free-text nature of clinical reports is a critical barrier to using these data effectively. In this study, we investigate a novel deep learning method to extract entities from Chinese CT reports for lung cancer screening and TNM staging.
METHODS: The proposed approach introduces a new named entity recognition algorithm, the BERT-based BiLSTM-Transformer network (BERT-BTN) with pre-training, to extract clinical entities for lung cancer screening and staging. Specifically, instead of traditional word embedding methods, BERT is applied to learn deep semantic representations of characters. A Transformer layer is added after the long short-term memory layer to capture global dependencies between characters. In addition, a pre-training technique is employed to alleviate the problem of insufficient labeled data.
RESULTS: We verify the effectiveness of the proposed approach on a clinical dataset of 359 CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. The experimental results show that the proposed approach achieves an 85.96% macro-F1 score under the exact-match scheme, improving performance by 1.38%, 1.84%, 3.81%, 4.29%, 5.12%, 5.29% and 8.84% over BERT-BTN, BERT-LSTM, BERT-fine-tune, BERT-Transformer, FastText-BTN, FastText-BiLSTM and FastText-Transformer, respectively.
CONCLUSIONS: In this study, we developed a novel deep learning method, BERT-BTN with pre-training, to extract clinical entities from Chinese CT reports. The experimental results indicate that the proposed approach can efficiently recognize the various clinical entities involved in lung cancer screening and staging, showing potential for further clinical decision-making and academic research.
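The "macro-F1 under the exact-match scheme" used above means a predicted entity counts only if its boundaries and type both match a gold annotation exactly, with F1 averaged over entity types. A minimal sketch of that metric (the span tuples and entity-type names below are invented for the example):

```python
def macro_f1(gold_spans, pred_spans):
    """Exact-match macro-F1: a prediction counts only if its
    (start, end, type) triple matches a gold annotation exactly;
    F1 is computed per entity type and then averaged."""
    types = {t for *_, t in gold_spans} | {t for *_, t in pred_spans}
    f1s = []
    for t in types:
        g = {s for s in gold_spans if s[2] == t}
        p = {s for s in pred_spans if s[2] == t}
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0

gold = [(0, 4, "TUMOR"), (10, 14, "NODE"), (20, 24, "NODE")]
pred = [(0, 4, "TUMOR"), (10, 14, "NODE"), (21, 24, "NODE")]  # off-by-one boundary fails exact match
```

Because the third prediction misses the gold boundary by one character, it scores zero under exact match even though it nearly overlaps, which is what makes this scheme strict.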
Affiliation(s)
- Huanyao Zhang, College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Zheda Road, Hangzhou, China
- Danqing Hu, College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Zheda Road, Hangzhou, China
- Huilong Duan, College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Zheda Road, Hangzhou, China
- Shaolei Li, Department of Thoracic Surgery II, Peking University Cancer Hospital & Institute, Beijing, China
- Nan Wu, Department of Thoracic Surgery II, Peking University Cancer Hospital & Institute, Beijing, China
- Xudong Lu, College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Zheda Road, Hangzhou, China
10. Brack A, Hoppe A, Stocker M, Auer S, Ewerth R. Analysing the requirements for an Open Research Knowledge Graph: use cases, quality requirements, and construction strategies. International Journal on Digital Libraries 2021. [DOI: 10.1007/s00799-021-00306-x]
Abstract
Current science communication has a number of drawbacks and bottlenecks that have been the subject of recent discussion: among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications that normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of these issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing the daily core tasks of a scientist, (b) establishing their consequent requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.
11. Li S, Wang Q. A hybrid approach to recognize generic sections in scholarly documents. Int J Doc Anal Recog 2021. [DOI: 10.1007/s10032-021-00381-5]
12. Dai Z, Hu K, Xie J, Shen S, Zheng J, Wu H, Guo Y. Bipartite Network of Interest (BNOI): Extending Co-Word Network with Interest of Researchers Using Sensor Data and Corresponding Applications as an Example. Sensors 2021; 21:1668. [PMID: 33804324] [PMCID: PMC7957500] [DOI: 10.3390/s21051668]
Abstract
Traditional co-word networks do not discriminate keywords of researcher interest from general keywords and are therefore often too general to provide knowledge of interest to domain experts. Inspired by recent work that automatically identifies questions of interest to researchers, such as "problems" and "solutions", we try to answer a similar question, "what sensors can be used for what kinds of applications", which is of great interest in sensor-related fields. Generalizing such specific questions as "questions of interest", we built a knowledge network that accounts for researcher interest, called the bipartite network of interest (BNOI). Unlike co-word approaches that rely on exact keywords from a list, BNOI uses classification models to find possible entities of interest. A total of nine feature extraction methods, including N-grams, Word2Vec and BERT, were used to extract features to train the classification models, including naïve Bayes (NB), support vector machines (SVM) and logistic regression (LR). In addition, a multi-feature fusion strategy and a voting principle (VP) method are applied to combine the capabilities of the features and the classification models. Features were extracted and the models trained on the abstract text of 350 remote sensing articles. The experimental results show that after removing biased words and using ten-fold cross-validation, the F-measures for "sensors" and "applications" are 93.2% and 85.5%, respectively. This demonstrates that researcher questions of interest can be answered better by the constructed BNOI, based on classification results, than by the traditional co-word network approach.
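The "voting principle" above amounts to an ensemble step: each (classifier, feature set) combination labels a candidate term, and the majority label wins. A minimal sketch, assuming a tie falls back to the label "general" (the tie rule, labels and votes below are assumptions for the example, not the paper's exact procedure):

```python
from collections import Counter

def voting_principle(predictions):
    """Majority vote over per-model predictions for one candidate term.
    Ties fall back to 'general' (an assumption of this sketch)."""
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "general"
    return counts[0][0]

# e.g. NB/SVM/LR trained on different feature sets vote on the term "lidar"
votes = ["sensor", "sensor", "general", "sensor", "general"]
print(voting_principle(votes))
```

Combining several weak feature/model pairs this way is what lets BNOI find entities of interest that no single keyword list would capture.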
Affiliation(s)
- Zongming Dai, Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Kai Hu (corresponding author), Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Jie Xie, Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Shengyu Shen, Soil and Water Conservation Department, Yangtze River Scientific Research Institute, Wuhan 430010, China
- Jie Zheng, The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Huayi Wu, The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Ya Guo, Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China; Department of Bioengineering, University of Missouri, Columbia, MO 65211, USA
13. Gholamzadeh M, Abtahi H, Safdari R. Suggesting a framework for preparedness against the pandemic outbreak based on medical informatics solutions: a thematic analysis. Int J Health Plann Manage 2021; 36:754-783. [PMID: 33502766] [PMCID: PMC8014158] [DOI: 10.1002/hpm.3106]
Abstract
Background: When an outbreak emerges, each country needs a coherent, preventive plan for dealing with epidemics. In the era of technology, adopting informatics-based solutions is essential. The main objective of this study is to propose a conceptual framework for a rapid and responsive surveillance system against pandemics.
Methods: A three-step approach was employed to develop the conceptual framework: (1) a literature review, (2) extracting and coding concepts and determining the main themes through thematic analysis using ATLAS.ti® software, and (3) concept mapping. The results were then synthesized, in consultation with experts, into a conceptual framework built on the main themes and the identified medical informatics strategies.
Results: The literature review identified 65 articles eligible for analysis. Through line-by-line coding in the thematic analysis, more than 46 themes were extracted as potential foremost themes. Based on the key themes and the strategies employed by the studies, the proposed framework was designed in three main components. The most appropriate strategies for each component were identified based on its demands and the available solutions, and these solutions were incorporated into the final framework.
Conclusion: The model presented in this study can be a first step toward a better understanding of the potential of medical informatics solutions in promoting epidemic disease management. It can serve as a reference model for designing intelligent surveillance systems to prepare for probable future pandemics.
Affiliation(s)
- Marsa Gholamzadeh, Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Hamidreza Abtahi, Pulmonary and Critical Care Medicine Department, Thoracic Research Center, Imam Khomeini Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Reza Safdari, Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
14. unarXive: a large scholarly data set with publications' full-text, annotated in-text citations, and links to metadata. Scientometrics 2020. [DOI: 10.1007/s11192-020-03382-z]
Abstract
In recent years, scholarly data sets have been used for various purposes, such as paper recommendation, citation recommendation, citation context analysis, and citation context-based document summarization. The evaluation of approaches to such tasks and their applicability in real-world scenarios heavily depend on the data set used. However, existing scholarly data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific disciplines available on arXiv.org. Apart from providing the papers' plain text, in-text citations are annotated via global identifiers. Furthermore, citing and cited publications are linked to the Microsoft Academic Graph, providing access to rich metadata. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, not only can enhance the future evaluation of research-paper-based and citation-context-based approaches, but can also serve as a basis for new ways of analyzing in-text citations, as we show prototypically in this article.
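Extracting citation contexts like those in the data set above is, at its core, a matter of locating annotated citation markers in plain text and keeping a window of surrounding characters. The marker syntax `{{cite:<id>}}` and window size below are assumptions for this sketch, not the data set's actual format.

```python
import re

# Hypothetical global-identifier marker of the form {{cite:<id>}}
CITE = re.compile(r"\{\{cite:([A-Za-z0-9-]+)\}\}")

def citation_contexts(text, window=40):
    """Return (identifier, surrounding-text) pairs for each in-text
    citation marker, keeping `window` characters on either side."""
    out = []
    for m in CITE.finditer(text):
        start = max(0, m.start() - window)
        ctx = text[start:m.end() + window]
        out.append((m.group(1), ctx))
    return out

sample = "Transformers dominate NLP {{cite:bert-2019}} across many tasks."
print(citation_contexts(sample))
```

Because the identifier is global, the same cited work can be aggregated across all one million documents, which is what enables corpus-scale citation-context analysis.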
15. Vega-Oliveros DA, Gomes PS, Milios EE, Berton L. A multi-centrality index for graph-based keyword extraction. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2019.102063]
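The multi-centrality idea of the entry above can be sketched simply: build a word co-occurrence graph, compute several centrality measures per word, and rank words by a combined score. The weighted-sum combination and the toy graph below are assumptions for illustration; the paper's actual index combines more measures in a more principled way.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of other nodes each node is directly connected to."""
    n = len(graph)
    if n <= 1:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """BFS-based closeness on an unweighted co-occurrence graph."""
    cent = {}
    for src in graph:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total = sum(dist.values())
        cent[src] = (len(dist) - 1) / total if total else 0.0
    return cent

def multi_centrality_rank(graph, weights=(0.5, 0.5)):
    """Rank words by a weighted combination of centrality measures."""
    deg, clo = degree_centrality(graph), closeness_centrality(graph)
    score = {v: weights[0] * deg[v] + weights[1] * clo[v] for v in graph}
    return sorted(score, key=score.get, reverse=True)

# Toy word co-occurrence graph (undirected, given as adjacency sets)
g = {
    "keyword": {"graph", "extraction", "index"},
    "graph": {"keyword", "extraction"},
    "extraction": {"keyword", "graph"},
    "index": {"keyword"},
}
print(multi_centrality_rank(g))
```

Combining measures hedges against the weakness of any single one: a word can have high degree purely from one dense cluster, while closeness rewards words central to the whole document.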