1
Anetta K. Understanding Health Records in West Slavic Languages: Available Resources, Case Study in Oncology. Stud Health Technol Inform 2023; 305:97-101. [PMID: 37386967 DOI: 10.3233/shti230433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
Currently, there is very little research aimed at developing medical knowledge extraction tools for major West Slavic languages (Czech, Polish, and Slovak). This project lays the groundwork for a general medical knowledge extraction pipeline, introducing the resource vocabularies available for the respective languages (UMLS resources, ICD-10 translations and national drug databases). It demonstrates the utility of this approach on a case study using a large proprietary corpus of Czech oncology records consisting of more than 40 million words written about more than 4,000 patients. After correlating MedDRA terms found in patients' records with drugs prescribed to them, significant non-obvious associations were found between selected medical conditions being mentioned and the probability of certain drugs being prescribed over the course of the patient's treatment, in some cases increasing the probability of prescriptions by over 250%. This direction of research, producing large amounts of annotated data, is a prerequisite for training deep learning models and predictive systems.
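The "over 250%" figure above is a relative increase in prescription probability. A minimal sketch of how such a figure could be computed follows; it assumes the comparison is between patients whose records mention a term and those whose records do not, which the abstract does not state. The function name, the record layout, and the placeholder labels `term_A` and `drug_X` are hypothetical, and the toy records are illustrative only, not data from the study.

```python
def relative_increase(patients, term, drug):
    """Percent increase in P(drug prescribed) when `term` is mentioned.

    Assumption: the baseline is the group of patients without the mention.
    """
    with_term = [p for p in patients if term in p["terms"]]
    without_term = [p for p in patients if term not in p["terms"]]
    if not with_term or not without_term:
        raise ValueError("both groups must be non-empty")
    p_with = sum(drug in p["drugs"] for p in with_term) / len(with_term)
    p_without = sum(drug in p["drugs"] for p in without_term) / len(without_term)
    if p_without == 0:
        raise ValueError("baseline probability is zero")
    return (p_with - p_without) / p_without * 100


# Toy records (illustrative only): 3 of 4 patients with the term received
# the drug, against 1 of 5 patients without it.
toy_patients = (
    [{"terms": {"term_A"}, "drugs": {"drug_X"}}] * 3
    + [{"terms": {"term_A"}, "drugs": set()}]
    + [{"terms": set(), "drugs": {"drug_X"}}]
    + [{"terms": set(), "drugs": set()}] * 4
)
print(round(relative_increase(toy_patients, "term_A", "drug_X")))
```

With these toy numbers the probabilities are 0.75 and 0.20, so the relative increase is about 275%.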
Affiliation(s)
- Kristof Anetta
- NLP Centre, Faculty of Informatics, Masaryk University Brno, Czech Republic
2
Wang Y, Fan R, Liang X, Li P, Hei X. Trusted Data Storage Architecture for National Infrastructure. Sensors (Basel) 2022; 22:2318. [PMID: 35336486 PMCID: PMC8955838 DOI: 10.3390/s22062318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 03/15/2022] [Accepted: 03/15/2022] [Indexed: 06/14/2023]
Abstract
National infrastructure is a material engineering facility that provides public services for social production and residents' lives, and a large-scale complex device or system is used to ensure normal social and economic activities. Due to the problems of difficult data collection, long project periods, complex data, poor security, difficult traceability and limited data intercommunication, the archives management of most national infrastructure is still in the pre-information era. To solve these problems, this paper proposes a trusted data storage architecture for national infrastructure based on blockchain. This consists of real-time collection of national infrastructure construction data through sensors and other Internet of Things devices, conversion of data from heterogeneous sources into a unified format according to specific business flows, and timely storage of data in the blockchain to ensure data security and persistence. Knowledge is extracted from the data stored in the chain, and the data of multiple regions or fields are jointly modeled through federated learning. The parameters and results are stored in the chain, and the information of each node is shared to solve the problem of data intercommunication.
Affiliation(s)
- Yichuan Wang
- School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China
- Rui Fan
- School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China
- Xiaolong Liang
- School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China
- Pengge Li
- School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China
- Xinhong Hei
- School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China
3
Abstract
In the smart grid era, the amount of data available for different applications has increased considerably. However, data may not perfectly represent the phenomenon or process under analysis, so their usability requires a preliminary validation carried out by experts in the specific domain. The process of data gathering and transmission over the communication channels has to be verified to ensure that data are provided in a useful format and that no external effect has corrupted the data received. Consistency of the data coming from different sources (in terms of timings and data resolution) has to be ensured and managed appropriately. Suitable procedures are needed for transforming data into knowledge in an effective way. This contribution addresses these aspects by highlighting a number of potential issues and the solutions in place in different power and energy systems, including the generation, grid and user sides. Recent references, as well as selected historical references, are listed to support the illustration of the conceptual aspects.
Affiliation(s)
- Gianfranco Chicco
- Dipartimento Energia "Galileo Ferraris," Politecnico di Torino, Torino, Italy
4
Vrochidis S, Moumtzidou A, Gialampoukidis I, Liparas D, Casamayor G, Wanner L, Heise N, Wagner T, Bilous A, Jamin E, Simeonov B, Alexiev V, Busch R, Arapakis I, Kompatsiaris I. A Multimodal Analytics Platform for Journalists Analyzing Large-Scale, Heterogeneous Multilingual, and Multimedia Content. Front Robot AI 2018; 5:123. [PMID: 33501002 PMCID: PMC7805659 DOI: 10.3389/frobt.2018.00123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/25/2018] [Accepted: 10/03/2018] [Indexed: 11/13/2022]
Abstract
Analysts and journalists face the problem of having to deal with very large, heterogeneous, and multilingual data volumes that need to be analyzed, understood, and aggregated. An automated and simplified editorial and authoring process could significantly reduce time, labor, and costs. Therefore, there is a need for unified access to multilingual and multicultural news story material, beyond the level of a nation, ensuring context-aware, spatiotemporal, and semantic interpretation, and also correlating and summarizing the interpreted material into a coherent gist. In this paper, we present a platform integrating multimodal analytics techniques, which are able to support journalists in handling large streams of real-time and diverse information. Specifically, the platform automatically crawls and indexes multilingual and multimedia information from heterogeneous resources. Textual information is automatically summarized and can be translated (on demand) into the language of the journalist. High-level information is extracted from both textual and multimedia content for fast inspection using concept clouds. The textual and multimedia content is semantically integrated and indexed using a common representation, to be accessible through a web-based search engine. The proposed platform was evaluated by several groups of journalists, who reported satisfaction with it.
Affiliation(s)
- Stefanos Vrochidis
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Anastasia Moumtzidou
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Ilias Gialampoukidis
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Dimitris Liparas
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece; High Performance Computing Centre, University of Stuttgart, Stuttgart, Germany
- Gerard Casamayor
- Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain
- Leo Wanner
- Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain; Catalan Institute for Research and Advanced Studies, Barcelona, Spain
- Ioannis Kompatsiaris
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
5
Kothari C, Wack M, Hassen‐Khodja C, Finan S, Savova G, O'Boyle M, Bliss G, Cornell A, Horn EJ, Davis R, Jacobs J, Kohane I, Avillach P. Phelan-McDermid syndrome data network: Integrating patient reported outcomes with clinical notes and curated genetic reports. Am J Med Genet B Neuropsychiatr Genet 2018; 177:613-624. [PMID: 28862395 PMCID: PMC5832521 DOI: 10.1002/ajmg.b.32579] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/09/2017] [Accepted: 07/18/2017] [Indexed: 01/29/2023]
Abstract
The heterogeneity of patient phenotype data is an impediment to research into the origins and progression of neuropsychiatric disorders. This difficulty is compounded in the case of rare disorders such as Phelan-McDermid Syndrome (PMS) by the paucity of patient clinical data. PMS is a rare syndromic genetic cause of autism and intellectual deficiency. In this paper, we describe the Phelan-McDermid Syndrome Data Network (PMS_DN), a platform that facilitates research into phenotype-genotype correlation and progression of PMS by: a) integrating knowledge of patient phenotypes extracted from Patient Reported Outcomes (PRO) data and clinical notes (two heterogeneous, underutilized sources of knowledge about patient phenotypes) with curated genetic information from the same patient cohort and b) making this integrated knowledge, along with a suite of statistical tools, available free of charge to authorized investigators on a Web portal https://pmsdn.hms.harvard.edu. PMS_DN is a Patient Centric Outcomes Research Initiative (PCORI) where patients and their families are involved in all aspects of the management of patient data in driving research into PMS. To foster collaborative research, PMS_DN also makes patient aggregates from this knowledge available to authorized investigators using distributed research networks such as the PCORnet PopMedNet. PMS_DN is hosted on a scalable cloud-based environment and complies with all patient data privacy regulations. As of October 31, 2016, PMS_DN integrates high-quality knowledge extracted from the clinical notes of 112 patients and curated genetic reports of 176 patients with preprocessed PRO data from 415 patients.
Affiliation(s)
- Cartik Kothari
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Maxime Wack
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Sean Finan
- Boston Children's Hospital, Boston, Massachusetts
- Isaac Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
6
Onishi T, Kadohira T, Watanabe I. Relation extraction with weakly supervised learning based on process-structure-property-performance reciprocity. Sci Technol Adv Mater 2018; 19:649-659. [PMID: 30245757 PMCID: PMC6147111 DOI: 10.1080/14686996.2018.1500852] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/14/2018] [Revised: 07/12/2018] [Accepted: 07/12/2018] [Indexed: 06/08/2023]
Abstract
In this study, we develop a computer-aided material design system to represent and extract knowledge related to material design from natural language texts. A machine learning model is trained on a text corpus weakly labeled by minimal annotated relationship data (~100 labeled relationships) to extract knowledge from scientific articles. The knowledge is represented by relationships between scientific concepts, such as {annealing, grain size, strength}. The extracted relationships are represented as a knowledge graph formatted according to design charts, inspired by the process-structure-property-performance (PSPP) reciprocity. The design chart provides an intuitive view of the effect of processes on properties and of prospective processes for achieving desired properties. Our system semantically searches the scientific literature and provides knowledge in the form of a design chart, and we hope it contributes to more efficient development of new materials.
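One way to picture the design chart described above is as a small directed graph running from processes through structures to properties. The sketch below is a hypothetical illustration, not the authors' system: it stores the abstract's example relationship {annealing, grain size, strength} as edges and uses a breadth-first search to trace how a process reaches a property. The names `DESIGN_CHART` and `find_path` are assumptions introduced here.

```python
from collections import deque

# Directed edges in PSPP order (process -> structure -> property),
# taken from the example relationship in the abstract.
DESIGN_CHART = {
    "annealing": ["grain size"],
    "grain size": ["strength"],
}


def find_path(graph, source, target):
    """Return the shortest chain of concepts from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path(DESIGN_CHART, "annealing", "strength"))
```

Searching in the edge direction answers "which property does this process influence, and through which structure"; reversing the edges would answer the prospective question of which processes lead to a desired property.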
Affiliation(s)
- Takeshi Onishi
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Takuya Kadohira
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba, Ibaraki, Japan
- Ikumu Watanabe
- Research Center for Structural Materials, National Institute for Materials Science, Ibaraki, Tsukuba, Japan
7
Mertens S, Gailly F, Poels G. Discovering health-care processes using DeciClareMiner. Health Syst (Basingstoke) 2017; 7:195-211. [PMID: 31214348 DOI: 10.1080/20476965.2017.1405876] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/07/2017] [Revised: 10/09/2017] [Accepted: 11/08/2017] [Indexed: 10/27/2022]
Abstract
Flexible, human-centric and knowledge-intensive processes occur in many service industries and are prominent in the health-care sector. Knowledge workers (e.g., doctors or other health-care personnel) are given the flexibility to address each process instance (i.e., episode of care) in the way that they deem most suitable. As a result, the knowledge of these processes is generally of a tacit nature, with many stakeholders lacking a clear view of a process. In this paper, we propose an algorithm called DeciClareMiner that combines process and decision mining to extract a process model and the corresponding knowledge from past executions of these processes. The algorithm was evaluated by applying it to a realistic health-care case and comparing the results to a complete search benchmark. In a relatively short time (10 min), DeciClareMiner was able to produce a DeciClare model that represents 93% of episodes of care with atomic constraints. Compared to the 50 h required to calculate the 100%-episode model via an exhaustive search approach, our result is considered a major improvement.
Affiliation(s)
- Steven Mertens
- Faculty of Economics and Business Administration, Department of Business Informatics and Operations Management, Ghent University, Ghent, Belgium
- Frederik Gailly
- Faculty of Economics and Business Administration, Department of Business Informatics and Operations Management, Ghent University, Ghent, Belgium
- Geert Poels
- Faculty of Economics and Business Administration, Department of Business Informatics and Operations Management, Ghent University, Ghent, Belgium
8
Dong K, Wu W, Ye H, Yang M, Ling Z, Yu W. Canoe: An Autonomous Infrastructure-Free Indoor Navigation System. Sensors (Basel) 2017; 17:s17050996. [PMID: 28468291 PMCID: PMC5469349 DOI: 10.3390/s17050996] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/28/2017] [Revised: 04/24/2017] [Accepted: 04/25/2017] [Indexed: 11/16/2022]
Abstract
The development of the Internet of Things (IoT) has accelerated research in indoor navigation systems, a majority of which rely on adequate wireless signals and sources. Nonetheless, deploying such a system requires periodic site-survey, which is time consuming and labor intensive. To address this issue, in this paper we present Canoe, an indoor navigation system that considers shopping mall scenarios. In our system, we do not assume any prior knowledge, such as floor-plan or the shop locations, access point placement or power settings, historical RSS measurements or fingerprints, etc. Instead, Canoe requires only that the shop owners collect and publish RSS values at the entrances of their shops and can direct a consumer to any of these shops by comparing the observed RSS values. The locations of the consumers and the shops are estimated using maximum likelihood estimation. In doing this, the direction of the target shop relative to the current orientation of the consumer can be precisely computed, such that the direction that a consumer should move can be determined. We have conducted extensive simulations using a real-world dataset. Our experiments in a real shopping mall demonstrate that if 50% of the shops publish their RSS values, Canoe can precisely navigate a consumer within 30 s, with an error rate below 9%.
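The final step described above, turning estimated positions into a direction relative to the consumer's current orientation, can be illustrated with basic geometry. The sketch below is not the paper's estimator (the maximum likelihood step is not reproduced). It assumes that 2D position estimates for the consumer and the shop are already available, that the heading is given in degrees measured counterclockwise from the x-axis, and the function name `relative_bearing` is hypothetical.

```python
import math


def relative_bearing(consumer_xy, shop_xy, heading_deg):
    """Angle in degrees, in [-180, 180), to turn from the current heading
    toward the shop. Positive values mean a counterclockwise (left) turn."""
    dx = shop_xy[0] - consumer_xy[0]
    dy = shop_xy[1] - consumer_xy[1]
    absolute_deg = math.degrees(math.atan2(dy, dx))
    # Wrap the difference so that the smaller turn is always reported.
    return (absolute_deg - heading_deg + 180.0) % 360.0 - 180.0


# A consumer at the origin facing along the x-axis, with the shop straight
# up the y-axis, needs a quarter turn to the left.
print(relative_bearing((0.0, 0.0), (0.0, 10.0), 0.0))
```

Wrapping into [-180, 180) keeps the instruction given to the consumer unambiguous: the sign gives the turn direction and the magnitude never exceeds a half turn.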
Affiliation(s)
- Kai Dong
- School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
- Wenjia Wu
- School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
- Haibo Ye
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
- Ming Yang
- School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
- Zhen Ling
- School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
- Wei Yu
- Department of Computer and Information Sciences, Towson University, Towson MD 21252, USA.
9
Rodriguez LM, Fushman DD. Automatic Classification of Structured Product Labels for Pregnancy Risk Drug Categories, a Machine Learning Approach. AMIA Annu Symp Proc 2015; 2015:1093-1102. [PMID: 26958248 PMCID: PMC4765680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
With regular expressions and manual review, 18,342 FDA-approved drug product labels were processed to determine if the five standard pregnancy drug risk categories were mentioned in the label. After excluding 81 drugs with multiple risk categories, 83% of the labels had a risk category within the text and 17% of the labels did not. We trained a Sequential Minimal Optimization algorithm on the labels containing pregnancy risk information segmented into standard document sections. For the evaluation of the classifier on the testing set, we used the Micromedex drug risk categories. The precautions section had the best performance for assigning drug risk categories, achieving Accuracy 0.79, Precision 0.66, Recall 0.64 and F1 measure 0.65. Missing pregnancy risk categories could be suggested using machine learning algorithms trained on the existing publicly available pregnancy risk information.
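As a quick consistency check on the metrics quoted above, the F1 measure is the harmonic mean of precision and recall. The short sketch below (the function name `f1_score` is hypothetical and is not tied to any library) recomputes it from the reported precision of 0.66 and recall of 0.64.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the precautions section.
print(round(f1_score(0.66, 0.64), 2))  # prints 0.65
```

The result, 2 × 0.66 × 0.64 / 1.30 ≈ 0.65, agrees with the F1 value given in the abstract.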
Affiliation(s)
- Laritza M Rodriguez
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Dina Demner Fushman
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD
10
Roos M, Marshall MS, Gibson AP, Schuemie M, Meij E, Katrenko S, van Hage WR, Krommydas K, Adriaans PW. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology. BMC Bioinformatics 2009; 10 Suppl 10:S9. [PMID: 19796406 PMCID: PMC2755830 DOI: 10.1186/1471-2105-10-s10-s9] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023]
Abstract
BACKGROUND: Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes.
RESULTS: We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence.
CONCLUSION: We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation.
Affiliation(s)
- Marco Roos
- Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ The Netherlands
- M Scott Marshall
- Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ The Netherlands
- Andrew P Gibson
- Swammerdam Institute for Life Science, University of Amsterdam, Amsterdam, 1018 WB The Netherlands
- Martijn Schuemie
- BioSemantics group, Erasmus University of Rotterdam, Rotterdam, 3000 DR The Netherlands
- Edgar Meij
- Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ The Netherlands
- Sophia Katrenko
- Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ The Netherlands
- Willem Robert van Hage
- Business Informatics, Faculty of Sciences, Vrije Universiteit, Amsterdam, 1081 HV The Netherlands
- Konstantinos Krommydas
- Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ The Netherlands
- Pieter W Adriaans
- Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ The Netherlands