1
Callahan A, Polony V, Posada JD, Banda JM, Gombar S, Shah NH. ACE: the Advanced Cohort Engine for searching longitudinal patient records. J Am Med Inform Assoc 2021; 28:1468-1479. [PMID: 33712854 PMCID: PMC8279796 DOI: 10.1093/jamia/ocab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/08/2020] [Accepted: 02/23/2021] [Indexed: 01/02/2023]
Abstract
OBJECTIVE To propose a paradigm for scalable, time-aware clinical data search, and to describe the design, implementation, and use of a search engine realizing this paradigm. MATERIALS AND METHODS The Advanced Cohort Engine (ACE) uses a temporal query language and an in-memory datastore of patient objects to provide fast, scalable, and expressive time-aware search. ACE accepts data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model and is configurable to balance performance with compute cost. ACE's temporal query language supports automatic query expansion using clinical knowledge graphs. The ACE API can be accessed from R, Python, and Java, over HTTP, and through a web UI. RESULTS ACE offers an expressive query language for complex temporal search across many clinical data types, with multiple output options. ACE enables electronic phenotyping and cohort building with subsecond response times when searching the data of millions of patients, across a variety of use cases. DISCUSSION ACE enables fast, time-aware search using a patient object-centric datastore, thereby overcoming many technical and design shortcomings of relational algebra-based querying. Integrating electronic phenotype development with cohort building enables a variety of high-value uses for a learning health system. Tradeoffs include the need to learn a new query language and the burden of technical setup. CONCLUSION ACE is a tool that combines a unique query language for time-aware search of longitudinal patient records with a patient object datastore for rapid electronic phenotyping, cohort extraction, and exploratory data analyses.
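The central idea in this abstract is time-aware search over an in-memory store of per-patient objects, with automatic query expansion. A minimal Python sketch can illustrate it. This is not ACE's query language or API. The `Patient`/`Event` classes, the `expand` and `before_within` helpers, and the string concept labels are all hypothetical, and the small parent-to-children map stands in for a clinical knowledge graph.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    code: str  # hypothetical concept label (a real system would use e.g. OMOP concept ids)
    day: date


@dataclass
class Patient:
    pid: int
    events: list = field(default_factory=list)

    def days_of(self, codes):
        """Sorted dates of this patient's events whose code is in `codes`."""
        return sorted(e.day for e in self.events if e.code in codes)


def expand(code, children):
    """Query expansion: return `code` plus all descendants in a parent -> children map."""
    found, stack = set(), [code]
    while stack:
        c = stack.pop()
        if c not in found:
            found.add(c)
            stack.extend(children.get(c, ()))
    return found


def before_within(patient, first, second, max_days):
    """True if some `first` event precedes some `second` event by 0..max_days days."""
    later = patient.days_of(second)
    return any(0 <= (d2 - d1).days <= max_days
               for d1 in patient.days_of(first) for d2 in later)


def cohort(patients, predicate):
    """Scan patient objects in memory and keep the ids that satisfy the temporal predicate."""
    return [p.pid for p in patients if predicate(p)]
```

Each patient's timeline lives in a single object, so the temporal constraint is evaluated one patient at a time and no event tables need to be joined. This is the contrast with relational querying that the abstract draws.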
Affiliation(s)
- Alison Callahan
- Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- Vladimir Polony
- Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- José D Posada
- Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- Juan M Banda
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Saurabh Gombar
- Department of Pathology, School of Medicine, Stanford University, Stanford, California, USA
- Nigam H Shah
- Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
2
Pedrera-Jiménez M, García-Barrio N, Cruz-Rojo J, Terriza-Torres AI, López-Jiménez EA, Calvo-Boyero F, Jiménez-Cerezo MJ, Blanco-Martínez AJ, Roig-Domínguez G, Cruz-Bermúdez JL, Bernal-Sobrino JL, Serrano-Balazote P, Muñoz-Carrero A. Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models. J Biomed Inform 2021; 115:103697. [PMID: 33548541 PMCID: PMC7857038 DOI: 10.1016/j.jbi.2021.103697] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/11/2020] [Revised: 12/18/2020] [Accepted: 02/01/2021] [Indexed: 10/27/2022]
Abstract
BACKGROUND COVID-19 ranks as the single largest health incident worldwide in decades. In such a scenario, electronic health records (EHRs) should provide a timely response to healthcare needs and to data uses that go beyond direct medical care. These are known as secondary uses and include biomedical research. However, each data analysis initiative usually defines its own information model in line with its requirements. These specifications share clinical concepts but differ in format and recording criteria. The result is redundant data entry in multiple electronic data capture systems (EDCs), with the consequent investment of effort and time by the organization. OBJECTIVE This study sought to design and implement a flexible methodology based on detailed clinical models (DCMs) that would enable EHRs generated in a tertiary hospital to be reused effectively, without loss of meaning and within a short time. MATERIAL AND METHODS The proposed methodology comprises four stages: (1) specification of an initial set of relevant variables for COVID-19; (2) modeling and formalization of clinical concepts using the ISO 13606 standard and the SNOMED CT and LOINC terminologies; (3) definition of transformation rules to generate secondary use models from standardized EHRs, and their implementation in R; and (4) implementation and validation of the methodology by generating the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC-WHO) COVID-19 case report form. This process was implemented in a 1300-bed tertiary hospital for a cohort of 4489 patients hospitalized from 25 February 2020 to 10 September 2020. RESULTS An initial and expandable set of relevant concepts for COVID-19 was identified, modeled, and formalized using the ISO 13606 standard and the SNOMED CT and LOINC terminologies.
Similarly, an algorithm was designed and implemented in R and then applied to process EHRs in accordance with the standardized concepts, transforming them into secondary use models. Lastly, these resources were applied to obtain a data extract conforming to the ISARIC-WHO COVID-19 case report form, without requiring manual data collection. For the majority of concepts, the methodology populated the observation domain of this model for over 85% of patients. CONCLUSION This study offers a way to obtain EHR-derived data for secondary use in COVID-19 rapidly and efficiently. The approach can adapt to changes in data specifications and is applicable to other organizations and other health conditions. The conclusion drawn from this initial validation is that this DCM-based methodology allows the effective reuse of EHRs generated in a tertiary hospital during the COVID-19 pandemic. It requires no additional effort or time from the organization and yields a greater data scope than conventional manual data collection in ad hoc EDCs.
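Stage (3) of the methodology turns terminology-coded EHR entries into secondary-use variables. That step, together with the per-concept patient coverage reported in the results, can be sketched in Python under simplifying assumptions. The `Rule` structure, the field names, and the placeholder code `TEMP-PLACEHOLDER` are hypothetical and are not real LOINC or SNOMED CT identifiers. The authors implemented their rules in R over ISO 13606-based extracts, which the flat list of dictionaries used here does not model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    target: str                     # variable in the secondary-use model (hypothetical name)
    system: str                     # terminology of the source concept, e.g. "LOINC"
    code: str                       # placeholder code, not a real concept identifier
    combine: Callable[[list], Any]  # collapses a patient's chronologically ordered values


def build_extract(entries, rules):
    """entries: dicts with keys patient, system, code, value, time."""
    extract = {}
    for rule in rules:
        matches = {}
        for e in entries:
            if e["system"] == rule.system and e["code"] == rule.code:
                matches.setdefault(e["patient"], []).append((e["time"], e["value"]))
        for pid, pairs in matches.items():
            pairs.sort(key=lambda tv: tv[0])  # chronological order
            extract.setdefault(pid, {})[rule.target] = rule.combine([v for _, v in pairs])
    return extract


def coverage(extract, patients, target):
    """Fraction of patients for whom `target` could be populated automatically."""
    hits = sum(1 for p in patients if target in extract.get(p, {}))
    return hits / len(patients)
```

Because each rule only names a source concept and a reduction, the same standardized entries can feed a different case report form by swapping the rule list. The abstract stresses this kind of adaptability to changing data specifications.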
Affiliation(s)
- Miguel Pedrera-Jiménez
- Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain; ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain.
- Jaime Cruz-Rojo
- Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain.
- Adolfo Muñoz-Carrero
- Digital Health Research Dept., Instituto de Salud Carlos III, Av. de Monforte de Lemos, 5, 28029 Madrid, Spain.
3
Standardized electronic health record data modeling and persistence: A comparative review. J Biomed Inform 2020; 114:103670. [PMID: 33359548 DOI: 10.1016/j.jbi.2020.103670] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/14/2020] [Revised: 12/15/2020] [Accepted: 12/20/2020] [Indexed: 12/12/2022]
Abstract
With the extensive adoption of electronic health records (EHRs) by healthcare organizations, more effort is needed to manage and use such massive, varied, and complex healthcare data. A database's performance and suitability for healthcare tasks depend heavily on how well its data storage model and query capabilities are adapted to the use-case scenario. At the same time, standardized healthcare data modeling is one of the most favorable paths to semantic interoperability, facilitating the integration of patient data from different healthcare systems. This paper compares state-of-the-art database management systems used for storing standardized EHR data. It discusses how appropriate different database models are for different EHR functions, database specifications, and workload scenarios. The reviewed literature shows how flexible NoSQL databases (document, column, and graph) deal effectively with the distinctive features of standardized EHR data, especially in distributed healthcare systems, leading to better EHRs.
4
Ramos M, Sánchez-de-Madariaga R, Barros J, Carrajo L, Vázquez G, Pérez S, Pascual M, Martín-Sánchez F, Muñoz-Carrero A. An Archetype Query Language interpreter into MongoDB: Managing NoSQL standardized Electronic Health Record extracts systems. J Biomed Inform 2020; 101:103339. [DOI: 10.1016/j.jbi.2019.103339] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/06/2018] [Revised: 10/30/2019] [Accepted: 11/10/2019] [Indexed: 01/09/2023]
5
Satti FA, Ali T, Hussain J, Khan WA, Khattak AM, Lee S. Ubiquitous Health Profile (UHPr): a big data curation platform for supporting health data interoperability. Computing 2020; 102. [PMCID: PMC7437110 DOI: 10.1007/s00607-020-00837-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 05/02/2023]
Abstract
The lack of interoperable healthcare data presents a major challenge to achieving ubiquitous health care. The plethora of diverse medical standards, rather than common standards, is widening the interoperability gap. Many organizations are working towards a standardized solution, but an alternative strategy is also needed. Such a strategy would intelligently mediate among a variety of medical systems that do not comply with any mainstream healthcare standard, while drawing on the benefits of several standard-merging initiatives, to eventually create digital health personas. The existence and efficiency of such a platform depend on the underlying storage and processing engine, which must acquire, manage, and retrieve the relevant medical data. In this paper, we present the Ubiquitous Health Profile (UHPr), a multi-dimensional data storage solution in a semi-structured data curation engine. UHPr provides foundational support for archiving heterogeneous medical data and achieving partial data interoperability in the healthcare domain. Additionally, we present the evaluation results of this proposed platform in terms of its timeliness, accuracy, and scalability. Our results indicate that UHPr can retrieve an error-free, comprehensive medical profile of a single patient from a set of slightly over 116.5 million serialized medical fragments for 390,101 patients, while maintaining a good scalability ratio between the amount of data and its retrieval speed.
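The retrieval task evaluated here assembles one patient's profile from a large pool of serialized fragments. Doing that efficiently depends on locating the patient's fragments without scanning the whole pool. The Python sketch below shows one generic way to do this with a per-patient offset index. The `FragmentStore` class, the JSON serialization, and the `type`/`data` fragment keys are assumptions for illustration, not UHPr's actual storage engine.

```python
import json
from collections import defaultdict


class FragmentStore:
    """Append-only pool of serialized fragments plus a per-patient offset index (hypothetical design)."""

    def __init__(self):
        self._fragments = []                  # serialized JSON strings, in arrival order
        self._by_patient = defaultdict(list)  # patient id -> offsets into self._fragments

    def add(self, patient_id, fragment):
        """Serialize a fragment dict and record its offset under the patient id."""
        self._by_patient[patient_id].append(len(self._fragments))
        self._fragments.append(json.dumps(fragment))

    def profile(self, patient_id):
        """Deserialize one patient's fragments and group their data by fragment type."""
        merged = defaultdict(list)
        for offset in self._by_patient.get(patient_id, []):
            frag = json.loads(self._fragments[offset])
            merged[frag["type"]].append(frag["data"])
        return dict(merged)
```

With this layout, the cost of building one profile grows with the number of fragments stored for that patient rather than with the total number of fragments in the pool.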
Affiliation(s)
- Fahad Ahmed Satti
- Ubiquitous Computing Lab, Department of Computer Engineering, Kyung Hee University, Global Campus, Yongin, South Korea
- Taqdir Ali
- Division of ICT, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Education City, Doha, Qatar
- Jamil Hussain
- Ubiquitous Computing Lab, Department of Computer Engineering, Kyung Hee University, Global Campus, Yongin, South Korea
- Wajahat Ali Khan
- College of Engineering and Technology, University of Derby, Markeaton Street, Derby, DE22 3AW, UK
- Sungyoung Lee
- Ubiquitous Computing Lab, Department of Computer Engineering, Kyung Hee University, Global Campus, Yongin, South Korea
6
Kalogiannis S, Deltouzos K, Zacharaki EI, Vasilakis A, Moustakas K, Ellul J, Megalooikonomou V. Integrating an openEHR-based personalized virtual model for the ageing population within HBase. BMC Med Inform Decis Mak 2019; 19:25. [PMID: 30691467 PMCID: PMC6350370 DOI: 10.1186/s12911-019-0745-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/15/2018] [Accepted: 01/14/2019] [Indexed: 11/17/2022]
Abstract
Background Frailty is a common clinical syndrome in the ageing population that carries an increased risk of adverse health outcomes, including falls, hospitalization, disability, and mortality. Because these outcomes affect health and social care planning, recent years have seen a tendency to invest in monitoring and prevention strategies. A number of electronic health record (EHR) systems have been developed, including personalized virtual patient models, but few are oriented toward the ageing population. Methods We use the openEHR framework to represent frailty in the ageing population in order to attain semantic interoperability, and we present the methodology for adopting or developing archetypes. We also propose a framework for a one-to-one mapping between openEHR archetypes and a column-family NoSQL database (HBase), aimed at integrating existing and newly developed archetypes into it. Results The requirement analysis of our study resulted in the definition of 22 coherent and clinically meaningful parameters for describing frailty in older adults. The implemented openEHR methodology led to the direct use of 22 archetypes, the modification and reuse of two archetypes, and the development of 28 new archetypes. Additionally, the mapping procedure led to two different HBase tables for storing the data. Conclusions In this work, an openEHR-based virtual patient model has been designed and integrated into an HBase storage system, exploiting the advantages of the underlying technologies. In the future, this framework can serve as a base for developing a decision support system using openEHR's Guideline Definition Language.
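A one-to-one mapping from openEHR archetypes to a column-family store can be pictured in two steps. First, each archetype instance is flattened into path/value pairs. Second, each value is addressed as an HBase cell (row key plus `family:qualifier`). The Python sketch below only builds that cell layout in memory, and no HBase client is used. Several details are hypothetical and not taken from the paper: the row-key scheme (`<ehr_id>#<timestamp>`), the use of the archetype concept name as column family, and the simplified node paths. The archetype id in the comment is illustrative. The abstract reports two HBase tables but does not say how data were split between them, so the sketch uses a single layout.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {"/a/b": leaf_value} path/value pairs."""
    out = {}
    for key, val in node.items():
        path = f"{prefix}/{key}"
        if isinstance(val, dict):
            out.update(flatten(val, path))
        else:
            out[path] = val
    return out


def to_hbase_cells(ehr_id, archetype_id, recorded_at, data):
    """Map one archetype instance to {row_key: {"family:qualifier": value}} (hypothetical scheme)."""
    # e.g. "openEHR-EHR-OBSERVATION.blood_pressure.v1" -> column family "blood_pressure"
    family = archetype_id.split(".")[1]
    row_key = f"{ehr_id}#{recorded_at}"
    # HBase stores raw bytes; plain strings stand in for serialized values here.
    return {row_key: {f"{family}:{path}": str(v) for path, v in flatten(data).items()}}
```

Because each qualifier is a flattened path, every stored value stays traceable to its node in the archetype instance.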
Affiliation(s)
- Spyridon Kalogiannis
- Computer Engineering and Informatics Department, University of Patras, University Campus, Rio, 26504, Greece
- Konstantinos Deltouzos
- Computer Engineering and Informatics Department, University of Patras, University Campus, Rio, 26504, Greece
- Evangelia I Zacharaki
- Computer Engineering and Informatics Department, University of Patras, University Campus, Rio, 26504, Greece
- Andreas Vasilakis
- Information Technologies Institute, Centre for Research and Technology Hellas, 6th km Charilaou-Thermi Rd, Thessaloniki, 57001, Greece
- Konstantinos Moustakas
- Information Technologies Institute, Centre for Research and Technology Hellas, 6th km Charilaou-Thermi Rd, Thessaloniki, 57001, Greece
- John Ellul
- Department of Neurology, School of Medicine, University of Patras, University Campus, Rio, 26504, Greece
- Vasileios Megalooikonomou
- Computer Engineering and Informatics Department, University of Patras, University Campus, Rio, 26504, Greece
7
Sánchez-de-Madariaga R, Muñoz A, Castro AL, Moreno O, Pascual M. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases. J Vis Exp 2018. [PMID: 29608174 PMCID: PMC5933229 DOI: 10.3791/57439] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 10/31/2022]
Abstract
This research presents a protocol to assess the computational complexity of querying standardized electronic health record (EHR) medical information database systems, both relational and non-relational (NoSQL, "not only Structured Query Language"). It uses a set of three doubling-sized databases, i.e., databases storing 5000, 10,000, and 20,000 realistic standardized EHR extracts. These are loaded into three different database management systems (DBMS): relational MySQL with object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed linear behavior in the NoSQL cases. Within NoSQL, MongoDB shows a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems. Medical information has special update policies, and these should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems, such as archetype relational mapping (ARM), with the same data. However, interpolating the doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios. For example, NoSQL may suit document-based tasks such as EHR extracts used in clinical practice and their editing and visualization. It may also suit situations where the aim is not only to query medical information but also to restore the EHR in exactly its original form.
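The comparison above rests on how average response time grows as the database doubles from 5000 to 10,000 to 20,000 extracts. A small Python sketch of that kind of analysis follows. It computes an ordinary least-squares slope for one system and query, plus the ratio between consecutive doubling steps. The function names are hypothetical, and the protocol itself may summarize its timings differently.

```python
def linear_fit(sizes, times):
    """Ordinary least-squares fit times ~ a + b * sizes; returns (a, b)."""
    n = len(sizes)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def doubling_ratios(times):
    """Ratio of each response time to the previous one as the database size doubles.

    A ratio near 2 is consistent with time proportional to size;
    a ratio near 1 means time is largely independent of size.
    """
    return [t2 / t1 for t1, t2 in zip(times, times[1:])]
```

As a synthetic illustration (not the study's measurements), times of 1.0, 2.0, and 4.0 s at the three sizes give a slope of 0.0002 s per extract and doubling ratios of 2.0. Under this kind of analysis, a "flatter linear slope" corresponds to a smaller `b`.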
Affiliation(s)
- Adolfo Muñoz
- Telemedicine and Information Society Department, Health Institute "Carlos III"
- Antonio L Castro
- Telemedicine and Information Society Department, Health Institute "Carlos III"
- Oscar Moreno
- Telemedicine and Information Society Department, Health Institute "Carlos III"
- Mario Pascual
- Telemedicine and Information Society Department, Health Institute "Carlos III"