1. Tian S, Yin P, Zhang H, Erdengasileng A, Bian J, He Z. Parsing Clinical Trial Eligibility Criteria for Cohort Query by a Multi-Input Multi-Output Sequence Labeling Model. Proceedings IEEE International Conference on Bioinformatics and Biomedicine 2023; 2023:4426-4430. PMID: 39015287; PMCID: PMC11251129; DOI: 10.1109/bibm58861.2023.10385876.
Abstract
To enable electronic screening of eligible patients for clinical trials, free-text clinical trial eligibility criteria must be translated into a computable format. Natural language processing (NLP) techniques have the potential to automate this process. In this study, we explored a supervised multi-input multi-output (MIMO) sequence labeling model to parse eligibility criteria into combinations of fact and condition tuples. Our experiments on a small manually annotated training dataset showed that the MIMO framework with a BERT-based encoder using all input sequences achieved an overall lenient-level AUROC of 0.61. Although this performance is suboptimal, representing eligibility criteria as logically and semantically clear tuples can potentially make the subsequent translation of these tuples into database queries more reliable.
Affiliation(s)
- Shubo Tian, Department of Statistics, Florida State University, Tallahassee, USA
- Pengfei Yin, Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA
- Hansi Zhang, Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA
- Jiang Bian, Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA
- Zhe He, School of Information, Florida State University, Tallahassee, USA
2. Yang L, Huang X, Wang J, Yang X, Ding L, Li Z, Li J. Identifying stroke-related quantified evidence from electronic health records in real-world studies. Artif Intell Med 2023; 140:102552. PMID: 37210153; DOI: 10.1016/j.artmed.2023.102552.
Abstract
BACKGROUND Stroke is one of the leading causes of death and disability worldwide. National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, their free-text format and lack of standardization inhibit their effective use. Automatically extracting these scale scores from clinical free text, so that their potential value in real-world studies can be realized, has therefore become an important goal. OBJECTIVE This study aims to develop an automated method to extract scale scores from the free text of EHRs. METHODS We propose a two-step pipeline method to identify NIHSS items and numerical scores, and validate its feasibility using a freely accessible critical care database, MIMIC-III (Medical Information Mart for Intensive Care III). First, we use MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks: NIHSS item and score recognition, and item-score relation extraction. In the evaluation, we conduct both task-specific and end-to-end evaluations and compare our method with a rule-based method, using precision, recall and F1 scores as evaluation metrics. RESULTS We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The best F1 score of our method was 0.9006, attained by combining BERT-BiLSTM-CRF and Random Forest, outperforming the rule-based method (F1 score = 0.8098). In the end-to-end task, our method successfully recognized the item "1b level of consciousness questions", the score "1" and their relation "('1b level of consciousness questions', '1', 'has value')" from the sentence "1b level of consciousness questions: said name = 1", while the rule-based method could not.
CONCLUSIONS The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.
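The two-step design described in this abstract (entity recognition followed by relation extraction) can be illustrated with a minimal rule-based sketch. The item name and the "has value" relation come from the abstract's own example; the dictionary lookup and nearest-score pairing below are simplified assumptions, not the authors' BERT-BiLSTM-CRF plus Random Forest implementation:

```python
import re

# Toy dictionary of NIHSS item names (a stand-in for a learned NER model).
NIHSS_ITEMS = [
    "1b level of consciousness questions",
    "1a level of consciousness",
    "2 best gaze",
]

def recognize(sentence):
    """Step 1: recognize NIHSS items and numeric scores with their offsets."""
    sent = sentence.lower()
    items = [(name, sent.find(name)) for name in NIHSS_ITEMS if name in sent]
    # Scores are digits that follow an '=' sign, e.g. "said name = 1".
    scores = [(m.group(1), m.start(1)) for m in re.finditer(r"=\s*(\d+)", sent)]
    return items, scores

def relate(items, scores):
    """Step 2: pair each item with the nearest following score ('has value')."""
    relations = []
    for item, i_pos in items:
        following = [(s, s_pos) for s, s_pos in scores if s_pos > i_pos]
        if following:
            score = min(following, key=lambda x: x[1])[0]
            relations.append((item, score, "has value"))
    return relations

items, scores = recognize("1b level of consciousness questions: said name = 1")
print(relate(items, scores))
# [('1b level of consciousness questions', '1', 'has value')]
```

The rule-based baseline in the paper fails on exactly this kind of indirect phrasing, which is why the learned models outperform it; this sketch only shows the pipeline's shape, not its accuracy.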
Affiliation(s)
- Lin Yang, Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China
- Xiaoshuo Huang, Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; School of Health Care Technology, Dalian Neusoft University of Information, Dalian 116023, China
- Jiayang Wang, Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
- Xin Yang, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- Lingling Ding, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- Zixiao Li, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- Jiao Li, Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China
3. Fang Y, Idnay B, Sun Y, Liu H, Chen Z, Marder K, Xu H, Schnall R, Weng C. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc 2022; 29:1161-1171. PMID: 35426943; PMCID: PMC9196697; DOI: 10.1093/jamia/ocac051.
Abstract
OBJECTIVE To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries. MATERIALS AND METHODS Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of enhanced modules for negation scope detection, temporal and value normalization were evaluated using a previously curated gold standard, the annotated eligibility criteria of 1010 COVID-19 clinical trials. The usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer's disease trials. Data were collected by user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire. RESULTS The accuracies of negation scope detection, temporal and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made for a clinical study. Experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5). DISCUSSION AND CONCLUSION Features to engage domain experts and to overcome the limitations in automated machine output are shown to be useful and user-friendly. We concluded that human-computer collaboration is key to improving the adoption and user-friendliness of natural language processing.
Affiliation(s)
- Yilu Fang, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Betina Idnay, School of Nursing, Columbia University, New York, New York, USA; Department of Neurology, Columbia University, New York, New York, USA
- Yingcheng Sun, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Hao Liu, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Zhehuan Chen, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Karen Marder, Department of Neurology, Columbia University, New York, New York, USA
- Hua Xu, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Rebecca Schnall, School of Nursing, Columbia University, New York, New York, USA; Heilbrunn Department of Population and Family Health, Mailman School of Public Health, Columbia University, New York, New York, USA
- Chunhua Weng, Department of Biomedical Informatics, Columbia University, New York, New York, USA
4. Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022; 22:141. PMID: 35568796; PMCID: PMC9107639; DOI: 10.1186/s12874-022-01611-y.
Abstract
Background Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools. Objective The aim of this study is to establish a core dataset for the LP most frequently requested to recruit patients for clinical trials, using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment. Methods We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal's data repository that are pre-annotated with the Unified Medical Language System (UMLS). An automated semantic analysis based on concept frequency was followed by an extensive manual expert review, performed by physicians, of complex recruitment-relevant concepts not amenable to an automated approach. Results Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types, covering the vast majority of Medical Subject Headings (MeSH) disease domains. Conclusions Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP. We present ELaPro, a novel, LOINC-mapped core dataset for the 55 LP most frequently requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats, including CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC, as well as the combination of UMLS and LOINC terminologies, distinguishes this specialized dataset from previous relevant datasets in the literature. Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-022-01611-y.
Affiliation(s)
- Ahmed Rafee, Institute of Medical Informatics, University of Münster, Münster, Germany; Department of Internal Medicine (D), University Hospital of Münster, Münster, Germany
- Sarah Riepenhausen, Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Neuhaus, Institute of Medical Informatics, University of Münster, Münster, Germany
- Alexandra Meidt, Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas, Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese, Institute of Medical Informatics, University of Münster, Münster, Germany
5. Tian S, Erdengasileng A, Yang X, Guo Y, Wu Y, Zhang J, Bian J, He Z. Transformer-Based Named Entity Recognition for Parsing Clinical Trial Eligibility Criteria. Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB) 2021; 2021. PMID: 34414397; DOI: 10.1145/3459930.3469560.
Abstract
The rapid adoption of electronic health record (EHR) systems has made clinical data available in electronic format for research and for many downstream applications. Electronic screening of these clinical databases for potentially eligible clinical trial patients is a critical need for improving trial recruitment efficiency. Nevertheless, manually translating free-text eligibility criteria into database queries is labor intensive and inefficient. To facilitate automated screening, free-text eligibility criteria must be structured and coded into a computable format using controlled vocabularies. Named entity recognition (NER) is thus an important first step. In this study, we evaluate 4 state-of-the-art transformer-based NER models on two publicly available annotated corpora of eligibility criteria released by Columbia University (i.e., the Chia data) and Facebook Research (i.e., the FRD data). Four transformer-based models (i.e., BERT, ALBERT, RoBERTa, and ELECTRA) pretrained on general English-domain corpora were compared with those pretrained on PubMed citations, clinical notes from the MIMIC-III dataset, and eligibility criteria extracted from all clinical trials on ClinicalTrials.gov. Experimental results show that RoBERTa pretrained on MIMIC-III clinical notes and eligibility criteria yielded the highest strict and relaxed F-scores on both the Chia data (i.e., 0.658/0.798) and the FRD data (i.e., 0.785/0.916). With promising NER results, further investigations on building a reliable natural language processing (NLP)-assisted pipeline for automated electronic screening are needed.
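The strict and relaxed F-scores reported above correspond to exact-boundary versus overlap-based span matching. The sketch below follows common NER-evaluation conventions for these two modes (spans as `(start, end, label)` tuples); it is not the paper's actual evaluation code:

```python
def overlaps(a, b):
    """Relaxed match: same label and overlapping character offsets."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]

def ner_f1(pred, gold, relaxed=False):
    """F1 over predicted vs gold entity spans, strict or relaxed matching."""
    match = overlaps if relaxed else (lambda a, b: a == b)
    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 9, "Condition"), (15, 22, "Drug")]
pred = [(0, 7, "Condition"), (15, 22, "Drug")]  # first span's boundary is inexact
print(ner_f1(pred, gold))                # strict: 0.5
print(ner_f1(pred, gold, relaxed=True))  # relaxed: 1.0
```

The gap between the two scores (here 0.5 vs 1.0, and 0.658 vs 0.798 on Chia in the paper) indicates how often the models find the right entity but draw its boundaries slightly wrong.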
6.
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha, Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA
7. Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021; 15:553-560. PMID: 33736486; PMCID: PMC8120048; DOI: 10.1177/19322968211000831.
Abstract
BACKGROUND Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are "locked" in narrative format. Natural language processing (NLP) technology offers a solution for the analysis of narrative electronic data. METHODS We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included. RESULTS We included 38 studies in the analysis. The majority (24; 63.2%) described only the development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of studies focused on identification of patients with diabetes; the rest covered a broad range of subjects including hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy and others. The mean F1 score across all studies that reported it was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving the delivery of diabetes care. CONCLUSION Research on NLP technology to study diabetes is growing quickly, although challenges (e.g., in the analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and on improving the quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between developers and end users of NLP tools, as well as by broader sharing of the tools themselves and related resources.
Affiliation(s)
- Alexander Turchin, Brigham and Women's Hospital, Boston, MA, USA
- Correspondence: Alexander Turchin, MD, MS, Brigham and Women's Hospital, 221 Longwood Avenue, Boston, MA 02115, USA
8. He Z, Erdengasileng A, Luo X, Xing A, Charness N, Bian J. How the clinical research community responded to the COVID-19 pandemic: an analysis of the COVID-19 clinical studies in ClinicalTrials.gov. JAMIA Open 2021; 4:ooab032. PMID: 34056559; PMCID: PMC8083215; DOI: 10.1093/jamiaopen/ooab032.
Abstract
OBJECTIVE In the past few months, a large number of clinical studies on the novel coronavirus disease (COVID-19) have been initiated worldwide to find effective therapeutics, vaccines, and preventive strategies for COVID-19. In this study, we aim to understand the landscape of COVID-19 clinical research and identify the issues that may cause recruitment difficulty or reduce study generalizability. METHODS We analyzed 3765 COVID-19 studies registered in the largest public registry, ClinicalTrials.gov, leveraging natural language processing (NLP) and using descriptive, association, and clustering analyses. We first characterized COVID-19 studies by study features such as phase and tested intervention. We then took a deep dive and analyzed their eligibility criteria to understand whether these studies: (1) considered the reported underlying health conditions that may lead to severe illness, and (2) excluded older adults, either explicitly or implicitly, which may reduce the generalizability of these studies to the older adult population. RESULTS Our analysis included 2295 interventional studies and 1470 observational studies. Most trials did not explicitly exclude older adults with common chronic conditions. However, known risk factors such as diabetes and hypertension were considered by less than 5% of trials based on their trial description. Pregnant women were excluded by 34.9% of the studies. CONCLUSIONS Most COVID-19 clinical studies included both genders and older adults. However, risk factors such as diabetes, hypertension, and pregnancy were under-represented, likely skewing the population that was sampled. A careful examination of existing COVID-19 studies can inform future COVID-19 trial design towards balanced internal validity and generalizability.
Affiliation(s)
- Zhe He, School of Information, Florida State University, Tallahassee, Florida, USA
- Xiao Luo, Department of Computer Information and Graphics Technology, Indiana University–Purdue University Indianapolis, Indianapolis, Indiana, USA
- Aiwen Xing, Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Neil Charness, Department of Psychology, Florida State University, Tallahassee, Florida, USA
- Jiang Bian, Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
9. Liu C, Yuan C, Butler AM, Carvajal RD, Li ZR, Ta CN, Weng C. DQueST: dynamic questionnaire for search of clinical trials. J Am Med Inform Assoc 2021; 26:1333-1343. PMID: 31390010; PMCID: PMC6798577; DOI: 10.1093/jamia/ocz121.
Abstract
Objective Information overload remains a challenge for patients seeking clinical trials. We present a novel system (DQueST) that reduces information overload for trial seekers using dynamic questionnaires. Materials and Methods DQueST first performs information extraction and criteria library curation: it transforms criteria narratives in the ClinicalTrials.gov repository into a structured format, normalizes clinical entities using standard concepts, clusters related criteria, and stores the resulting curated library. DQueST then implements a real-time dynamic question generation algorithm. During user interaction, the initial search is similar to a standard search engine; DQueST then performs real-time dynamic question generation, selecting criteria from the library one at a time by maximizing a relevance score that reflects each criterion's ability to rule out ineligible trials. DQueST dynamically updates the remaining trial set by removing ineligible trials based on user responses to the corresponding questions. The process iterates until users decide to stop and begin manually reviewing the remaining trials. Results In simulation experiments initiated by 10 diseases, DQueST reduced information overload by filtering out 60%–80% of initial trials after 50 questions. On review against previous answers, on average 79.7% of the generated questions were relevant to the queried conditions. By examining the eligibility of random samples of trials ruled out by DQueST, we estimate the accuracy of the filtering procedure to be 63.7%. In a study using 5 mock patient profiles, DQueST on average retrieved trials with a 1.465 times higher density of eligible trials than an existing search engine. In a patient-centered usability evaluation, patients found DQueST useful, easy to use, and returning relevant results.
Conclusion DQueST contributes a novel framework for transforming free-text eligibility criteria to questions and filtering out clinical trials based on user answers to questions dynamically. It promises to augment keyword-based methods to improve clinical trial search.
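The core loop the abstract describes — pick the criterion with the highest relevance score, ask it, and drop trials whose requirement conflicts with the user's answer — can be sketched as follows. Here relevance is simplified to the number of remaining trials that specify the criterion (a proxy for its ability to rule trials out); DQueST's actual scoring is more elaborate, and the trial data below are hypothetical:

```python
from collections import Counter

# Hypothetical trials: each maps criterion -> required yes/no answer.
TRIALS = [
    {"id": "NCT001", "criteria": {"type 2 diabetes": True}},
    {"id": "NCT002", "criteria": {"type 2 diabetes": False, "age over 65": True}},
    {"id": "NCT003", "criteria": {"age over 65": True}},
]

def next_question(trials, asked):
    """Pick the unasked criterion mentioned by the most remaining trials."""
    counts = Counter(c for t in trials for c in t["criteria"] if c not in asked)
    return counts.most_common(1)[0][0] if counts else None

def filter_trials(trials, criterion, answer):
    """Keep trials that don't specify the criterion or agree with the answer."""
    return [t for t in trials if t["criteria"].get(criterion, answer) == answer]

q = next_question(TRIALS, asked=set())  # a most-discriminative criterion
remaining = filter_trials(TRIALS, "type 2 diabetes", True)
print([t["id"] for t in remaining])  # ['NCT001', 'NCT003']
```

Trials that do not mention the asked criterion are retained rather than ruled out, mirroring the system's conservative filtering of only provably ineligible trials.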
Affiliation(s)
- Cong Liu, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Chi Yuan, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Alex M Butler, Department of Biomedical Informatics, Columbia University, New York, New York, USA; Department of Medicine, Columbia University, New York, New York, USA
- Ziran Ryan Li, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Casey N Ta, Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Chunhua Weng, Department of Biomedical Informatics, Columbia University, New York, New York, USA
10. Lu Y, Luo X, Zhang Z, Ding H, He Z. Retrieving Lab Test Related Questions from Social Q&A Sites by Combining Shallow Features and Deep Representations. AMIA Annual Symposium Proceedings 2021; 2020:783-792. PMID: 33936453; PMCID: PMC8075538.
Abstract
Patients face challenges in accurately interpreting their lab test results. To fill this knowledge gap, patients often turn to online resources, such as Community Question-Answering (CQA) sites, to seek meaningful information and support from their peers. Retrieving the information most relevant to patients' queries is important to help patients understand lab test results. However, few studies have investigated the retrieval of lab test-related questions on CQA platforms. To address this research gap, we built and evaluated a system that automatically ranks questions about lab tests based on their similarity to a given question. The system was tested using diabetes-related questions collected from the Yahoo! Answers health section. Experimental results show that a regression-weighted combination of deep representations and shallow features was most effective on the Yahoo! Answers dataset. The proposed system can be extended to general medical question retrieval, where questions mention a variety of lab tests.
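The regression-weighted combination can be sketched as a weighted sum of a shallow lexical similarity and a deep-representation similarity. In this toy version the "deep" score is a character-trigram cosine standing in for learned sentence embeddings, and the weights are fixed constants standing in for fitted regression coefficients; none of this is the paper's actual feature set:

```python
from collections import Counter
from math import sqrt

def jaccard(q1, q2):
    """Shallow feature: word-level Jaccard overlap."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def trigram_cosine(q1, q2):
    """Stand-in 'deep' score: cosine over character-trigram counts."""
    def vec(s):
        s = s.lower()
        return Counter(s[i:i + 3] for i in range(len(s) - 2))
    va, vb = vec(q1), vec(q2)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def rank(query, candidates, w_shallow=0.4, w_deep=0.6):
    """Rank candidate questions by the weighted combined similarity."""
    scored = [
        (w_shallow * jaccard(query, c) + w_deep * trigram_cosine(query, c), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]

candidates = [
    "What does a high A1C level mean?",
    "How do I renew a prescription?",
]
print(rank("What does my A1C result mean?", candidates)[0])
# What does a high A1C level mean?
```

In the paper the combination weights are learned by regression on labeled question pairs, which is what lets the system balance lexical overlap against semantic similarity per dataset.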
Affiliation(s)
- Yu Lu, Pace University, New York, NY, USA; Florida State University, Tallahassee, FL, USA
- Xiao Luo, Indiana University–Purdue University Indianapolis, Indianapolis, IN, USA
- Haoran Ding, Indiana University–Purdue University Indianapolis, Indianapolis, IN, USA
- Zhe He, Florida State University, Tallahassee, FL, USA
11. He Z, Erdengasileng A, Luo X, Xing A, Charness N, Bian J. How the clinical research community responded to the COVID-19 pandemic: An analysis of the COVID-19 clinical studies in ClinicalTrials.gov. medRxiv: The Preprint Server for Health Sciences 2020:2020.09.16.20195552. PMID: 32995807; PMCID: PMC7523146; DOI: 10.1101/2020.09.16.20195552.
Abstract
OBJECTIVE The novel coronavirus disease (COVID-19) broke out in December 2019 and is now a global pandemic. In the past few months, a large number of clinical studies have been initiated worldwide to find effective therapeutics, vaccines, and preventive strategies for COVID-19. In this study, we aim to understand the landscape of COVID-19 clinical research and identify gaps, such as a lack of population representativeness, and issues that may cause recruitment difficulty. MATERIALS AND METHODS We analyzed 3,765 COVID-19 studies registered in the largest public registry, ClinicalTrials.gov, leveraging natural language processing and using descriptive, association, and clustering analyses. We first characterized COVID-19 studies by study features such as phase and tested intervention. We then took a deep dive and analyzed their eligibility criteria to understand whether these studies: (1) considered the reported underlying health conditions that may lead to severe illness, and (2) excluded older adults, either explicitly or implicitly, which may reduce the generalizability of these studies to the older adult population. RESULTS Most trials did not have an upper age limit and did not exclude patients with common chronic conditions such as hypertension and diabetes that are more prevalent in older adults. However, known risk factors that may lead to severe illness have not been adequately considered. CONCLUSIONS A careful examination of existing COVID-19 studies can inform future COVID-19 trial design towards balanced internal validity and generalizability.
Affiliation(s)
- Zhe He, School of Information, Florida State University, Tallahassee, Florida, USA
- Xiao Luo, School of Engineering and Technology, Indiana University–Purdue University Indianapolis, Indianapolis, Indiana, USA
- Aiwen Xing, Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Neil Charness, Department of Psychology, Florida State University, Tallahassee, Florida, USA
- Jiang Bian, Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
12. Jiang K, Yang T, Wu C, Chen L, Mao L, Wu Y, Deng L, Jiang T. LATTE: A knowledge-based method to normalize various expressions of laboratory test results in free text of Chinese electronic health records. J Biomed Inform 2020; 102:103372. PMID: 31901507; DOI: 10.1016/j.jbi.2019.103372.
Abstract
BACKGROUND A wealth of clinical information is buried in free text of electronic health records (EHR), and converting clinical information to machine-understandable form is crucial for the secondary use of EHRs. Laboratory test results, as one of the most important types of clinical information, are written in various styles in free text of EHRs. This has brought great difficulties for data integration and utilization of EHRs. Therefore, developing technology to normalize different expressions of laboratory test results in free text is indispensable for the secondary use of EHRs. METHODS In this study, we developed a knowledge-based method named LATTE (transforming lab test results), which could transform various expressions of laboratory test results into a normalized and machine-understandable format. We first identified the analyte of a laboratory test result with a dictionary-based method and then designed a series of rules to detect information associated with the analyte, including its specimen, measured value, unit of measure, conclusive phrase and sampling factor. We determined whether a test result is normal or abnormal by understanding the meaning of conclusive phrases or by comparing its measured value with an appropriate normal range. Finally, we converted various expressions of laboratory test results, either in numeric or textual form, into a normalized form as "specimen-analyte-abnormality". With this method, a laboratory test with the same type of abnormality would have the same representation, regardless of the way that it is mentioned in free text. RESULTS LATTE was developed and optimized on a training set including 8894 laboratory test results from 756 EHRs, and evaluated on a test set including 3740 laboratory test results from 210 EHRs. 
Compared to experts' annotations, LATTE achieved a precision of 0.936, a recall of 0.897 and an F1 score of 0.916 on the training set, and a precision of 0.892, a recall of 0.843 and an F1 score of 0.867 on the test set. For 223 laboratory tests with at least two different expression forms in the test set, LATTE transformed 85.7% (2870/3350) of laboratory test results into a normalized form. Besides, LATTE achieved F1 scores above 0.8 for EHRs from 18 of 21 different hospital departments, indicating its generalization capabilities in normalizing laboratory test results. CONCLUSION In conclusion, LATTE is an effective method for normalizing various expressions of laboratory test results in free text of EHRs. LATTE will facilitate EHR-based applications such as cohort querying, patient clustering and machine learning. AVAILABILITY LATTE is freely available for download on GitHub (https://github.com/denglizong/LATTE).
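The specimen-analyte-abnormality normalization that LATTE performs can be illustrated with a minimal sketch. The analyte dictionary, conclusive phrases, and reference range below are invented stand-ins for LATTE's actual knowledge base:

```python
import re

# Illustrative knowledge base (NOT LATTE's actual dictionaries or ranges).
ANALYTES = {"glucose": ("blood", "glucose")}          # mention -> (specimen, analyte)
NORMAL_RANGES = {"glucose": (3.9, 6.1)}               # assumed fasting range, mmol/L
CONCLUSIVE_PHRASES = {"elevated": "high", "increased": "high",
                      "decreased": "low", "normal": "normal"}

def normalize(text):
    """Map a free-text lab result to a 'specimen-analyte-abnormality' tuple."""
    lowered = text.lower()
    for mention, (specimen, analyte) in ANALYTES.items():
        if mention not in lowered:
            continue
        # Numeric form: compare the measured value against the reference range.
        m = re.search(r"(\d+(?:\.\d+)?)", lowered)
        if m:
            value = float(m.group(1))
            lo, hi = NORMAL_RANGES[analyte]
            status = "low" if value < lo else "high" if value > hi else "normal"
            return (specimen, analyte, status)
        # Textual form: interpret a conclusive phrase instead.
        for phrase, status in CONCLUSIVE_PHRASES.items():
            if phrase in lowered:
                return (specimen, analyte, status)
    return None

print(normalize("blood glucose 7.8 mmol/L"))  # ('blood', 'glucose', 'high')
print(normalize("glucose elevated"))          # ('blood', 'glucose', 'high')
```

Whether the result is stated numerically or with a conclusive phrase, the same abnormality gets the same representation, which is the property the paper exploits for downstream querying.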
Affiliation(s)
- Kun Jiang: Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Chaoyang District, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100101, China; Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China; Changsha Hancloud Information Technology Co., Ltd., Hunan 410000, China
- Tao Yang: The Second Affiliated Hospital of Soochow University, Jiangsu 215008, China
- Chunyan Wu: Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China; Department of Pharmacology, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, China
- Luming Chen: Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
- Longfei Mao: Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
- Yongyou Wu: The Second Affiliated Hospital of Soochow University, Jiangsu 215008, China
- Lizong Deng: Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
- Taijiao Jiang: Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
13
Przybyła P, Brockmeier AJ, Ananiadou S. Quantifying risk factors in medical reports with a context-aware linear model. J Am Med Inform Assoc 2019; 26:537-546. [PMID: 30840055] [PMCID: PMC6515525] [DOI: 10.1093/jamia/ocz004]
Abstract
OBJECTIVE We seek to quantify the mortality risk associated with mentions of medical concepts in textual electronic health records (EHRs). Recognizing mentions of named entities of relevant types (e.g., conditions, symptoms, laboratory tests or behaviors) in text is a well-researched task. However, determining the level of risk associated with them partly depends on the textual context in which they appear, which may describe severity, temporal aspects, quantity, etc. METHODS To account for the fact that a given word can make different contributions toward risk level when it appears in the context of different risk factors (medical concepts), we propose a multitask approach, called context-aware linear modeling, which can be applied using appropriately regularized linear regression. To improve performance for risk factors unseen in training data (e.g., rare diseases), we take into account their distributional similarity to other concepts. RESULTS The evaluation is based on a corpus of 531 reports from EHRs with 99 376 risk factors rated manually by experts. While context-aware linear modeling significantly outperforms single-task models, taking concept similarity into account further improves performance, reaching the level of human annotators' agreement. CONCLUSION Our results show that automatic quantification of risk factors in EHRs can achieve performance comparable to human assessment, and that taking into account the multitask structure of the problem and the ability to handle rare concepts is crucial for its accuracy.
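The multitask idea, where the same context word contributes differently depending on the risk factor it modifies, can be sketched as a ridge regression over a shared context block plus per-concept context blocks. The toy data and penalty below are invented and this is not the authors' implementation:

```python
import numpy as np

# Toy data: (concept, context words, expert risk rating). Illustrative only.
data = [
    ("pneumonia", ["severe"], 0.9),
    ("pneumonia", ["resolved"], 0.2),
    ("cough", ["severe"], 0.4),
    ("cough", ["resolved"], 0.1),
]
concepts = sorted({c for c, _, _ in data})
words = sorted({w for _, ws, _ in data for w in ws})

# Feature vector: a shared context block plus one context block per concept,
# so a word's learned contribution can depend on the concept it modifies.
dim = len(words) * (1 + len(concepts))

def featurize(concept, ctx):
    x = np.zeros(dim)
    for w in ctx:
        x[words.index(w)] += 1.0                          # shared block
        off = len(words) * (1 + concepts.index(concept))
        x[off + words.index(w)] += 1.0                    # concept-specific block
    return x

X = np.array([featurize(c, ws) for c, ws, _ in data])
y = np.array([r for _, _, r in data])
lam = 0.1  # ridge penalty; the shared block lets tasks borrow strength
w = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)
print(round(float(featurize("pneumonia", ["severe"]) @ w), 2))
```

The closed-form solve is ordinary ridge regression; the multitask behavior comes entirely from the block-structured feature map.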
Affiliation(s)
- Piotr Przybyła: National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom
- Austin J Brockmeier: National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom
- Sophia Ananiadou: National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom
14
He Z, Rizvi RF, Yang F, Adam TJ, Zhang R. Comparing the Study Populations in Dietary Supplement and Drug Clinical Trials for Metabolic Syndrome and Related Disorders. AMIA Jt Summits Transl Sci Proc 2019; 2019:799-808. [PMID: 31259037] [PMCID: PMC6568097]
Abstract
Clinical trials are essential in exploring the safety and efficacy of a new intervention. However, restrictive eligibility criteria pose recruitment challenges that could prolong study durations and reduce study generalizability to the real-world population. The objective of this study is to compare the study populations of dietary supplement (DS) and drug trials on metabolic syndrome related conditions. Using the COMPACT database, we retrieved the DS and drug trials related to metabolic syndrome and performed aggregate analyses on the study populations with respect to various quantitative eligibility criteria. We also extracted and compared baseline characteristics, both quantitative and qualitative, of recruited patients in completed trials. We found similarities and differences in baseline characteristics of enrolled patients between drug and DS clinical trials on metabolic syndrome-related conditions. This comparative aggregate analysis is an initial step towards improving patient recruitment efficacy and population representativeness for clinical trials across conditions and intervention types.
Affiliation(s)
- Zhe He: School of Information, Florida State University, Tallahassee, FL, USA
- Rubina F Rizvi: Institute for Health Informatics and College of Pharmacy, University of Minnesota, Minneapolis, MN, USA
- Fan Yang: School of Health Policy & Management, Nanjing Medical University, Nanjing, China
- Terrence J Adam: Institute for Health Informatics and College of Pharmacy, University of Minnesota, Minneapolis, MN, USA
- Rui Zhang: Institute for Health Informatics and College of Pharmacy, University of Minnesota, Minneapolis, MN, USA
15
Yuan C, Ryan PB, Ta C, Guo Y, Li Z, Hardin J, Makadia R, Jin P, Shang N, Kang T, Weng C. Criteria2Query: a natural language interface to clinical databases for cohort definition. J Am Med Inform Assoc 2019; 26:294-305. [PMID: 30753493] [PMCID: PMC6402359] [DOI: 10.1093/jamia/ocy178]
Abstract
OBJECTIVE Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases. MATERIALS AND METHODS Criteria2Query uses a hybrid information extraction pipeline combining machine learning and rule-based methods to systematically parse eligibility criteria text, transforms it first into a structured criteria representation and next into sharable and executable clinical data queries represented as SQL queries conforming to the OMOP Common Data Model. Users can interactively review, refine, and execute queries in the ATLAS web application. To test effectiveness, we evaluated 125 criteria across different disease domains from ClinicalTrials.gov and 52 user-entered criteria. We evaluated F1 score and accuracy against 2 domain experts and calculated the average computation time for fully automated query formulation. We conducted an anonymous survey evaluating usability. RESULTS Criteria2Query achieved 0.795 and 0.805 F1 score for entity recognition and relation extraction, respectively. Accuracies for negation detection, logic detection, entity normalization, and attribute normalization were 0.984, 0.864, 0.514 and 0.793, respectively. Fully automatic query formulation took 1.22 seconds/criterion. More than 80% (11+ of 13) of users would use Criteria2Query in their future cohort definition tasks. CONCLUSIONS We contribute a novel natural language interface to clinical databases. It is open source and supports fully automated and interactive modes for autonomous data-driven cohort definition by researchers with minimal human effort. We demonstrate its promising user friendliness and usability.
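The final step Criteria2Query automates, turning a structured criterion into executable SQL against the OMOP Common Data Model, can be sketched as below. The table and column names are standard OMOP CDM v5 names, but the helper, its input format, and the concept ID are hypothetical simplifications, not Criteria2Query's actual query builder:

```python
# Map a criterion's entity type to its OMOP CDM v5 table and concept column.
TABLES = {
    "condition": ("condition_occurrence", "condition_concept_id"),
    "drug": ("drug_exposure", "drug_concept_id"),
}

def criterion_to_sql(criterion):
    """Hypothetical structured criterion -> OMOP CDM-style cohort SQL."""
    table, concept_col = TABLES[criterion["entity"]]
    sub = (f"SELECT person_id FROM {table} "
           f"WHERE {concept_col} = {criterion['concept_id']}")
    if criterion["negated"]:
        # Exclusion criterion: persons with NO qualifying record.
        return f"SELECT person_id FROM person WHERE person_id NOT IN ({sub})"
    return sub

# "Inclusion: diagnosed with type 2 diabetes" (hypothetical concept ID)
print(criterion_to_sql({"entity": "condition", "concept_id": 201826, "negated": False}))
```

Individual criterion queries like these can then be intersected (e.g., with `INTERSECT`) to realize the trial's overall inclusion/exclusion logic.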
Affiliation(s)
- Chi Yuan: Department of Biomedical Informatics, Columbia University, New York, New York, USA; Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu Province, P.R. China
- Patrick B Ryan: Department of Biomedical Informatics, Columbia University, New York, New York, USA; Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
- Casey Ta: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Yixuan Guo: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Ziran Li: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Jill Hardin: Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
- Rupa Makadia: Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
- Peng Jin: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Ning Shang: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Tian Kang: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, New York, USA
16
Hao T, Pan X, Gu Z, Qu Y, Weng H. A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts. BMC Med Inform Decis Mak 2018; 18:22. [PMID: 29589563] [PMCID: PMC5872502] [DOI: 10.1186/s12911-018-0595-9]
Abstract
BACKGROUND Temporal expression extraction and normalization is a fundamental and essential step in clinical text processing and analysis. Though a variety of commonly used NLP tools are available for medical temporal information extraction, few of them are satisfactory for multi-lingual heterogeneous clinical texts. METHODS A novel method called TEER is proposed for both multi-lingual temporal expression extraction and normalization from various types of narrative clinical texts, including clinical data requests, clinical notes, and clinical trial summaries. TEER is characterized by temporal feature summarization, heuristic rule generation, and automatic pattern learning. By representing a temporal expression as a triple <M, A, N>, TEER identifies temporal mentions M, assigns type attributes A to M, and normalizes the values of M into formal representations N. RESULTS TEER was compared with six state-of-the-art baselines on two heterogeneous clinical text datasets: 400 actual clinical requests in English and 1459 clinical discharge summaries in Chinese. The results showed that TEER achieved a precision of 0.948 and a recall of 0.877 on the English clinical requests, and a precision of 0.941 and a recall of 0.932 on the Chinese discharge summaries. CONCLUSIONS An automated method, TEER, for multi-lingual temporal expression extraction was presented. Based on the two datasets containing heterogeneous clinical texts, the comparison results demonstrated the effectiveness of TEER in multi-lingual temporal expression extraction from heterogeneous narrative clinical texts.
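The triple representation <M, A, N> can be illustrated with two hand-written patterns. TEER itself learns such patterns automatically, so the regexes below are illustrative only:

```python
import re

# Each pattern yields a triple <M, A, N>: the mention, a type attribute,
# and a normalized value. Two illustrative hand-written patterns.
PATTERNS = [
    (re.compile(r"\b(\d{4})-(\d{1,2})-(\d{1,2})\b"), "DATE",
     lambda m: f"{m.group(1)}-{int(m.group(2)):02d}-{int(m.group(3)):02d}"),
    (re.compile(r"\b(\d+)\s+days?\b"), "DURATION",
     lambda m: f"P{m.group(1)}D"),  # ISO 8601 duration
]

def extract(text):
    triples = []
    for pattern, attr, normalize in PATTERNS:
        for m in pattern.finditer(text):
            triples.append((m.group(0), attr, normalize(m)))
    return triples

print(extract("Admitted on 2015-3-7, discharged after 10 days."))
# [('2015-3-7', 'DATE', '2015-03-07'), ('10 days', 'DURATION', 'P10D')]
```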
Affiliation(s)
- Tianyong Hao: School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China; School of Computer, South China Normal University, Guangzhou, China
- Xiaoyi Pan: School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
- Zhiying Gu: School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
- Yingying Qu: School of Business, Guangdong University of Foreign Studies, Guangzhou, China
- Heng Weng: The Second Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, China
17
Chen X, Xie H, Wang FL, Liu Z, Xu J, Hao T. A bibliometric analysis of natural language processing in medical research. BMC Med Inform Decis Mak 2018; 18:14. [PMID: 29589569] [PMCID: PMC5872501] [DOI: 10.1186/s12911-018-0594-x]
Abstract
Background Natural language processing (NLP) has played an increasingly significant role in advancing medicine. A rich body of research on NLP methods and applications for medical information processing is available. It is of great significance to conduct a deep analysis to understand the recent development of the NLP-empowered medical research field. However, few studies examining the research status of this field could be found. Therefore, this study aims to quantitatively assess the academic output of NLP in the medical research field. Methods We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed in the period 2007–2016. The analysis focused on three aspects. First, the literature distribution characteristics were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were reflected using an affinity propagation clustering method. Results There were 1405 NLP-empowered medical research publications published during the 10 years, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between a country's publications and its GDP per capita was revealed. Denny, Joshua C was the most productive author. Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. 10 main thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, Information retrieval, etc. Conclusions A bibliometric analysis of NLP-empowered medical research publications for uncovering the recent research status is presented. The results can assist relevant researchers, especially newcomers, in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices and monitoring new scientific or technological activities.
Affiliation(s)
- Xieling Chen: College of Economics, Jinan University, Guangzhou, China
- Haoran Xie: Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong Special Administrative Region of China
- Fu Lee Wang: School of Science and Technology, The Open University of Hong Kong, Hong Kong Special Administrative Region of China
- Ziqing Liu: The Second Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, China
- Juan Xu: The Research Institute of National Supervision and Audit Law, Nanjing Audit University, Nanjing, China
- Tianyong Hao: School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China; School of Computer, South China Normal University, Guangzhou, China
18
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496] [PMCID: PMC5771858] [DOI: 10.1016/j.jbi.2017.11.011]
Abstract
BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we survey recently published research on clinical IE applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies and clinical workflow optimization. CONCLUSIONS Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge it.
Affiliation(s)
- Yanshan Wang, Liwei Wang, Majid Rastegar-Mojarad, Sungrim Moon, Feichen Shen, Naveed Afzal, Sijia Liu, Yuqun Zeng, Saeed Mehrabi, Sunghwan Sohn, Hongfang Liu: Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
19
Kang T, Zhang S, Tang Y, Hruby GW, Rusanov A, Elhadad N, Weng C. EliIE: An open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc 2017; 24:1062-1071. [PMID: 28379377] [PMCID: PMC6259668] [DOI: 10.1093/jamia/ocx019]
Abstract
OBJECTIVE To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0. MATERIALS AND METHODS EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring. RESULTS In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation. CONCLUSIONS This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise to improve.
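EliIE's negation step builds on NegEx plus predefined rules. The core NegEx idea, a pre-negation trigger within a small window before the entity, can be sketched as follows, with a tiny illustrative subset of triggers:

```python
# Tiny illustrative subset of NegEx-style pre-negation triggers.
PRE_NEGATION_TRIGGERS = ["no", "denies", "without", "no history of"]

def is_negated(tokens, entity_start, window=5):
    """True if a negation trigger occurs within `window` tokens before the entity."""
    left = " ".join(tokens[max(0, entity_start - window):entity_start]).lower()
    # Pad with spaces so both single-word and multi-word triggers match whole words.
    return any(f" {trigger} " in f" {left} " for trigger in PRE_NEGATION_TRIGGERS)

tokens = "patient denies history of stroke".split()
print(is_negated(tokens, tokens.index("stroke")))  # True
```

Real NegEx additionally handles post-negation triggers, pseudo-negations, and scope termination terms, which this sketch omits.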
Affiliation(s)
- Tian Kang: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Shaodian Zhang: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Youlan Tang: Institute of Human Nutrition, Columbia University, New York, NY, USA
- Gregory W Hruby: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Alexander Rusanov: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Noémie Elhadad: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY, USA
20
Sen A, Goldstein A, Chakrabarti S, Shang N, Kang T, Yaman A, Ryan PB, Weng C. The representativeness of eligible patients in type 2 diabetes trials: a case study using GIST 2.0. J Am Med Inform Assoc 2017; 25:239-247. [PMID: 29025047] [PMCID: PMC7378875] [DOI: 10.1093/jamia/ocx091]
Abstract
Objective The population representativeness of a clinical study is influenced by how real-world patients qualify for the study. We analyze the representativeness of eligible patients for multiple type 2 diabetes trials and the relationship between representativeness and other trial characteristics. Methods Sixty-nine study traits available in the electronic health record data for 2034 patients with type 2 diabetes were used to profile the target patients for type 2 diabetes trials. A set of 1691 type 2 diabetes trials was identified from ClinicalTrials.gov, and their population representativeness was calculated using the published Generalizability Index of Study Traits 2.0 metric. The relationships between population representativeness and the number of traits, trial duration, and trial metadata were statistically analyzed. A focused analysis with only phase 2 and 3 interventional trials was also conducted. Results A total of 869 of 1691 trials (51.4%) and 412 of 776 phase 2 and 3 interventional trials (53.1%) had a population representativeness of <5%. The overall representativeness was significantly correlated with the representativeness of the HbA1c criterion. The greater the number of criteria or the shorter the trial, the lower the representativeness. Among the trial metadata, phase, recruitment status, and start year were found to have a statistically significant effect on population representativeness. For phase 2 and 3 interventional trials, only start year was significantly associated with representativeness. Conclusions Our study quantified the representativeness of multiple type 2 diabetes trials. The common low representativeness of type 2 diabetes trials could be attributed to specific study design requirements of trials or safety concerns. Rather than criticizing the low representativeness, we contribute a method for increasing the transparency of the representativeness of clinical trials.
Affiliation(s)
- Anando Sen: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Andrew Goldstein: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Shreya Chakrabarti: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Ning Shang: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Tian Kang: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Anil Yaman: Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, Netherlands
- Patrick B Ryan: Department of Biomedical Informatics, Columbia University, New York, NY, USA; Janssen Research and Development, Titusville, NJ, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY, USA
21
Correlating Lab Test Results in Clinical Notes with Structured Lab Data: A Case Study in HbA1c and Glucose. AMIA Jt Summits Transl Sci Proc 2017; 2017:221-228. [PMID: 28815133] [PMCID: PMC5543347]
Abstract
It is widely acknowledged that information extraction from unstructured clinical notes using natural language processing (NLP) and text mining is essential for the secondary use of clinical data in clinical research and practice. Lab test results are structured in most electronic health record (EHR) systems. However, for referral patients or lab tests that can be done in a non-clinical setting, the results can be captured in unstructured clinical notes. In this study, we proposed a rule-based information extraction system to extract lab test results with temporal information from clinical notes. The lab test results for glucose and HbA1c from 104 randomly sampled diabetes patients selected from 1996 to 2015 were extracted and further correlated with structured lab test information in the Mayo Clinic EHRs. The system has high F1-scores of 0.964, 0.967 and 0.966 for glucose, HbA1c and overall extraction, respectively.
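The two steps of such a correlation study, pulling a lab value with its date out of note text and checking it against structured lab data within a date tolerance, can be sketched as follows. The pattern, input format, and tolerance are illustrative assumptions, not the paper's actual rules:

```python
import re
from datetime import date

# Illustrative pattern for HbA1c/glucose mentions (not the paper's actual rules).
NOTE_PATTERN = re.compile(
    r"(?P<test>HbA1c|glucose)\s*(?:of|was|:)?\s*(?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE)

def extract_result(text, note_date):
    """Extract (test, value, date) from note text, using the note's date."""
    m = NOTE_PATTERN.search(text)
    if not m:
        return None
    return (m.group("test").lower(), float(m.group("value")), note_date)

def matches_structured(extracted, structured_rows, days_tolerance=1):
    """Check the extracted result against structured lab rows within a date tolerance."""
    test, value, when = extracted
    return any(row["test"] == test and abs(row["value"] - value) < 1e-6
               and abs((row["date"] - when).days) <= days_tolerance
               for row in structured_rows)

note = "Labs today notable for HbA1c of 7.2."
extracted = extract_result(note, date(2015, 6, 1))
structured = [{"test": "hba1c", "value": 7.2, "date": date(2015, 6, 2)}]
print(extracted, matches_structured(extracted, structured))
```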
22
He Z, Gonzalez-Izquierdo A, Denaxas S, Sura A, Guo Y, Hogan WR, Shenkman E, Bian J. Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus. AMIA Annu Symp Proc 2017; 2017:849-858. [PMID: 29854151] [PMCID: PMC5977671]
Abstract
Clinical trials are indispensable tools for evidence-based medicine. However, they are often criticized for poor generalizability. Traditional trial generalizability assessment can only be done after the trial results are published, and compares the enrolled patients with a convenience sample of real-world patients. However, the proliferation of electronic data in clinical trial registries and clinical data warehouses offers a great opportunity to assess generalizability during the design phase of a new trial. In this work, we compared and contrasted a priori (based on eligibility criteria) and a posteriori (based on enrolled patients) generalizability of Type 2 diabetes clinical trials. Further, we showed that comparing the study population selected by the clinical trial eligibility criteria to the real-world patient population is a good indicator of the generalizability of trials. Our findings demonstrate that the a priori generalizability of a trial is comparable to its a posteriori generalizability in identifying restrictive quantitative eligibility criteria.
Affiliation(s)
- Zhe He: Florida State University, Tallahassee, FL, USA
- Yi Guo: University of Florida, Gainesville, FL, USA
- Jiang Bian: University of Florida, Gainesville, FL, USA
23
Si Y, Weng C. An OMOP CDM-Based Relational Database of Clinical Research Eligibility Criteria. Stud Health Technol Inform 2017; 245:950-954. [PMID: 29295240] [PMCID: PMC5893219]
Abstract
Eligibility criteria are important for clinical research protocols or clinical practice guidelines for determining who qualify for studies and to whom clinical evidence is applicable, but the free-text format is not amenable for computational processing. In this paper, we described a practical method for transforming free-text clinical research eligibility criteria of Alzheimer's clinical trials into a structured relational database compliant with standards for medical terminologies and clinical data models. We utilized a hybrid natural language processing system and a concept normalization tool to extract medical terms in clinical research eligibility criteria and represent them using the OMOP Common Data Model (CDM) v5. We created a database schema design to store syntactic relations to facilitate efficient cohort queries. We further discussed the potential of applying this method to trials on other diseases and the promise of using it to accelerate clinical research with electronic health records.
Affiliation(s)
- Yuqi Si: Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY, USA
24
GIST 2.0: A scalable multi-trait metric for quantifying population representativeness of individual clinical studies. J Biomed Inform 2016; 63:325-336. [PMID: 27600407] [DOI: 10.1016/j.jbi.2016.09.003]
Abstract
The design of randomized controlled clinical studies can greatly benefit from iterative assessments of population representativeness of eligibility criteria. We propose a multi-trait metric - GIST 2.0 that can compute the a priori generalizability based on the population representativeness of a clinical study by explicitly modeling the dependencies among all eligibility criteria. We evaluate this metric on twenty clinical studies of two diseases and analyze how a study's eligibility criteria affect its generalizability (collectively and individually). We statistically analyze the effects of trial setting, trait selection and trait summarizing technique on GIST 2.0. Finally we provide theoretical as well as empirical validations for the expected properties of GIST 2.0.
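The underlying notion of population representativeness can be illustrated with a deliberately simplified metric: the fraction of a real-world cohort that jointly satisfies all quantitative eligibility criteria. GIST 2.0 itself additionally weights traits and models the dependencies among criteria; the cohort and ranges below are invented:

```python
# Invented toy cohort and eligibility ranges (illustrative only).
patients = [
    {"age": 54, "hba1c": 7.1},
    {"age": 71, "hba1c": 8.4},
    {"age": 48, "hba1c": 6.2},
    {"age": 66, "hba1c": 9.9},
]
criteria = {"age": (18, 70), "hba1c": (7.0, 9.5)}  # inclusive ranges

def qualifying_fraction(patients, criteria):
    """Fraction of the cohort jointly satisfying every eligibility range."""
    def eligible(p):
        return all(lo <= p[trait] <= hi for trait, (lo, hi) in criteria.items())
    return sum(eligible(p) for p in patients) / len(patients)

print(qualifying_fraction(patients, criteria))  # 0.25
```

Because the criteria are evaluated jointly on each patient rather than trait by trait, this simple version already captures the dependency effect that single-trait scores miss.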