1
Rogers JR, Pavisic J, Ta CN, Liu C, Soroush A, Cheung YK, Hripcsak G, Weng C. Leveraging electronic health record data for clinical trial planning by assessing eligibility criteria's impact on patient count and safety. J Biomed Inform 2022; 127:104032. [PMID: 35189334 PMCID: PMC8920749 DOI: 10.1016/j.jbi.2022.104032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 10/19/2022]
Abstract
OBJECTIVE To present an approach to using electronic health record (EHR) data that assesses how different eligibility criteria, either individually or in combination, can impact patient count and safety (exemplified by all-cause hospitalization risk) and further assist with criteria selection for prospective clinical trials. MATERIALS AND METHODS Trials in three disease domains were analyzed as case studies for this approach: relapsed/refractory (r/r) lymphoma/leukemia; hepatitis C virus (HCV); and stages 3 and 4 chronic kidney disease (CKD). For each disease domain, criteria were identified and all criteria combinations were used to create EHR cohorts. Per combination, two values were derived: (1) the number of eligible patients meeting the selected criteria; (2) hospitalization risk, measured as the hazard ratio between those who qualified and those who did not. From these values, k-means clustering was applied to identify which criteria combinations maximized patient counts but minimized hospitalization risk. RESULTS Criteria combinations that reduced hospitalization risk without substantial reductions in patient counts were as follows: for r/r lymphoma/leukemia (23 trials; 9 criteria; 623 patients), applying no infection and adequate absolute neutrophil count while forgoing no prior malignancy; for HCV (15; 7; 751), applying no human immunodeficiency virus and no hepatocellular carcinoma while forgoing no decompensated liver disease/cirrhosis; for CKD (10; 9; 23893), applying no congestive heart failure. CONCLUSIONS Within each disease domain, the more drastic effects were generally driven by a few criteria. Similar criteria introduced different changes in different disease domains. Although results are contingent on the trial sample and the EHR data used, this approach demonstrates how EHR data can inform the impact on safety and available patients when exploring different criteria combinations for designing clinical trials.
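The combination-and-clustering workflow in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: patients are hypothetical records holding one boolean flag per criterion plus a `hospitalized` flag, a crude hospitalization rate ratio (eligible vs. ineligible) stands in for the Cox-model hazard ratio used in the study, and a plain pure-Python k-means groups the combinations by min-max-normalized patient count and risk ratio.

```python
import math
import random
from itertools import combinations

def all_combinations(criteria):
    """Yield every non-empty subset of the criteria (2**k - 1 subsets)."""
    for r in range(1, len(criteria) + 1):
        yield from combinations(criteria, r)

def evaluate(patients, combo):
    """Return (eligible count, crude hospitalization rate ratio) for one combination.

    The rate ratio (eligible vs. ineligible) is a simplified stand-in for the
    hazard ratio that the study estimated from time-to-event EHR data.
    """
    eligible = [p for p in patients if all(p[c] for c in combo)]
    ineligible = [p for p in patients if not all(p[c] for c in combo)]
    if not eligible or not ineligible:
        return len(eligible), math.nan  # ratio undefined without both groups
    rate_e = sum(p["hospitalized"] for p in eligible) / len(eligible)
    rate_i = sum(p["hospitalized"] for p in ineligible) / len(ineligible)
    return len(eligible), (rate_e / rate_i if rate_i > 0 else math.inf)

def min_max(values):
    """Rescale values to [0, 1] so counts and ratios get comparable weight."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def cluster_combinations(patients, criteria, k=3):
    """Evaluate every criteria combination, then cluster on (count, risk ratio)."""
    rows = []
    for combo in all_combinations(criteria):
        n, ratio = evaluate(patients, combo)
        if math.isfinite(ratio):  # skip combinations with an undefined ratio
            rows.append((combo, n, ratio))
    points = list(zip(min_max([r[1] for r in rows]), min_max([r[2] for r in rows])))
    labels = kmeans(points, k)
    return [(combo, n, ratio, lab) for (combo, n, ratio), lab in zip(rows, labels)]
```

Because every subset is enumerated, the cost grows as 2^k in the number of criteria, which stays manageable for the 7 to 9 criteria per domain reported above.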
Affiliation(s)
- James R. Rogers
- Department of Biomedical Informatics, Columbia University, New York, NY
- Jovana Pavisic
- Department of Pediatrics, Division of Pediatric Hematology, Oncology, and Stem Cell Transplantation, Columbia University Irving Medical Center, New York, NY
- Casey N. Ta
- Department of Biomedical Informatics, Columbia University, New York, NY
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY
- Ali Soroush
- Department of Biomedical Informatics, Columbia University, New York, NY; Division of Gastroenterology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY; Medical Informatics Services, New York-Presbyterian Hospital, New York, NY
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
2
Lamy JB. A data science approach to drug safety: Semantic and visual mining of adverse drug events from clinical trials of pain treatments. Artif Intell Med 2021; 115:102074. [PMID: 34001324 DOI: 10.1016/j.artmed.2021.102074] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/22/2020] [Revised: 01/21/2021] [Accepted: 04/07/2021] [Indexed: 10/21/2022]
Abstract
Clinical trials are the basis of Evidence-Based Medicine. Trial results are reviewed by experts and consensus panels to produce meta-analyses and clinical practice guidelines. However, reviewing these results is a long and tedious task, hence meta-analyses and guidelines are not updated each time a new trial is published. Moreover, the independence of experts may be difficult to appraise. By contrast, in many other domains, including medical risk analysis, the advent of data science, big data and visual analytics has allowed a move from expert-based to fact-based knowledge. For 12 years, many trial results have been publicly available online in trial registries. Nevertheless, data science methods have not yet been applied widely to trial data. In this paper, we present a platform for analyzing the safety events reported during clinical trials and published in trial registries. This platform is based on an ontological model including 582 trials on pain treatments, and uses semantic web technologies for querying this dataset at various levels of granularity. It also relies on a 26-dimensional flower glyph for the visualization of Adverse Drug Event (ADE) rates in 13 categories and 2 levels of seriousness. We illustrate the value of this platform through several use cases, and we were able to recover conclusions that were initially found in meta-analyses. The platform was presented to four experts in drug safety, and is publicly available online, with the ontology of pain treatment ADEs.
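The 26 dimensions of the flower glyph follow from crossing 13 ADE categories with 2 seriousness levels. A minimal Python sketch of that data layout is shown below, assuming hypothetical placeholder category names and raw event counts per treatment arm; the platform's actual ontology, SPARQL queries and rendering code are not reproduced here.

```python
from typing import Dict, List, Tuple

SERIOUSNESS = ("non-serious", "serious")  # the 2 seriousness levels

def ade_rate_vector(counts: Dict[Tuple[str, str], int],
                    categories: List[str],
                    n_patients: int) -> List[float]:
    """Flatten ADE counts into a len(categories) * 2 vector of event rates.

    Cells missing from `counts` are treated as zero events. Ordering is
    category-major: (cat0, non-serious), (cat0, serious), (cat1, ...), ...
    """
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return [counts.get((cat, level), 0) / n_patients
            for cat in categories
            for level in SERIOUSNESS]

# Hypothetical placeholders for the 13 ADE categories (not the platform's labels).
CATEGORIES = [f"category_{i}" for i in range(13)]
```

Each element of the resulting vector would then drive one petal of the glyph; with 13 categories the vector has exactly 26 entries.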
Affiliation(s)
- Jean-Baptiste Lamy
- Université Sorbonne Paris Nord, LIMICS, Sorbonne Université, INSERM, UMR 1142, F-93000 Bobigny, France; Laboratoire de Recherche en Informatique, CNRS/Université Paris-Sud/Université Paris-Saclay, Orsay, France.
3
A knowledge base of clinical trial eligibility criteria. J Biomed Inform 2021; 117:103771. [PMID: 33813032 DOI: 10.1016/j.jbi.2021.103771] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/09/2020] [Revised: 03/25/2021] [Accepted: 03/30/2021] [Indexed: 11/23/2022]
Abstract
OBJECTIVE We present the Clinical Trial Knowledge Base (CTKB), a regularly updated knowledge base of discrete clinical trial eligibility criteria equipped with a web-based user interface for querying and aggregate analysis of common eligibility criteria. MATERIALS AND METHODS We used a natural language processing (NLP) tool named Criteria2Query (Yuan et al., 2019) to transform free-text clinical trial eligibility criteria from ClinicalTrials.gov into discrete criteria concepts and attributes encoded using the widely adopted Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and stored in a relational SQL database. A web application accessible via RESTful APIs was implemented to enable queries and visual aggregate analyses. We demonstrate CTKB's potential role in EHR phenotype knowledge engineering using ten validated phenotyping algorithms. RESULTS At the time of writing, CTKB contained 87,504 distinct OMOP CDM standard concepts, including Condition (47.82%), Drug (23.01%), Procedure (13.73%), Measurement (24.70%) and Observation (5.28%), with 34.78% for inclusion criteria and 65.22% for exclusion criteria, extracted from 352,110 clinical trials. The average hit rate of criteria concepts in eMERGE phenotype algorithms is 77.56%. CONCLUSION CTKB is a novel comprehensive knowledge base of discrete eligibility criteria concepts with the potential to enable knowledge engineering for clinical trial cohort definition, clinical trial population representativeness assessment, electronic phenotyping, and data gap analyses for using electronic health records to support clinical trial recruitment.
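To make the "relational SQL database" and "aggregate analysis" concrete, the Python sketch below uses an in-memory SQLite table. The table name `criteria_concept`, its columns, and the demo rows are hypothetical assumptions for illustration only; they are not CTKB's actual schema or API, which this abstract does not describe.

```python
import sqlite3

# Hypothetical single-table schema; CTKB's real schema is not described here.
SCHEMA = """
CREATE TABLE criteria_concept (
    nct_id       TEXT    NOT NULL,  -- trial identifier
    concept_id   INTEGER NOT NULL,  -- OMOP standard concept id
    domain       TEXT    NOT NULL,  -- e.g. Condition, Drug, Measurement
    is_inclusion INTEGER NOT NULL   -- 1 = inclusion criterion, 0 = exclusion
)
"""

def open_db(rows):
    """Create an in-memory database loaded with (nct_id, concept_id, domain, is_inclusion) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO criteria_concept VALUES (?, ?, ?, ?)", rows)
    return conn

def distinct_concepts_by_domain(conn):
    """Number of distinct concepts per OMOP domain."""
    return dict(conn.execute(
        "SELECT domain, COUNT(DISTINCT concept_id) FROM criteria_concept "
        "GROUP BY domain ORDER BY domain"))

def inclusion_share(conn):
    """Fraction of concept mentions that come from inclusion criteria."""
    (share,) = conn.execute(
        "SELECT AVG(is_inclusion * 1.0) FROM criteria_concept").fetchone()
    return share

def trials_mentioning(conn, concept_id, inclusion=None):
    """Trials whose criteria mention a concept, optionally filtered by criterion type."""
    sql = "SELECT DISTINCT nct_id FROM criteria_concept WHERE concept_id = ?"
    params = [concept_id]
    if inclusion is not None:
        sql += " AND is_inclusion = ?"
        params.append(1 if inclusion else 0)
    return [row[0] for row in conn.execute(sql + " ORDER BY nct_id", params)]
```

Parameterized queries (`?` placeholders) keep user-supplied concept identifiers out of the SQL text, which matters once such queries sit behind a web API.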
4
Gerido LH, Tang X, Ernst B, Langford A, He Z. Patient Engagement in Medical Research Among Older Adults: Analysis of the Health Information National Trends Survey. J Med Internet Res 2019; 21:e15035. [PMID: 31663860 PMCID: PMC6914241 DOI: 10.2196/15035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/17/2019] [Revised: 09/08/2019] [Accepted: 09/24/2019] [Indexed: 01/29/2023]
Abstract
Background By 2035, it is expected that older adults (aged 65 years and older) will outnumber children and will represent 78 million people in the US population. As the aging population continues to grow, it is critical to reduce disparities in their representation in medical research. Objective This study aimed to describe sociodemographic characteristics and health and information behaviors as factors that influence US adults’ interest in engaging in medical research, beyond participation as study subjects. Methods Nationally representative cross-sectional data from the 2014 Health Information National Trends Survey (N=3677) were analyzed. Descriptive statistics and weighted multivariable logistic regression analyses were performed to assess predictors of one’s interest in patient engagement in medical research. The independent variables included age, general health, income, race and ethnicity, education level, insurance status, marital status, and health information behaviors. Results We examined the association between the independent variables and patient interest in engaging in medical research (PTEngage_Interested). Patient interest in engaging in medical research has a statistically significant association with age (adjusted P<.01). Younger adults (aged 18-34 years), lower middle-aged adults (aged 35-49 years), and higher middle-aged adults (aged 50-64 years) indicated interest at relatively the same frequency (29.08%, 29.56%, and 25.12%, respectively), but older adults (aged ≥65 years) expressed less interest (17.10%) than the other age groups. After the multivariate model was run, older adults (odds ratio 0.738, 95% CI 0.500-1.088) were found to be significantly less likely to be interested in engaging in medical research than adults aged 50 to 64 years. Regardless of age, the strongest correlation was found between interest in engaging in medical research and actively looking for health information (P<.001). 
Respondents who did not seek health information were significantly less likely than those who did to be interested in engaging in medical research. Conclusions Patients’ interest in engaging in medical research varies by age and information-seeking behaviors. As the aging population continues to grow, it is critical to reduce disparities in their representation in medical research. Interest in participatory research methods may reflect an opportunity for consumer health informatics technologies to improve the representation of older adults in future medical research.
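The age-group percentages above are survey-weighted, meaning each respondent counts in proportion to a sampling weight rather than once. A minimal Python sketch of a weighted share per group follows, assuming hypothetical `(age_group, interested, weight)` records; it shows only point estimates and does not reproduce the study's weighted logistic regression or the replicate-weight variance estimation that survey software would add.

```python
from collections import defaultdict

def weighted_share(records):
    """Weighted share of respondents who are interested, per group.

    Each record is (group, interested: bool, weight: float), where weight is
    the respondent's survey sampling weight (hypothetical input format).
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for group, interested, weight in records:
        denominator[group] += weight
        if interested:
            numerator[group] += weight
    return {g: numerator[g] / denominator[g] for g in denominator if denominator[g] > 0}
```

An unweighted share would treat every respondent equally; the weights instead let the sample stand in for the national population it was drawn from.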
Affiliation(s)
- Xiang Tang
- Department of Statistics, Florida State University, Tallahassee, FL, United States
- Brittany Ernst
- College of Human Sciences, Florida State University, Tallahassee, FL, United States
- Aisha Langford
- Department of Population Health, School of Medicine, New York University, New York, NY, United States
- Zhe He
- School of Information, Florida State University, Tallahassee, FL, United States
5
Surian D, Dunn AG, Orenstein L, Bashir R, Coiera E, Bourgeois FT. A shared latent space matrix factorisation method for recommending new trial evidence for systematic review updates. J Biomed Inform 2018; 79:32-40. [PMID: 29410356 DOI: 10.1016/j.jbi.2018.01.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/20/2017] [Revised: 12/09/2017] [Accepted: 01/22/2018] [Indexed: 10/18/2022]
Abstract
BACKGROUND Clinical trial registries can be used to monitor the production of trial evidence and signal when systematic reviews become out of date. However, this use has been limited to date due to the extensive manual review required to search for and screen relevant trial registrations. Our aim was to evaluate a new method that could partially automate the identification of trial registrations that may be relevant for systematic review updates. MATERIALS AND METHODS We identified 179 systematic reviews of drug interventions for type 2 diabetes, which included 537 clinical trials that had registrations in ClinicalTrials.gov. Text from the trial registrations was used as features directly, or transformed using Latent Dirichlet Allocation (LDA) or Principal Component Analysis (PCA). We tested a novel matrix factorisation approach that uses a shared latent space to learn how to rank relevant trial registrations for each systematic review, comparing its performance with a document similarity approach to ranking relevant trial registrations. The two approaches were tested on a holdout set of the newest trials from the set of type 2 diabetes systematic reviews and an unseen set of 141 clinical trial registrations from 17 updated systematic reviews published in the Cochrane Database of Systematic Reviews. The performance was measured by the number of relevant registrations found after examining 100 candidates (recall@100) and the median rank of relevant registrations in the ranked candidate lists. RESULTS The matrix factorisation approach outperformed the document similarity approach with a median rank of 59 (of 128,392 candidate registrations in ClinicalTrials.gov) and recall@100 of 60.9% using LDA feature representation, compared to a median rank of 138 and recall@100 of 42.8% in the document similarity baseline. In the second set of systematic reviews and their updates, the highest performing approach used document similarity and gave a median rank of 67 (recall@100 of 62.9%).
CONCLUSIONS A shared latent space matrix factorisation method was useful for ranking trial registrations to reduce the manual workload associated with finding relevant trials for systematic review updates. The results suggest that the approach could be used as part of a semi-automated pipeline for monitoring potentially new evidence for inclusion in a review update.
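The two evaluation measures named in this abstract, recall@100 and the median rank of relevant registrations, are straightforward to compute from a ranked candidate list. The Python sketch below assumes candidate identifiers are unique and that relevant items missing from the ranking count as misses for recall and are skipped for the median; both are assumptions, since the study ranked all candidate registrations.

```python
from statistics import median

def ranks_of_relevant(ranked_ids, relevant_ids):
    """1-based positions of the relevant items found in the ranked list."""
    position = {doc_id: i for i, doc_id in enumerate(ranked_ids, start=1)}
    return [position[d] for d in relevant_ids if d in position]

def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of relevant items that appear within the top-k candidates."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return sum(1 for d in relevant_ids if d in top_k) / len(relevant_ids)

def median_rank(ranked_ids, relevant_ids):
    """Median 1-based rank of the relevant items (lower is better)."""
    return median(ranks_of_relevant(ranked_ids, relevant_ids))
```

Recall@100 reflects the screening budget a reviewer would spend, while the median rank summarizes how far down the list a typical relevant registration sits.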
Affiliation(s)
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
- Adam G Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Liat Orenstein
- Computational Health Informatics Program, Boston Children's Hospital, Boston, United States
- Rabia Bashir
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Enrico Coiera
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Florence T Bourgeois
- Computational Health Informatics Program, Boston Children's Hospital, Boston, United States; Department of Pediatrics, Harvard Medical School, Boston, United States
6
Kang T, Zhang S, Tang Y, Hruby GW, Rusanov A, Elhadad N, Weng C. EliIE: An open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc 2017; 24:1062-1071. [PMID: 28379377 PMCID: PMC6259668 DOI: 10.1093/jamia/ocx019] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 07/27/2016] [Revised: 01/31/2017] [Accepted: 03/02/2017] [Indexed: 12/22/2022]
Abstract
OBJECTIVE To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0. MATERIALS AND METHODS EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring. RESULTS In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation. CONCLUSIONS This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise to improve.
Affiliation(s)
- Tian Kang
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Shaodian Zhang
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Youlan Tang
- Institute of Human Nutrition, Columbia University, New York, NY, USA
- Gregory W Hruby
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Alexander Rusanov
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
7
Johnson SB. Clinical Research Informatics: Supporting the Research Study Lifecycle. Yearb Med Inform 2017; 26:193-200. [PMID: 29063565 PMCID: PMC6239240 DOI: 10.15265/iy-2017-022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/18/2017] [Indexed: 12/27/2022]
Abstract
Objectives: The primary goal of this review is to summarize significant developments in the field of Clinical Research Informatics (CRI) over the years 2015-2016. The secondary goal is to contribute to a deeper understanding of CRI as a field, through the development of a strategy for searching and classifying CRI publications. Methods: A search strategy was developed to query the PubMed database, using medical subject headings to both select and exclude articles, and filtering publications by date and other characteristics. A manual review classified publications using stages in the "research study lifecycle", with key stages that include study definition, participant enrollment, data management, data analysis, and results dissemination. Results: The search strategy generated 510 publications. The manual classification identified 125 publications as relevant to CRI, which were classified into seven different stages of the research lifecycle, and one additional class that pertained to multiple stages, referring to general infrastructure or standards. Important cross-cutting themes included new applications of electronic media (Internet, social media, mobile devices), standardization of data and procedures, and increased automation through the use of data mining and big data methods. Conclusions: The review revealed increased interest and support for CRI in large-scale projects across institutions, regionally, nationally, and internationally. A search strategy based on medical subject headings can find many relevant papers, but a large number of non-relevant papers need to be detected using text words which pertain to closely related fields such as computational statistics and clinical informatics. The research lifecycle was useful as a classification scheme by highlighting the relevance to the users of clinical research informatics solutions.
Affiliation(s)
- S. B. Johnson
- Healthcare Policy and Research, Weill Cornell Medicine, New York, USA
8
George TJ, Lipori G. Assessing the population representativeness of colorectal cancer treatment clinical trials. Annu Int Conf IEEE Eng Med Biol Soc 2017; 2016:2970-2973. [PMID: 28268936 DOI: 10.1109/embc.2016.7591353] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
The generalizability (external validity) of clinical trials has long been a concern for both the clinical research community and the general public. Results of trials that do not represent the target population may not be applicable to the broader patient population. In this study, we used a previously published metric, the Generalizability Index for Study Traits (GIST), to assess the population representativeness of colorectal cancer (CRC) treatment trials. Our analysis showed that the quantitative eligibility criteria of CRC trials are in general not restrictive. However, the qualitative eligibility criteria in these trials impose moderate or strict restrictions, which may reduce the trials' representativeness of the real-world patient population.
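As a rough illustration of the idea behind a quantitative-criterion check (and explicitly not the published GIST formula, which is not reproduced here), the Python sketch below asks what fraction of a target population falls inside each trial's eligibility range for one trait, then averages that fraction across trials. The `(lower, upper)` range format, with `None` meaning unbounded, is an assumption.

```python
from typing import List, Optional, Sequence, Tuple

# (lower, upper) bounds of one quantitative criterion; None means unbounded.
Range = Tuple[Optional[float], Optional[float]]

def eligible_fraction(values: Sequence[float], bounds: Range) -> float:
    """Fraction of the target population whose trait value lies within bounds (inclusive)."""
    lo, hi = bounds
    inside = [v for v in values
              if (lo is None or v >= lo) and (hi is None or v <= hi)]
    return len(inside) / len(values)

def mean_eligible_fraction(values: Sequence[float], trial_bounds: List[Range]) -> float:
    """Average eligible fraction over a set of trials (simplified, unweighted)."""
    return sum(eligible_fraction(values, b) for b in trial_bounds) / len(trial_bounds)
```

A value near 1 suggests a trait whose criteria exclude few real-world patients; a value near 0 flags a restrictive criterion. This matches the qualitative reading in the abstract but not necessarily the exact scale of GIST.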
9
He Z, Gonzalez-Izquierdo A, Denaxas S, Sura A, Guo Y, Hogan WR, Shenkman E, Bian J. Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus. AMIA Annu Symp Proc 2017; 2017:849-858. [PMID: 29854151 PMCID: PMC5977671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
Clinical trials are indispensable tools for evidence-based medicine. However, they are often criticized for poor generalizability. Traditional trial generalizability assessment can only be done after the trial results are published, by comparing the enrolled patients with a convenience sample of real-world patients. However, the proliferation of electronic data in clinical trial registries and clinical data warehouses offers a great opportunity to assess generalizability during the design phase of a new trial. In this work, we compared and contrasted a priori (based on eligibility criteria) and a posteriori (based on enrolled patients) generalizability of Type 2 diabetes clinical trials. Further, we showed that comparing the study population selected by the clinical trial eligibility criteria to the real-world patient population is a good indicator of the generalizability of trials. Our findings demonstrate that the a priori generalizability of a trial is comparable to its a posteriori generalizability in identifying restrictive quantitative eligibility criteria.
Affiliation(s)
- Zhe He
- Florida State University, Tallahassee, FL, USA
- Yi Guo
- University of Florida, Gainesville, FL, USA
- Jiang Bian
- University of Florida, Gainesville, FL, USA
10
Weng C, Kahn MG. Clinical Research Informatics for Big Data and Precision Medicine. Yearb Med Inform 2016:211-218. [PMID: 27830253 DOI: 10.15265/iy-2016-019] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/17/2022]
Abstract
OBJECTIVES To reflect on the notable events and significant developments in Clinical Research Informatics (CRI) in 2015 and discuss near-term trends impacting CRI. METHODS We selected key publications that highlight not only important recent advances in CRI but also notable events likely to have significant impact on CRI activities over the next few years or longer, and consulted the discussions in relevant scientific communities and an online living textbook for modern clinical trials. We also related the new concepts to old problems to improve the continuity of CRI research. RESULTS The highlights in CRI in 2015 include the growing adoption of electronic health records (EHR) and the rapid development of regional, national, and global clinical data research networks for using EHR data to integrate scalable clinical research with clinical care and generate robust medical evidence. Data quality, integration, and fusion, data access by researchers, study transparency, results reproducibility, and infrastructure sustainability are persistent challenges. CONCLUSION The advances in Big Data Analytics and Internet technologies, together with the engagement of citizens in science, are shaping the global clinical research enterprise, which is becoming more open and increasingly stakeholder-centered, where stakeholders include patients, clinicians, researchers, and sponsors.
Affiliation(s)
- C Weng
- Chunhua Weng, PhD, FACMI, Department of Biomedical Informatics, Columbia University, 622 W 168 Street, PH-20, New York, NY 10032, USA, E-mail:
11
Atal I, Zeitoun JD, Névéol A, Ravaud P, Porcher R, Trinquart L. Automatic classification of registered clinical trials towards the Global Burden of Diseases taxonomy of diseases and injuries. BMC Bioinformatics 2016; 17:392. [PMID: 27659604 PMCID: PMC5034670 DOI: 10.1186/s12859-016-1247-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/24/2016] [Accepted: 09/08/2016] [Indexed: 01/14/2023]
Abstract
BACKGROUND Clinical trial registries could be used to produce a global map of health research. However, health conditions are not described with standardized taxonomies in registries. Previous work analyzed clinical trial registries to improve the retrieval of relevant clinical trials for patients. However, no previous work has classified clinical trials across diseases using a standardized taxonomy allowing a comparison between global health research and global burden across diseases. We developed a knowledge-based classifier of health conditions studied in registered clinical trials into categories of diseases and injuries from the Global Burden of Diseases (GBD) 2010 study. The classifier relies on the UMLS® knowledge source (Unified Medical Language System®) and on heuristic algorithms for parsing data. It maps trial records to a 28-class grouping of the GBD categories by automatically extracting UMLS concepts from text fields and by projecting concepts between medical terminologies. The classifier derives pathways between the clinical trial record and candidate GBD categories using natural language processing and links between knowledge sources, and selects the relevant GBD classification based on rules of prioritization across the pathways found. We compared automatic and manual classifications for an external test set of 2,763 trials. We automatically classified 109,603 interventional trials registered before February 2014 at WHO ICTRP. RESULTS In the external test set, the classifier identified the exact GBD categories for 78% of the trials. It had very good performance for most of the 28 categories, especially "Neoplasms" (sensitivity 97.4%, specificity 97.5%). The sensitivity was moderate for trials not relevant to any GBD category (53%) and low for trials of injuries (16%).
For the 109,603 trials registered at WHO ICTRP, the classifier did not assign any GBD category to 20.5% of trials, while the most common GBD categories were "Neoplasms" (22.8%) and "Diabetes" (8.9%). CONCLUSIONS We developed and validated a knowledge-based classifier allowing for automatically identifying the diseases studied in registered trials by using the taxonomy from the GBD 2010 study. This tool is freely available to the research community and can be used for large-scale public health studies.
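The per-category sensitivity and specificity reported for the test set are one-vs-rest measures. The Python sketch below computes them from gold and predicted labels, assuming exactly one label per trial (a simplification, since a trial can map to several GBD categories) and a hypothetical `"(none)"` label for trials assigned no category.

```python
import math
from typing import Dict, List, Tuple

def per_category_sens_spec(gold: List[str],
                           predicted: List[str]) -> Dict[str, Tuple[float, float]]:
    """One-vs-rest (sensitivity, specificity) for each category label.

    Assumes exactly one label per trial in both lists; NaN marks a metric
    whose denominator is zero.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    result = {}
    for cat in sorted(set(gold) | set(predicted)):
        tp = sum(g == cat and p == cat for g, p in pairs)
        fn = sum(g == cat and p != cat for g, p in pairs)
        fp = sum(g != cat and p == cat for g, p in pairs)
        tn = len(pairs) - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else math.nan
        spec = tn / (tn + fp) if tn + fp else math.nan
        result[cat] = (sens, spec)
    return result

def exact_match_rate(gold: List[str], predicted: List[str]) -> float:
    """Share of trials whose predicted category equals the gold category."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

Specificity stays high for rare categories almost by construction, because most trials are true negatives for any given category; sensitivity is the more informative figure for small classes such as injuries.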
Affiliation(s)
- Ignacio Atal
- Centre d’Épidémiologie Clinique, Hôpital Hôtel-Dieu, Paris, France
- INSERM U1153, Paris, France
- Université Paris Descartes, Paris, France
- Jean-David Zeitoun
- Centre d’Épidémiologie Clinique, Hôpital Hôtel-Dieu, Paris, France
- INSERM U1153, Paris, France
- Université Paris Descartes, Paris, France
- Aurélie Névéol
- LIMSI, CNRS UPR 3251, Université Paris-Saclay, Orsay, France
- Philippe Ravaud
- Centre d’Épidémiologie Clinique, Hôpital Hôtel-Dieu, Paris, France
- INSERM U1153, Paris, France
- Université Paris Descartes, Paris, France
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY USA
- Raphaël Porcher
- Centre d’Épidémiologie Clinique, Hôpital Hôtel-Dieu, Paris, France
- INSERM U1153, Paris, France
- Université Paris Descartes, Paris, France
- Ludovic Trinquart
- Centre d’Épidémiologie Clinique, Hôpital Hôtel-Dieu, Paris, France
- INSERM U1153, Paris, France
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY USA
12
Weng C. Optimizing Clinical Research Participant Selection with Informatics. Trends Pharmacol Sci 2015; 36:706-709. [PMID: 26549161 DOI: 10.1016/j.tips.2015.08.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/04/2015] [Accepted: 08/07/2015] [Indexed: 02/08/2023]
Abstract
Clinical research participants are often not reflective of real-world patients due to overly restrictive eligibility criteria. Meanwhile, unselected participants introduce confounding factors and reduce research efficiency. Biomedical informatics, especially Big Data increasingly made available from electronic health records, offers promising aids to optimize research participant selection through data-driven transparency.
Affiliation(s)
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, 622 W 168 Street, PH-20, Room 407, New York, NY 10032, USA.
13
He Z, Ryan P, Hoxha J, Wang S, Carini S, Sim I, Weng C. Multivariate analysis of the population representativeness of related clinical studies. J Biomed Inform 2016; 60:66-76. [PMID: 26820188 PMCID: PMC4837055 DOI: 10.1016/j.jbi.2016.01.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/11/2015] [Revised: 01/15/2016] [Accepted: 01/19/2016] [Indexed: 12/25/2022]
Abstract
OBJECTIVE To develop a multivariate method for quantifying the population representativeness across related clinical studies and a computational method for identifying and characterizing underrepresented subgroups in clinical studies. METHODS We extended a published metric named Generalizability Index for Study Traits (GIST) to include multiple study traits for quantifying the population representativeness of a set of related studies, assuming independence and equal importance of all study traits. On this basis, we compared the effectiveness of GIST and multivariate GIST (mGIST) qualitatively. We further developed an algorithm called "Multivariate Underrepresented Subgroup Identification" (MAGIC) for constructing optimal combinations of distinct value intervals of multiple traits to define underrepresented subgroups in a set of related studies. Using Type 2 diabetes mellitus (T2DM) as an example, we identified and extracted frequently used quantitative eligibility criteria variables in a set of clinical studies. We profiled the T2DM target population using the National Health and Nutrition Examination Survey (NHANES) data. RESULTS According to the mGIST scores for four example variables, i.e., age, HbA1c, BMI, and gender, the included observational T2DM studies had better population representativeness than the interventional T2DM studies. For the interventional T2DM studies, Phase I trials had better population representativeness than Phase III trials. People at least 65 years old with HbA1c values between 5.7% and 7.2% were particularly underrepresented in the included T2DM trials. These results confirmed well-known knowledge and demonstrated the effectiveness of our methods in population representativeness assessment. CONCLUSIONS mGIST is effective at quantifying population representativeness of related clinical studies using multiple numeric study traits. MAGIC identifies underrepresented subgroups in clinical studies.
Both data-driven methods can be used to improve the transparency of design bias in participation selection at the research community level.
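The abstract does not spell out how MAGIC scores a candidate subgroup. As a rough illustration of the underlying idea only, the Python sketch below checks, for a subgroup defined by one value interval per trait, what fraction of trials have eligibility ranges that admit the entire subgroup; a fraction near zero marks a candidate underrepresented subgroup. This is not the MAGIC algorithm or the mGIST formula. The `Trial` class, the `covers` and `coverage` functions, and all trial ranges are hypothetical, and treating a trait that a trial does not restrict as unbounded is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical trial: trait name -> inclusive (low, high) eligibility range."""
    name: str
    ranges: dict[str, tuple[float, float]] = field(default_factory=dict)


def covers(trial: Trial, subgroup: dict[str, tuple[float, float]]) -> bool:
    """True if every trait interval of the subgroup lies inside the trial's range.

    Assumption: a trait the trial does not restrict is treated as unbounded.
    """
    unbounded = (float("-inf"), float("inf"))
    for trait, (lo, hi) in subgroup.items():
        t_lo, t_hi = trial.ranges.get(trait, unbounded)
        if lo < t_lo or hi > t_hi:
            return False
    return True


def coverage(trials: list[Trial], subgroup: dict[str, tuple[float, float]]) -> float:
    """Fraction of trials whose eligibility criteria admit the whole subgroup."""
    return sum(covers(t, subgroup) for t in trials) / len(trials)


# Made-up eligibility ranges for illustration; not data from the cited study.
trials = [
    Trial("A", {"age": (18, 70), "hba1c": (7.0, 10.0)}),
    Trial("B", {"age": (18, 80), "hba1c": (6.5, 11.0)}),
    Trial("C", {"age": (30, 65), "hba1c": (7.5, 10.5)}),
]

older_near_normal = {"age": (65, 80), "hba1c": (5.7, 7.2)}
middle_aged_elevated = {"age": (40, 60), "hba1c": (7.5, 9.0)}
print(coverage(trials, older_near_normal))     # 0.0
print(coverage(trials, middle_aged_elevated))  # 1.0
```

With these made-up ranges, a subgroup aged 65 to 80 with HbA1c between 5.7% and 7.2% (mirroring the subgroup the abstract reports as underrepresented) is admitted by none of the three trials, while a middle-aged subgroup with elevated HbA1c is admitted by all of them. Requiring the whole interval to fit inside a trial's range is a conservative design choice; a partial-overlap rule would give higher coverage.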
Affiliation(s)
- Zhe He: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
- Patrick Ryan: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; Janssen Research and Development, Titusville, NJ 08560, USA; Observational Health Data Sciences and Informatics, New York, NY 10032, USA.
- Julia Hoxha: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
- Shuang Wang: Department of Biostatistics, Columbia University, New York, NY 10032, USA.
- Simona Carini: Department of Medicine, University of California, San Francisco, CA 94143, USA.
- Ida Sim: Department of Medicine, University of California, San Francisco, CA 94143, USA.
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; Observational Health Data Sciences and Informatics, New York, NY 10032, USA.
14
Hao T, Liu H, Weng C. Valx: A System for Extracting and Structuring Numeric Lab Test Comparison Statements from Text. Methods Inf Med 2016; 55:266-75. [PMID: 26940748 DOI: 10.3414/me15-01-0112] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 08/26/2015] [Accepted: 02/07/2016] [Indexed: 01/08/2023]
Abstract
OBJECTIVES To develop an automated method for extracting and structuring numeric lab test comparison statements from text and to evaluate the method using clinical trial eligibility criteria text. METHODS Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes seven steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based comparison statement verification. Our reference standard was the consensus annotation by three raters of all comparison statements for two variables, i.e., HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov. RESULTS The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively. CONCLUSIONS Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. Because Valx is open source, the collaborative scientific community can further evaluate and continue to improve it.
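To make steps 2 and 4 of the pipeline more concrete, here is a deliberately tiny Python sketch: a single regular expression that pulls out a variable name, comparison operator, number, and unit, followed by precision/recall/F-measure scoring against a hand-made reference set. It is not Valx and uses none of its UMLS-based knowledge; the pattern, the two recognized variables, the unit list, and the names `extract`, `f_measure`, and `score` are assumptions for illustration. The F-measure here is the harmonic mean of precision and recall, which reproduces, for example, the reported 98.8% for HbA1c in Type 1 diabetes trials from 99.6% precision and 98.1% recall.

```python
import re

# Hypothetical pattern: variable, operator, number, optional unit.
# Only two variables and three units are recognized in this sketch.
PATTERN = re.compile(
    r"(?P<var>hba1c|glucose)\s*"
    r"(?P<op><=|>=|<|>|=)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>%|mg/dl|mmol/l)?",
    re.IGNORECASE,
)


def extract(text: str) -> list[tuple[str, str, float, str]]:
    """Return (variable, operator, value, unit) tuples, lowercased for matching."""
    return [
        (m["var"].lower(), m["op"], float(m["value"]), (m["unit"] or "").lower())
        for m in PATTERN.finditer(text)
    ]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def score(predicted, gold) -> tuple[float, float, float]:
    """Precision, recall, and F-measure of predicted statements vs. a reference set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


criteria = "HbA1c >= 7.0% and <= 10.5%; fasting glucose < 270 mg/dL"
found = extract(criteria)
print(found)  # [('hba1c', '>=', 7.0, '%'), ('glucose', '<', 270.0, 'mg/dl')]

# The bare "<= 10.5%" has no variable name next to it, so this naive pattern
# misses it and recall drops to 2/3.
gold = {("hba1c", ">=", 7.0, "%"), ("hba1c", "<=", 10.5, "%"),
        ("glucose", "<", 270.0, "mg/dl")}
print(score(found, gold))
```

The missed upper bound is the kind of case that a dedicated variable-numeric association step is meant to handle, which is why a single pattern like this one would fall well short of the reported results.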
Affiliation(s)
- Chunhua Weng, Ph.D.: Department of Biomedical Informatics, Columbia University, 622 W 168th Street, PH-20, New York, NY 10032, USA.
15
He Z, Chandar P, Ryan P, Weng C. Simulation-based Evaluation of the Generalizability Index for Study Traits. AMIA Annu Symp Proc 2015; 2015:594-603. [PMID: 26958194 PMCID: PMC4765558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The Generalizability Index for Study Traits (GIST) has been proposed recently for assessing the population representativeness of a set of related clinical trials using one eligibility feature (e.g., age or BMI) at a time. However, GIST has not yet been evaluated. To bridge this knowledge gap, this paper reports a simulation-based validation study of GIST. Using the National Health and Nutrition Examination Survey (NHANES) data, we demonstrated the effectiveness of GIST at quantifying the population representativeness of sets of related trials that differ by disease domain, study phase, sponsor type, or study design. We also showed that, across seven example medical conditions, the GIST of age increases from Phase I to Phase III trials and is lowest in asthma trials. We conclude that GIST correlates with simulation-based generalizability results and is a valid metric for quantifying the population representativeness of related clinical trials.
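The abstract describes GIST only at a high level. The Python sketch below shows one way a single-trait, GIST-style score could be computed, under the assumption that the trait's value range is split into intervals, each interval is weighted by the share of the target population falling in it, and that weight is multiplied by the fraction of trials whose eligibility range covers the whole interval. This is a simplified reading for illustration, not the published GIST definition, whose normalization may differ; the function name `gist_like`, the interval edges, and the example ages and trial ranges are hypothetical.

```python
def gist_like(population_values, trial_ranges, edges):
    """Population-weighted trial coverage for one eligibility trait.

    population_values: trait values for a sample of the target population.
    trial_ranges: inclusive (low, high) eligibility range for each trial.
    edges: sorted interval boundaries; [18, 30, 45] -> [18, 30), [30, 45).
    An interval counts as covered by a trial only if it lies entirely
    inside that trial's range. Values outside the outermost edges fall in
    no interval and contribute nothing. Returns a value between 0 and 1.
    """
    n = len(population_values)
    score = 0.0
    for lo, hi in zip(edges, edges[1:]):
        # Share of the target population whose value falls in [lo, hi).
        share = sum(lo <= v < hi for v in population_values) / n
        # Number of trials whose eligibility range covers the whole interval.
        covered = sum(t_lo <= lo and hi <= t_hi for t_lo, t_hi in trial_ranges)
        score += share * covered / len(trial_ranges)
    return score


# Hypothetical ages and age eligibility ranges, for illustration only.
ages = [20, 25, 35, 50, 60, 70, 75, 78]
trials = [(18, 65), (18, 80), (30, 65)]
print(round(gist_like(ages, trials, [18, 30, 45, 65, 80]), 3))  # 0.667
```

Here the oldest interval (65 to 80) holds the largest share of the example population but is covered by only one of the three trials, which is what pulls the score down; raising the trials' upper age limits would raise it.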
Affiliation(s)
- Zhe He: Department of Biomedical Informatics, Columbia University, New York, NY USA.
- Praveen Chandar: Department of Biomedical Informatics, Columbia University, New York, NY USA.
- Patrick Ryan: Department of Biomedical Informatics, Columbia University, New York, NY USA; Janssen Research and Development, Titusville, NJ USA.
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY USA.
16
Milian K, Hoekstra R, Bucur A, ten Teije A, van Harmelen F, Paulissen J. Enhancing reuse of structured eligibility criteria and supporting their relaxation. J Biomed Inform 2015; 56:205-19. [DOI: 10.1016/j.jbi.2015.05.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 08/24/2013] [Revised: 03/20/2015] [Accepted: 05/07/2015] [Indexed: 11/26/2022]