1
Nourani A, Ayatollahi H, Solaymani Dodaran M. Data management in diabetes clinical trials: a qualitative study. Trials 2022; 23:187. [PMID: 35241149 PMCID: PMC8895796 DOI: 10.1186/s13063-022-06110-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/29/2021] [Accepted: 02/15/2022] [Indexed: 11/16/2022]
Abstract
Background: Clinical trials play an important role in expanding the knowledge of diabetes prevention, diagnosis, and treatment, and data management is one of the main issues in clinical trials. Lack of appropriate planning for data management in clinical trials may negatively influence achieving the desired results. The aim of this study was to explore data management processes in diabetes clinical trials in three research institutes in Iran. Method: This was a qualitative study conducted in 2019. In this study, data were collected through in-depth semi-structured interviews with 16 researchers in three endocrinology and metabolism research institutes. To analyze data, the method of thematic analysis was used. Results: The five themes that emerged from data analysis included (1) clinical trial data collection, (2) technologies used in data management, (3) data security and confidentiality management, (4) data quality management, and (5) data management standards. In general, the findings indicated that no clear and standard process was used for data management in diabetes clinical trials, and each research center executed its own methods and processes. Conclusion: According to the results, the common methods of data management in diabetes clinical trials included a set of paper-based processes. It seems that using information technology can help facilitate data management processes in a variety of clinical trials, including diabetes clinical trials.
Affiliation(s)
- Aynaz Nourani
- Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
- Haleh Ayatollahi
- Health Management and Economics Research Center, Health Management Research Institute, Iran University of Medical Sciences, Tehran, Iran; Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
2
McKenzie KA, Hunt SL, Hulshof G, Mudaranthakam DP, Meyer K, Vidoni ED, Burns JM, Mahnken JD. A semi-automated pipeline for fulfillment of resource requests from a longitudinal Alzheimer's disease registry. JAMIA Open 2019; 2:516-520. [PMID: 32025648 PMCID: PMC6993996 DOI: 10.1093/jamiaopen/ooz032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/26/2019] [Revised: 06/21/2019] [Accepted: 07/22/2019] [Indexed: 12/22/2022]
Abstract
Objective: Managing registries with continual data collection poses challenges, such as following reproducible research protocols and guaranteeing data accessibility. The University of Kansas (KU) Alzheimer’s Disease Center (ADC) maintains one such registry: Curated Clinical Cohort Phenotypes and Observations (C3PO). We created an automated and reproducible process by which investigators have access to C3PO data. Materials and Methods: Data was input into Research Electronic Data Capture. Monthly, data part of the Uniform Data Set (UDS), that is data also collected at other ADCs, was uploaded to the National Alzheimer’s Coordinating Center (NACC). Quarterly, NACC cleaned, curated, and returned the UDS to the KU Data Management and Statistics (DMS) Core, where it was stored in C3PO with other quarterly curated site-specific data. Investigators seeking to utilize C3PO submitted a research proposal and requested variables via the publicly accessible and searchable data dictionary. The DMS Core used this variable list and an automated SAS program to create a subset of C3PO. Results: C3PO contained 1913 variables stored in 15 datasets. From 2017 to 2018, 38 data requests were completed for several KU departments and other research institutions. Completing data requests became more efficient; C3PO subsets were produced in under 10 seconds. Discussion: The data management strategy outlined above facilitated reproducible research practices, which is fundamental to the future of research as it allows replication and verification to occur. Conclusion: We created a transparent, automated, and efficient process of extracting subsets of data from a registry where data was changing daily.
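The variable-list subsetting step described in this abstract can be illustrated with a minimal sketch. This is not the authors' SAS program: it is a Python illustration under stated assumptions, where the registry is modeled as a dictionary of datasets, each dataset is a list of records, and the key name `participant_id`, the function `subset_registry`, and the toy data are all hypothetical.

```python
from typing import Dict, List

# Assumption: a dataset is a list of records (dicts) and the registry maps
# dataset names to datasets. This is an illustrative model, not the C3PO layout.
Registry = Dict[str, List[Dict[str, object]]]


def subset_registry(registry: Registry, requested: List[str],
                    key: str = "participant_id") -> Registry:
    """Keep only the requested variables (plus the key) in each dataset.

    Datasets that contain none of the requested variables are dropped.
    Assumes every record in a dataset has the same fields, including `key`.
    """
    wanted = set(requested)
    subset: Registry = {}
    for name, rows in registry.items():
        # Read the available variables from the first record.
        available = set(rows[0]) if rows else set()
        keep = (wanted & available) - {key}
        if not keep:
            continue  # nothing requested lives in this dataset
        columns = [key] + sorted(keep)
        subset[name] = [{c: row[c] for c in columns} for row in rows]
    return subset


# Hypothetical toy registry with two datasets.
registry = {
    "demographics": [
        {"participant_id": 1, "age": 71, "sex": "F"},
        {"participant_id": 2, "age": 68, "sex": "M"},
    ],
    "cognition": [
        {"participant_id": 1, "mmse": 28},
        {"participant_id": 2, "mmse": 25},
    ],
}

result = subset_registry(registry, ["age", "mmse"])
print(result)
```

Because the subset is derived only from the requested variable list and the stored data, rerunning the same request against the same registry snapshot yields the same extract, which is the reproducibility property the abstract emphasizes.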
Affiliation(s)
- Katelyn A McKenzie
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Suzanne L Hunt
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA; University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
- Genevieve Hulshof
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Dinesh Pal Mudaranthakam
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA; University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
- Kayla Meyer
- University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
- Eric D Vidoni
- University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA; Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Jeffrey M Burns
- University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA; Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Jonathan D Mahnken
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA; University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
3
Mukherjee P, Leroy G, Kauchak D. Using Lexical Chains to Identify Text Difficulty: A Corpus Statistics and Classification Study. IEEE J Biomed Health Inform 2018; 23:2164-2173. [PMID: 30530380 DOI: 10.1109/jbhi.2018.2885465] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Our goal is data-driven discovery of features for text simplification. In this paper, we investigate three types of lexical chains: exact, synonymous, and semantic. A lexical chain links semantically related words in a document. We examine their potential with a document-level corpus statistics study (914 texts) to estimate their overall capacity to differentiate between easy and difficult text and a classification task (11 000 sentences) to determine usefulness of features at sentence-level for simplification. For the corpus statistics study we tested five document-level features for each chain type: total number of chains, average chain length, average chain span, number of crossing chains, and the number of chains longer than half the document length. We found significant differences between easy and difficult text for average chain length and the average number of cross chains. For the sentence classification study, we compared the lexical chain features to standard bag-of-words features on a range of classifiers: logistic regression, naïve Bayes, decision trees, linear and RBF kernel SVM, and random forest. The lexical chain features performed significantly better than the bag-of-words baseline across all classifiers with the best classifier achieving an accuracy of ∼90% (compared to 78% for bag-of-words). Overall, we find several lexical chain features provide specific information useful for identifying difficult sentences of text, beyond what is available from standard lexical features.
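As a rough illustration of the document-level features listed in this abstract, the sketch below computes four of them for exact chains only. It is not the authors' implementation, and every definition in it is an assumption: an exact chain is taken to be all repetitions of the same non-stopword token, chain length is the number of occurrences, chain span is the token distance from first to last occurrence (inclusive), and a "long" chain is one whose span exceeds half the document's token count. The stopword list and the function name `exact_chain_features` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not the one used in the study.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "with"}


def exact_chain_features(text: str) -> dict:
    """Compute simple exact-lexical-chain features for one document."""
    tokens = re.findall(r"[a-z]+", text.lower())

    # Record the token positions of every non-stopword.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok not in STOPWORDS:
            positions[tok].append(i)

    # Assumption: a chain needs at least two occurrences of the same word.
    chains = [p for p in positions.values() if len(p) >= 2]
    n = len(chains)
    spans = [p[-1] - p[0] + 1 for p in chains]

    return {
        "num_chains": n,
        "avg_chain_length": sum(len(p) for p in chains) / n if n else 0.0,
        "avg_chain_span": sum(spans) / n if n else 0.0,
        # Chains whose span covers more than half of the document.
        "num_long_chains": sum(1 for s in spans if s > len(tokens) / 2),
    }


features = exact_chain_features(
    "Insulin lowers blood glucose. Without insulin, glucose stays in the blood."
)
print(features)
```

Synonymous and semantic chains would need a lexical resource to decide which different words belong to the same chain, and the crossing-chain feature needs a precise definition of "crossing"; both are left out here rather than guessed.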
4
Johnson SB. Clinical Research Informatics: Supporting the Research Study Lifecycle. Yearb Med Inform 2017; 26:193-200. [PMID: 29063565 PMCID: PMC6239240 DOI: 10.15265/iy-2017-022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/18/2017] [Indexed: 12/27/2022]
Abstract
Objectives: The primary goal of this review is to summarize significant developments in the field of Clinical Research Informatics (CRI) over the years 2015-2016. The secondary goal is to contribute to a deeper understanding of CRI as a field, through the development of a strategy for searching and classifying CRI publications. Methods: A search strategy was developed to query the PubMed database, using medical subject headings to both select and exclude articles, and filtering publications by date and other characteristics. A manual review classified publications using stages in the "research study lifecycle", with key stages that include study definition, participant enrollment, data management, data analysis, and results dissemination. Results: The search strategy generated 510 publications. The manual classification identified 125 publications as relevant to CRI, which were classified into seven different stages of the research lifecycle, and one additional class that pertained to multiple stages, referring to general infrastructure or standards. Important cross-cutting themes included new applications of electronic media (Internet, social media, mobile devices), standardization of data and procedures, and increased automation through the use of data mining and big data methods. Conclusions: The review revealed increased interest and support for CRI in large-scale projects across institutions, regionally, nationally, and internationally. A search strategy based on medical subject headings can find many relevant papers, but a large number of non-relevant papers need to be detected using text words which pertain to closely related fields such as computational statistics and clinical informatics. The research lifecycle was useful as a classification scheme by highlighting the relevance to the users of clinical research informatics solutions.
Affiliation(s)
- S. B. Johnson
- Healthcare Policy and Research, Weill Cornell Medicine, New York, USA
5
Mukherjee P, Leroy G, Kauchak D, Rajanarayanan S, Romero Diaz DY, Yuan NP, Pritchard TG, Colina S. NegAIT: A new parser for medical text simplification using morphological, sentential and double negation. J Biomed Inform 2017; 69:55-62. [PMID: 28342946 PMCID: PMC5933936 DOI: 10.1016/j.jbi.2017.03.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/09/2016] [Revised: 12/17/2016] [Accepted: 03/20/2017] [Indexed: 12/22/2022]
Abstract
Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human annotated gold standard containing 500 Wikipedia sentences and achieved 95%, 89% and 67% precision with 100%, 80%, and 67% recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4K sentences), Cochrane reviews (91K sentences), PubMed abstracts (20K sentences), clinical trial texts (48K sentences), and English and Simple English Wikipedia articles for different medical topics (60K and 6K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia and clinical trials) contained significantly (p<0.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Naïve Bayes classifier achieved the highest accuracy at 77% (9% higher than the majority baseline).
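The precision and recall figures reported per negation type follow the standard definitions, which the sketch below makes concrete. It is a generic illustration, not the authors' evaluation code: the sentence identifiers are invented, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(gold: set, predicted: set) -> tuple:
    """Return (precision, recall) for one negation type.

    gold:      ids of sentences annotated by humans as containing the negation type
    predicted: ids of sentences the parser flagged for that type
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence ids, chosen only to show the arithmetic.
gold = {1, 4, 7, 9}
predicted = {1, 4, 7, 12, 15}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the five flagged sentences are correct and three of the four annotated sentences are found, so precision and recall differ; the same calculation would be repeated separately for morphological, sentential, and double negation.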
Affiliation(s)
- Gondy Leroy
- University of Arizona, Tucson, AZ, United States
- Sonia Colina
- University of Arizona, Tucson, AZ, United States