1
Ernst MA, Draghi BN, Cimino JJ, Patel VL, Zhou Y, Shubrook JH, De Lacalle S, Weaver A, Liu C, Jing X. The quality of data-driven hypotheses generated by inexperienced clinical researchers: A case study. medRxiv 2024:2024.08.12.24311877. [PMID: 39185523 PMCID: PMC11343241 DOI: 10.1101/2024.08.12.24311877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Objectives: We invited inexperienced clinical researchers to analyze coded health datasets and develop hypotheses. We recorded and analyzed their hypothesis generation process. All the hypotheses generated in the process were rated by the same group of seven experts using the same metrics. This case study examines the higher-quality (i.e., higher-rated) and lower-quality hypotheses and the participants who generated them. We characterized the contextual factors associated with the quality of hypotheses.

Methods: All participants (i.e., clinical researchers) completed a 2-hour study session to analyze data and generate scientific hypotheses using the think-aloud method. Participants' screen activity and audio were recorded and transcribed. These transcriptions were used to measure the time used to generate each hypothesis and to code cognitive events (i.e., cognitive activities used when generating hypotheses; for example, "Seeking for Connection" describes an attempt to draw connections between data points). The hypothesis ratings by the expert panel were used as the measure of hypothesis quality during the analysis. We analyzed the factors associated with (1) the five highest-rated and (2) the five lowest-rated hypotheses and (3) the participants who generated them, including the number of hypotheses per participant, the validity of those hypotheses, the number of cognitive events used for each hypothesis, and the participants' research experience and basic demographics.

Results: Participants who generated the five highest-rated hypotheses used similar lengths of time (difference 3:03), whereas those who generated the five lowest-rated hypotheses used more varied lengths of time (difference 7:13). The five highest-rated hypotheses also involved slightly fewer cognitive events on average than the five lowest-rated hypotheses (4 per hypothesis vs. 4.8 per hypothesis). When we examined all the hypotheses these participants generated during their 2-hour study sessions, the participants with the five highest-rated hypotheses again had a shorter range of time per hypothesis on average (0:03:34 vs. 0:07:17). They also used fewer cognitive events per hypothesis (3.498 vs. 4.626), had a higher valid rate (75.51% vs. 63.63%), and generally had more experience with clinical research.

Conclusion: The quality of the hypotheses was shown to be associated with the time taken to generate them: taking too long or too short a time to generate a hypothesis appears to be negatively associated with its quality rating. Also, having more experience seems to correlate positively with higher hypothesis ratings and higher valid rates. Validity is a quality dimension used by the expert panel during rating. However, we acknowledge that our results are anecdotal. The effect may not be simply linear, and future research is necessary. These results underscore the multi-factor nature of hypothesis generation.
Affiliation(s)
- Mytchell A. Ernst
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Brooke N. Draghi
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- James J. Cimino
  - Department of Biomedical Informatics and Data Science, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
- Vimla L. Patel
  - Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY
- Yuchun Zhou
  - Patton College of Education, Ohio University, Athens, OH
- Jay H. Shubrook
  - Department of Clinical Sciences and Community Health, Touro University California College of Osteopathic Medicine, Vallejo, CA
- Sonsoles De Lacalle
  - Department of Health Science, California State University Channel Islands, Camarillo, CA
- Aneesa Weaver
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Chang Liu
  - Russ College of Engineering and Technology, Ohio University, Athens, OH
- Xia Jing
  - Department of Public Health Sciences, Clemson University, Clemson, SC
2
Jing X, Cimino JJ, Patel VL, Zhou Y, Shubrook JH, Liu C, De Lacalle S. Data-Driven Hypothesis Generation in Clinical Research: What We Learned from a Human Subject Study? Medical Research Archives 2024; 12:10.18103/mra.v12i2.5132. [PMID: 39211055 PMCID: PMC11361316 DOI: 10.18103/mra.v12i2.5132] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/04/2024]
Abstract
Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve the process goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study design, data collection, and result analysis. In this perspective article, the authors first provide a literature review on the following topics: scientific thinking, reasoning, medical reasoning, literature-based discovery, and a field study to explore scientific thinking and discovery. Over the years, scientific thinking has shown excellent progress in cognitive science and its applied areas: education, medicine, and biomedical research. However, a review of the literature reveals the lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting. The results indicate that a secondary data analytical tool, VIADS (a visual interactive analytic tool for filtering, summarizing, and visualizing large health data sets coded with hierarchical terminologies), can shorten the time participants need, on average, to generate a hypothesis and can reduce the number of cognitive events needed to generate each hypothesis. As a counterpoint, this exploration also indicates that hypotheses generated with VIADS received significantly lower ratings for feasibility. Despite its small scale, the study confirmed the feasibility of conducting a human participant study directly to explore the hypothesis generation process in clinical research.
This study provides supporting evidence for conducting a larger-scale study with a specifically designed tool to facilitate the hypothesis-generation process among inexperienced clinical researchers. A larger study could provide generalizable evidence, which in turn can potentially improve clinical research productivity and the overall clinical research enterprise.
Affiliation(s)
- Xia Jing
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC
- James J. Cimino
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, Birmingham, AL
- Vimla L. Patel
  - Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY
- Yuchun Zhou
  - Department of Educational Studies, Patton College of Education, Ohio University, Athens, OH
- Jay H. Shubrook
  - Department of Clinical Sciences and Community Health, Touro University California College of Osteopathic Medicine, Vallejo, CA
- Chang Liu
  - Department of Electrical Engineering and Computer Science, Russ College of Engineering and Technology, Ohio University, Athens, OH
- Sonsoles De Lacalle
  - Department of Health Science, California State University Channel Islands, Camarillo, CA
3
Jing X, Cimino JJ, Patel VL, Zhou Y, Shubrook JH, De Lacalle S, Draghi BN, Ernst MA, Weaver A, Sekar S, Liu C. Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools. J Clin Transl Sci 2024; 8:e13. [PMID: 38384898 PMCID: PMC10880005 DOI: 10.1017/cts.2023.708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 11/21/2023] [Accepted: 12/20/2023] [Indexed: 02/23/2024]
Abstract
Objectives: To compare how clinical researchers generate data-driven hypotheses with VIADS (a visual interactive analytic tool for filtering and summarizing large datasets coded with hierarchical terminologies) or with other tools.

Methods: We recruited clinical researchers and separated them into "experienced" and "inexperienced" groups. Within each group, participants were randomly assigned to a VIADS or control group. Each participant completed a remote 2-hour study session for hypothesis generation with the same study facilitator on the same datasets, following a think-aloud protocol. Screen activities and audio were recorded, transcribed, coded, and analyzed. Hypotheses were evaluated by seven experts on their validity, significance, and feasibility. We conducted multilevel random effect modeling for statistical tests.

Results: Eighteen participants generated 227 hypotheses, of which 147 (65%) were valid. The VIADS and control groups generated a similar number of hypotheses. The VIADS group took a significantly shorter time to generate one hypothesis (e.g., among inexperienced clinical researchers, 258 s versus 379 s, p = 0.046, power = 0.437, ICC = 0.15). The VIADS group received significantly lower ratings than the control group on feasibility and on the combined rating of validity, significance, and feasibility.

Conclusion: The role of VIADS in hypothesis generation seems inconclusive. The VIADS group took a significantly shorter time to generate each hypothesis. However, the combined validity, significance, and feasibility ratings of their hypotheses were significantly lower. Further characterization of hypotheses, including specifics on how they might be improved, could guide future tool development.
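The headline figures in this abstract can be sanity-checked with simple arithmetic. The sketch below only recomputes ratios from the numbers quoted in the Results (147 of 227 hypotheses valid; 258 s versus 379 s per hypothesis among inexperienced researchers). The helper name `percent` is hypothetical, and the calculation is a back-of-the-envelope check, not the multilevel random effect model the authors used.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Figures quoted in the abstract.
valid_hypotheses, total_hypotheses = 147, 227
viads_seconds, control_seconds = 258, 379

valid_rate = percent(valid_hypotheses, total_hypotheses)
time_saved_s = control_seconds - viads_seconds
relative_reduction = percent(time_saved_s, control_seconds)

print(f"valid hypotheses: {valid_rate:.1f}%")              # 64.8%, reported as 65%
print(f"time saved per hypothesis: {time_saved_s} s")      # 121 s
print(f"relative reduction: {relative_reduction:.1f}%")    # 31.9%
```

The relative reduction is derived here for illustration only; the abstract itself reports the two times and the p-value, not a percentage difference.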
Affiliation(s)
- Xia Jing
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- James J. Cimino
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL, USA
- Vimla L. Patel
  - Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY, USA
- Yuchun Zhou
  - Department of Educational Studies, The Patton College of Education, Ohio University, Athens, OH, USA
- Jay H. Shubrook
  - Department of Clinical Sciences and Community Health, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
- Sonsoles De Lacalle
  - Department of Health Science, California State University Channel Islands, Camarillo, CA, USA
- Brooke N. Draghi
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- Mytchell A. Ernst
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- Aneesa Weaver
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, USA
- Shriram Sekar
  - Electrical Engineering and Computer Science, Russ College of Engineering and Technology, Ohio University, Athens, OH, USA
- Chang Liu
  - Russ College of Engineering and Technology, Ohio University, Athens, OH, USA
4
Jing X, Cimino JJ, Patel VL, Zhou Y, Shubrook JH, De Lacalle S, Draghi BN, Ernst MA, Weaver A, Sekar S, Liu C. Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools. medRxiv 2023:2023.05.30.23290719. [PMID: 37333271 PMCID: PMC10274969 DOI: 10.1101/2023.05.30.23290719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Objectives: To compare how clinical researchers generate data-driven hypotheses with VIADS (a visual interactive analytic tool for filtering and summarizing large data sets coded with hierarchical terminologies) or with other tools.

Methods: We recruited clinical researchers and separated them into "experienced" and "inexperienced" groups. Within each group, participants were randomly assigned to a VIADS or control group. Each participant completed a remote 2-hour study session for hypothesis generation with the same study facilitator on the same datasets, following a think-aloud protocol. Screen activities and audio were recorded, transcribed, coded, and analyzed. Hypotheses were evaluated by seven experts on their validity, significance, and feasibility. We conducted multilevel random effect modeling for statistical tests.

Results: Eighteen participants generated 227 hypotheses, of which 147 (65%) were valid. The VIADS and control groups generated a similar number of hypotheses. The VIADS group took a significantly shorter time to generate one hypothesis (e.g., among inexperienced clinical researchers, 258 seconds versus 379 seconds, p = 0.046, power = 0.437, ICC = 0.15). The VIADS group received significantly lower ratings than the control group on feasibility and on the combined rating of validity, significance, and feasibility.

Conclusion: The role of VIADS in hypothesis generation seems inconclusive. The VIADS group took a significantly shorter time to generate each hypothesis. However, the combined validity, significance, and feasibility ratings of their hypotheses were significantly lower. Further characterization of hypotheses, including specifics on how they might be improved, could guide future tool development.
Affiliation(s)
- Xia Jing
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- James J Cimino
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, Birmingham, AL
- Vimla L Patel
  - Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY
- Yuchun Zhou
  - Patton College of Education, Ohio University, Athens, OH
- Jay H Shubrook
  - College of Osteopathic Medicine, Touro University, Vallejo, CA
- Sonsoles De Lacalle
  - Department of Health Science, California State University Channel Islands, Camarillo, CA
- Brooke N Draghi
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Mytchell A Ernst
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Aneesa Weaver
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Shriram Sekar
  - School of Computing, Clemson University, Clemson, SC
- Chang Liu
  - Russ College of Engineering and Technology, Ohio University, Athens, OH
5
Jing X, Draghi BN, Ernst MA, Patel VL, Cimino JJ, Shubrook JH, Zhou Y, Liu C, De Lacalle S. How do clinical researchers generate data-driven scientific hypotheses? Cognitive events using think-aloud protocol. medRxiv 2023:2023.10.31.23297860. [PMID: 37961555 PMCID: PMC10635246 DOI: 10.1101/2023.10.31.23297860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Objectives: This study aims to identify the cognitive events related to information use (e.g., "Analyze data", "Seek connection") during hypothesis generation among clinical researchers. Specifically, we describe hypothesis generation using cognitive event counts and compare them between groups.

Methods: The participants used the same datasets, followed the same scripts, used VIADS (a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other analytical tools (as control) to analyze the datasets, and came up with hypotheses while following the think-aloud protocol. Their screen activities and audio were recorded and then transcribed and coded for cognitive events.

Results: The VIADS group exhibited the lowest mean number of cognitive events per hypothesis and the smallest standard deviation. The experienced clinical researchers had approximately 10% more valid hypotheses than the inexperienced group. The VIADS users among the inexperienced clinical researchers exhibited a trend similar to that of the experienced clinical researchers in terms of the number of cognitive events and their respective percentages out of all the cognitive events. The highest percentages of cognitive events in hypothesis generation were "Using analysis results" (30%) and "Seeking connections" (23%).

Conclusion: VIADS helped inexperienced clinical researchers use fewer cognitive events to generate hypotheses than the control group. This suggests that VIADS may guide participants to be more structured during hypothesis generation compared with the control group. The results provide evidence to explain the shorter average time needed by the VIADS group in generating each hypothesis.
Affiliation(s)
- Xia Jing
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Brooke N Draghi
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Mytchell A Ernst
  - Department of Public Health Sciences, Clemson University, Clemson, SC
- Vimla L Patel
  - Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York City, NY
- James J Cimino
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, Birmingham, AL
- Jay H Shubrook
  - College of Osteopathic Medicine, Touro University, Vallejo, CA
- Yuchun Zhou
  - Patton College of Education, Ohio University, Athens, OH
- Chang Liu
  - Russ College of Engineering and Technology, Ohio University, Athens, OH
- Sonsoles De Lacalle
  - Department of Health Science, California State University Channel Islands, Camarillo, CA
6
Jing X, Zhou Y, Cimino JJ, Shubrook JH, Patel VL, De Lacalle S, Weaver A, Liu C. Development, validation, and usage of metrics to evaluate the quality of clinical research hypotheses. medRxiv 2023:2023.01.17.23284666. [PMID: 36711561 PMCID: PMC9882446 DOI: 10.1101/2023.01.17.23284666] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/19/2023]
Abstract
Objectives: Metrics and instruments can provide guidance for clinical researchers to assess their potential research projects at an early stage before significant investment. Furthermore, metrics can also provide structured criteria for peer reviewers to assess others' clinical research manuscripts or grant proposals. This study aimed to develop, test, validate, and use evaluation metrics and instruments to accurately, consistently, and conveniently assess the quality of scientific hypotheses for clinical research projects.

Materials and Methods: Metrics development went through iterative stages, including literature review, metrics and instrument development, internal and external testing and validation, and continuous revisions in each stage based on feedback. Furthermore, two experiments were conducted to determine brief and comprehensive versions of the instrument.

Results: The brief version of the instrument contained three dimensions: validity, significance, and feasibility. The comprehensive version of metrics included novelty, clinical relevance, potential benefits and risks, ethicality, testability, clarity, interestingness, and the three dimensions of the brief version. Each evaluation dimension included 2 to 5 subitems to evaluate the specific aspects of each dimension. For example, validity included clinical validity and scientific validity. The brief and comprehensive versions of the instruments included 12 and 39 subitems, respectively. Each subitem used a 5-point Likert scale.

Conclusion: The validated brief and comprehensive versions of metrics can provide standardized, consistent, and generic measurements for clinical research hypotheses, allow clinical researchers to prioritize their research ideas systematically, objectively, and consistently, and can be used as a tool for quality assessment during the peer review process.
Affiliation(s)
- Xia Jing
  - College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, South Carolina, USA
- Yuchun Zhou
  - Patton College of Education, Ohio University, Athens, Ohio, USA
- James J. Cimino
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, Alabama, USA
- Jay H. Shubrook
  - College of Osteopathic Medicine, Touro University, Vallejo, California, USA
- Vimla L. Patel
  - The New York Academy of Medicine, New York, New York, USA
- Sonsoles De Lacalle
  - College of Art and Science, California State University Channel Islands, Camarillo, California, USA
- Aneesa Weaver
  - College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, South Carolina, USA
- Chang Liu
  - Russ College of Engineering and Technology, Ohio University, Athens, Ohio, USA
7
Jing X, Patel VL, Cimino JJ, Shubrook JH, Zhou Y, Draghi BN, Ernst MA, Liu C, De Lacalle S. A visual analytic tool, VIADS, to assist the hypothesis generation process in clinical research: A usability study using mixed methods (Preprint). JMIR Hum Factors 2022; 10:e44644. [PMID: 37011112 PMCID: PMC10176142 DOI: 10.2196/44644] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/28/2022] [Revised: 03/08/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
BACKGROUND: Visualization can be a powerful tool for comprehending datasets, especially when they can be represented via hierarchical structures. Enhanced comprehension can facilitate the development of scientific hypotheses. However, the inclusion of excessive data can make a visualization overwhelming.

OBJECTIVE: We developed a Visual Interactive Analytic tool for filtering and summarizing large health Data Sets (VIADS) coded with hierarchical terminologies. In this study, we evaluated the usability of VIADS for visualizing data sets of patient diagnoses and procedures coded in the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM).

METHODS: We used mixed methods in the study. A group of 12 clinical researchers participated in the generation of data-driven hypotheses using the same datasets and time frame (a 1-hour training session and a 2-hour study session), utilizing VIADS via the think-aloud protocol. The audio and screen activities were recorded remotely. A modified version of the System Usability Scale (SUS) survey and a brief survey with open-ended questions were administered after the study to assess the usability of VIADS and to verify the participants' intensive use of VIADS.

RESULTS: The range of SUS scores was 37.5 to 87.5. The mean SUS score for VIADS was 71.88 (out of a possible 100; standard deviation: 14.62), and the median SUS score was 75. The participants unanimously agreed that VIADS offers new perspectives on data sets (100%), while 75% agreed that VIADS facilitates understanding, presentation, and interpretation of underlying datasets. The comments on the utility of VIADS were positive and aligned well with the design objectives of VIADS. The answers to the open-ended questions in the modified SUS provided specific suggestions regarding potential improvements in VIADS, and the usability problems identified were used to update the tool.

CONCLUSIONS: This usability study demonstrates that VIADS is a usable tool for analyzing secondary datasets, with a good average SUS score and favorable utility. Currently, VIADS accepts datasets with hierarchical codes and their corresponding frequencies. Consequently, only specific types of use cases are supported by the analytical results. Participants agreed, however, that VIADS provides new perspectives on datasets and is relatively easy to use. The functionalities most appreciated by participants were VIADS' ability to filter, summarize, compare, and visualize data.

INTERNATIONAL REGISTERED REPORT: RR2-10.2196/39414.
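For readers unfamiliar with how SUS scores such as those reported above land on a 0 to 100 scale, the following is a minimal sketch of the standard SUS scoring rule: ten items rated 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5. This is an assumption for illustration: the study used a modified SUS whose exact items and scoring are not given in the abstract, and the function name and sample responses below are hypothetical, not study data.

```python
from statistics import mean, median


def sus_score(responses):
    """Score one participant's ten SUS ratings (each 1-5) on the 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("standard SUS has exactly 10 items")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


# Hypothetical responses from three participants (not data from the study).
hypothetical_responses = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
scores = [sus_score(r) for r in hypothetical_responses]
print(scores)                        # [75.0, 100.0, 50.0]
print(mean(scores), median(scores))  # 75.0 75.0
```

Because each item contributes a whole number from 0 to 4, individual scores under this rule are always multiples of 2.5, which is consistent with the reported range of 37.5 to 87.5.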
Affiliation(s)
- Xia Jing
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
- Vimla L Patel
  - Cognitive Studies in Medicine and Public Health, The New York Academy of Medicine, New York, NY, United States
- James J Cimino
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Jay H Shubrook
  - Primary Care Department, College of Osteopathic Medicine, Touro University, Vallejo, CA, United States
- Yuchun Zhou
  - Department of Educational Studies, The Patton College of Education, Ohio University, Athens, OH, United States
- Brooke N Draghi
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
- Mytchell A Ernst
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
- Chang Liu
  - Electrical Engineering and Computer Science, Russ College of Engineering and Technology, Ohio University, Athens, OH, United States
- Sonsoles De Lacalle
  - Health Science Program, California State University Channel Islands, Camarillo, CA, United States