1. Waldrop AM, Cheadle JB, Bradford K, Preiss A, Chew R, Holt JR, Kebede Y, Braswell N, Watson M, Hench V, Crerar A, Ball CM, Schreep C, Linebaugh PJ, Hiles H, Boyles R, Bizon C, Krishnamurthy A, Cox S. Dug: a semantic search engine leveraging peer-reviewed knowledge to query biomedical data repositories. Bioinformatics 2022; 38:3252-3258. [PMID: 35441678 PMCID: PMC9991886 DOI: 10.1093/bioinformatics/btac284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Revised: 03/04/2022] [Accepted: 04/15/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexander M Waldrop
  - Center for Genomics, Bioinformatics, and Translational Research, RTI International, Research Triangle Park, NC 27709-2194, USA
- John B Cheadle
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Kira Bradford
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Alexander Preiss
  - Center for Data Science, RTI International, Research Triangle Park, NC 27709-2194, USA
- Robert Chew
  - Center for Data Science, RTI International, Research Triangle Park, NC 27709-2194, USA
- Jonathan R Holt
  - Center for Data Science, RTI International, Research Triangle Park, NC 27709-2194, USA
- Yaphet Kebede
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Nathan Braswell
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Matt Watson
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Virginia Hench
  - Center for Genomics, Bioinformatics, and Translational Research, RTI International, Research Triangle Park, NC 27709-2194, USA
- Andrew Crerar
  - Center for Genomics, Bioinformatics, and Translational Research, RTI International, Research Triangle Park, NC 27709-2194, USA
- Chris M Ball
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Carl Schreep
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- P J Linebaugh
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Hannah Hiles
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Rebecca Boyles
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Chris Bizon
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Ashok Krishnamurthy
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7548, USA
- Steve Cox
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA

2. Jiang L, Chen S, Beals J, Siddique J, Hamman RF, Bullock A, Manson SM. Evaluating Community-Based Translational Interventions Using Historical Controls: Propensity Score vs. Disease Risk Score Approach. Prevention Science: The Official Journal of the Society for Prevention Research 2019; 20:598-608. [PMID: 30747394 PMCID: PMC6520136 DOI: 10.1007/s11121-019-0980-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/16/2022]
Abstract
Many community-based translations of evidence-based interventions are designed as one-arm studies due to ethical and other considerations. Evaluating the impacts of such programs is challenging. Here, we examine the effectiveness of the lifestyle intervention implemented by the Special Diabetes Program for Indians Diabetes Prevention (SDPI-DP) demonstration project, a translational lifestyle intervention among American Indian and Alaska Native communities. Data from the landmark Diabetes Prevention Program placebo group was used as a historical control. We compared the use of propensity score (PS) and disease risk score (DRS) matching to adjust for potential confounder imbalance between groups. The unadjusted hazard ratio (HR) for diabetes risk was 0.35 for SDPI-DP lifestyle intervention vs. control. However, when relevant diabetes risk factors were considered, the adjusted HR estimates were attenuated toward 1, ranging from 0.56 (95% CI 0.44-0.71) to 0.69 (95% CI 0.56-0.96). The differences in estimated HRs using the PS and DRS approaches were relatively small but DRS matching resulted in more participants being matched and smaller standard errors of effect estimates. Carefully employed, publicly available randomized clinical trial data can be used as a historical control to evaluate the intervention effectiveness of one-arm community translational initiatives. It is critical to use a proper statistical method to balance the distributions of potential confounders between comparison groups in this kind of evaluation.
Affiliation(s)
- Luohua Jiang
  - Department of Epidemiology, School of Medicine, University of California Irvine, Irvine, CA, 92697-7550, USA
- Shuai Chen
  - Division of Biostatistics, Department of Public Health Sciences, University of California Davis, Davis, CA, USA
- Janette Beals
  - Centers for American Indian and Alaska Native Health, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Juned Siddique
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Richard F Hamman
  - Department of Epidemiology, Colorado School of Public Health, LEAD Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ann Bullock
  - Division of Diabetes Treatment and Prevention, Indian Health Service, Rockville, MD, USA
- Spero M Manson
  - Centers for American Indian and Alaska Native Health, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

3. Vlahou A. Implementation of Clinical Proteomics: A Step Closer to Personalized Medicine? Proteomics Clin Appl 2018; 13:e1800088. [DOI: 10.1002/prca.201800088] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/19/2018] [Revised: 11/23/2018] [Indexed: 01/19/2023]
Affiliation(s)
- Antonia Vlahou
  - Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527 Athens, Greece

4. Shou H, Hsu JY, Xie D, Yang W, Roy J, Anderson AH, Landis JR, Feldman HI, Parsa A, Jepson C. Analytic Considerations for Repeated Measures of eGFR in Cohort Studies of CKD. Clin J Am Soc Nephrol 2017; 12:1357-1365. [PMID: 28751576 PMCID: PMC5544518 DOI: 10.2215/cjn.11311116] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 01/12/2023]
Abstract
Repeated measures of various biomarkers provide opportunities for us to enhance understanding of many important clinical aspects of CKD, including patterns of disease progression, rates of kidney function decline under different risk factors, and the degree of heterogeneity in disease manifestations across patients. However, because of unique features, such as correlations across visits and time dependency, these data must be appropriately handled using longitudinal data analysis methods. We provide a general overview of the characteristics of data collected in cohort studies and compare appropriate statistical methods for the analysis of longitudinal exposures and outcomes. We use examples from the Chronic Renal Insufficiency Cohort Study to illustrate these methods. More specifically, we model longitudinal kidney outcomes over annual clinical visits and assess the association with both baseline and longitudinal risk factors.
Affiliation(s)
- Haochang Shou
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jesse Y. Hsu
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Dawei Xie
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Wei Yang
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason Roy
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Amanda H. Anderson
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- J. Richard Landis
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Harold I. Feldman
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Afshin Parsa
  - Department of Medicine, Division of Nephrology, University of Maryland School of Medicine, Baltimore, Maryland; and
  - Department of Medicine, Baltimore Veterans Affairs Medical Center, Baltimore, Maryland
- Christopher Jepson
  - Department of Biostatistics, Epidemiology and Informatics and
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania

5. Hopkins C, Sydes M, Murray G, Woolfall K, Clarke M, Williamson P, Tudur Smith C. UK publicly funded Clinical Trials Units supported a controlled access approach to share individual participant data but highlighted concerns. J Clin Epidemiol 2016; 70:17-25. [PMID: 26169841 PMCID: PMC4742521 DOI: 10.1016/j.jclinepi.2015.07.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 02/03/2015] [Revised: 05/22/2015] [Accepted: 07/06/2015] [Indexed: 02/08/2023]
Abstract
OBJECTIVES: Evaluate current data sharing activities of UK publicly funded Clinical Trial Units (CTUs) and identify good practices and barriers. STUDY DESIGN AND SETTING: Web-based survey of Directors of 45 UK Clinical Research Collaboration (UKCRC)-registered CTUs. RESULTS: Twenty-three (51%) CTUs responded: Five (22%) of these had an established data sharing policy and eight (35%) specifically requested consent to use patient data beyond the scope of the original trial. Fifteen (65%) CTUs had received requests for data, and seven (30%) had made external requests for data in the previous 12 months. CTUs supported the need for increased data sharing activities although concerns were raised about patient identification, misuse of data, and financial burden. Custodianship of clinical trial data and requirements for a CTU to align its policy to their parent institutes were also raised. No CTUs supported the use of an open access model for data sharing. CONCLUSION: There is support within the publicly funded UKCRC-registered CTUs for data sharing, but many perceived barriers remain. CTUs are currently using a variety of approaches and procedures for sharing data. This survey has informed further work, including development of guidance for publicly funded CTUs, to promote good practice and facilitate data sharing.
Affiliation(s)
- Carolyn Hopkins
  - MRC North West Hub for Trials Methodology Research, Department of Biostatistics, University of Liverpool, Block F Waterhouse Building, 1-5 Brownlow Street, Liverpool, L69 3GL, UK
- Matthew Sydes
  - MRC Clinical Trials Unit, University College London, Aviation House, 125 Kingsway, London, WC2B 6NH, UK
- Gordon Murray
  - Centre for Population Health Sciences, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
- Kerry Woolfall
  - MRC North West Hub for Trials Methodology Research, Department of Psychological Sciences, Block B Waterhouse Building, Brownlow Street, Liverpool L69 3GL, UK
- Mike Clarke
  - All-Ireland Hub for Trials Methodology Research, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Health Sciences Building, 97 Lisburn Road, Belfast, BT9 7BL, UK
- Paula Williamson
  - MRC North West Hub for Trials Methodology Research, Department of Biostatistics, University of Liverpool, Block F Waterhouse Building, 1-5 Brownlow Street, Liverpool, L69 3GL, UK
- Catrin Tudur Smith
  - MRC North West Hub for Trials Methodology Research, Department of Biostatistics, University of Liverpool, Block F Waterhouse Building, 1-5 Brownlow Street, Liverpool, L69 3GL, UK

6. Kozminski MA, Wei JT, Nelson J, Kent DM. Baseline characteristics predict risk of progression and response to combined medical therapy for benign prostatic hyperplasia (BPH). BJU Int 2014; 115:308-16. [PMID: 24825577 DOI: 10.1111/bju.12802] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
OBJECTIVE: To better risk stratify patients, using baseline characteristics, to help optimise decision-making for men with moderate-to-severe lower urinary tract symptoms (LUTS) secondary to benign prostatic hyperplasia (BPH) through a secondary analysis of the Medical Therapy of Prostatic Symptoms (MTOPS) trial. PATIENTS AND METHODS: After review of the literature, we identified potential baseline risk factors for BPH progression. Using bivariate tests in a secondary analysis of MTOPS data, we determined which variables retained prognostic significance. We then used these factors in Cox proportional hazard modelling to: i) more comprehensively risk stratify the study population based on pre-treatment parameters and ii) to determine which risk strata stood to benefit most from medical intervention. RESULTS: In all, 3047 men were followed in MTOPS for a mean of 4.5 years. We found varying risks of progression across quartiles. Baseline BPH Impact Index score, post-void residual urine volume, serum prostate-specific antigen (PSA) level, age, American Urological Association Symptom Index score, and maximum urinary flow rate were found to significantly correlate with overall BPH progression in multivariable analysis. CONCLUSIONS: Using baseline factors permits estimation of individual patient risk for clinical progression and the benefits of medical therapy. A novel clinical decision tool based on these analyses will allow clinicians to weigh patient-specific benefits against possible risks of adverse effects for a given patient.
Affiliation(s)
- Michael A Kozminski
  - Department of Urology, University of Michigan Medical School, Ann Arbor, MI, USA

7. Pan H, Ardini MA, Bakalov V, DeLatte M, Eggers P, Ganapathi L, Hollingsworth CR, Levy J, Li S, Pratt J, Pugh N, Qin Y, Rasooly R, Ray H, Richardson JE, Flynn Riley A, Rogers SM, Tan S, Turner CF, White S, Cooley PC. 'What's in the NIDDK CDR?'--public query tools for the NIDDK central data repository. Database: The Journal of Biological Databases and Curation 2013; 2013:bas058. [PMID: 23396299 PMCID: PMC3625049 DOI: 10.1093/database/bas058] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/24/2022]
Abstract
The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Data Repository (CDR) is a web-enabled resource available to researchers and the general public. The CDR warehouses clinical data and study documentation from NIDDK-funded research, including such landmark studies as The Diabetes Control and Complications Trial (DCCT, 1983–93) and the Epidemiology of Diabetes Interventions and Complications (EDIC, 1994–present) follow-up study which has been ongoing for more than 20 years. The CDR also houses data from over 7 million biospecimens representing 2 million subjects. To help users explore the vast amount of data stored in the NIDDK CDR, we developed a suite of search mechanisms called the public query tools (PQTs). Five individual tools are available to search data from multiple perspectives: study search, basic search, ontology search, variable summary and sample by condition. PQT enables users to search for information across studies. Users can search for data such as number of subjects, types of biospecimens and disease outcome variables without prior knowledge of the individual studies. This suite of tools will increase the use and maximize the value of the NIDDK data and biospecimen repositories as important resources for the research community. Database URL: https://www.niddkrepository.org/niddk/home.do
Affiliation(s)
- Huaqin Pan
  - RTI International, Social, Statistical and Environmental Sciences, PO Box 12194, Research Triangle Park, NC 27709, USA

8. Prokosch HU, Beck A, Ganslandt T, Hummel M, Kiehntopf M, Sax U, Uckert F, Semler S. IT Infrastructure Components for Biobanking. Appl Clin Inform 2010; 1:419-29. [PMID: 23616851 DOI: 10.4338/aci-2010-05-ra-0034] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 05/29/2010] [Accepted: 08/26/2010] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE: Within translational research projects in recent years, large biobanks have been established, mostly supported by homegrown, proprietary software solutions. No general requirements for biobanking IT infrastructures have been published yet. This paper presents an exemplary biobanking IT architecture, a requirements specification for a biorepository management tool and exemplary illustrations of three major types of requirements. METHODS: We have pursued a comprehensive literature review for biobanking IT solutions and established an interdisciplinary expert panel for creating the requirements specification. The exemplary illustrations were derived from a requirements analysis within two university hospitals. RESULTS: The requirements specification comprises a catalog with more than 130 detailed requirements grouped into 3 major categories and 20 subcategories. Special attention is given to multitenancy capabilities in order to support the project-specific definition of varying research and bio-banking contexts, the definition of workflows to track sample processing, sample transportation and sample storage and the automated integration of preanalytic handling and storage robots. CONCLUSION: IT support for biobanking projects can be based on a federated architectural framework comprising primary data sources for clinical annotations, a pseudonymization service, a clinical data warehouse with a flexible and user-friendly query interface and a biorepository management system. Flexibility and scalability of all such components are vital since large medical facilities such as university hospitals will have to support biobanking for varying monocentric and multicentric research scenarios and multiple medical clients.
Affiliation(s)
- H U Prokosch
  - Chair of Medical Informatics, University of Erlangen-Nuremberg, Germany

9. West DS, Elaine Prewitt T, Bursac Z, Felix HC. Weight loss of black, white, and Hispanic men and women in the Diabetes Prevention Program. Obesity (Silver Spring) 2008; 16:1413-20. [PMID: 18421273 DOI: 10.1038/oby.2008.224] [Citation(s) in RCA: 226] [Impact Index Per Article: 13.3] [Indexed: 12/26/2022]
Abstract
OBJECTIVE: To provide the specific weight loss outcomes for African-American, Hispanic, and white men and women in the lifestyle and metformin treatment arms of the Diabetes Prevention Program (DPP) by race-gender group to facilitate researchers translating similar interventions to minority populations, as well as provide realistic weight loss expectations for clinicians. METHODS AND PROCEDURES: Secondary analyses of weight loss of 2,921 overweight participants (22% black; 17% Hispanic; 61% white; and 68% women) with impaired glucose tolerance randomized in the DPP to intensive lifestyle modification, metformin or placebo. Data over a 30-month period are examined for comparability across treatment arms by race and gender. RESULTS: Within lifestyle treatment, all race-gender groups lost comparable amounts of weight with the exception of black women who exhibited significantly smaller weight losses (P < 0.01). For example, at 12 months, weight losses for white men (-8.4%), white women (-8.1%), Hispanic men (-7.8%), Hispanic women (-7.1%), and black men (-7.1%) were similar and significantly higher than black women (-4.5%). In contrast, within metformin treatment, all race-gender groups including black women lost similar amounts of weight. Race-gender specific mean weight loss data are provided by treatment arm for each follow-up period. DISCUSSION: Diminished weight losses were apparent among black women in comparison with other race-gender groups in a lifestyle intervention but not metformin, underscoring the critical nature of examining sociocultural and environmental contributors to successful lifestyle intervention for black women.
Affiliation(s)
- Delia S West
  - Department of Health Behavior and Health Education, University of Arkansas for Medical Sciences, Fay W. Boozman College of Public Health, Little Rock, Arkansas, USA