1
Schauberger G, Klug SJ, Berger M. Random forests for the analysis of matched case-control studies. BMC Bioinformatics 2024; 25:253. [PMID: 39090608 PMCID: PMC11292918 DOI: 10.1186/s12859-024-05877-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 07/22/2024] [Indexed: 08/04/2024]
Abstract
BACKGROUND Conditional logistic regression trees have been proposed as a flexible alternative to the standard method of conditional logistic regression for the analysis of matched case-control studies. While they avoid the strict assumption of linearity and automatically incorporate interactions, conditional logistic regression trees may suffer from relatively high variability. Further machine learning methods for the analysis of matched case-control studies are missing because conventional machine learning methods cannot handle the matched structure of the data.
RESULTS A random forest method for the analysis of matched case-control studies based on conditional logistic regression trees is proposed, which overcomes the issue of high variability. It provides an accurate estimation of exposure effects while being more flexible in the functional form of covariate effects. The efficacy of the method is illustrated in a simulation study and within an application to real-world data from a matched case-control study on the effect of regular participation in cervical cancer screening on the development of cervical cancer.
CONCLUSIONS The proposed random forest method is a promising add-on to the toolbox for the analysis of matched case-control studies and addresses the need for machine-learning methods in this field. It provides a more flexible approach compared to the standard method of conditional logistic regression, but also compared to conditional logistic regression trees. It allows for non-linearity and the automatic inclusion of interaction effects and is suitable both for exploratory and explanatory analyses.
Affiliation(s)
- Gunther Schauberger
  - Chair of Epidemiology, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Stefanie J Klug
  - Chair of Epidemiology, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Moritz Berger
  - Institute of Medical Biometry, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Bonn, Germany
2
Djordjilović V, Ponzi E, Nøst TH, Thoresen M. penalizedclr: an R package for penalized conditional logistic regression for integration of multiple omics layers. BMC Bioinformatics 2024; 25:226. [PMID: 38937668 PMCID: PMC11212437 DOI: 10.1186/s12859-024-05850-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 06/20/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND The matched case-control design, up until recently mostly pertinent to epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies, it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from various and variable sources that they wish to relate to the case-control status. This highlights the need to develop and implement statistical methods that can take these tendencies into account.
RESULTS We present an R package, penalizedclr, that provides an implementation of the penalized conditional logistic regression model for analyzing matched case-control studies. It allows for different penalties for different blocks of covariates, and it is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model.
CONCLUSIONS The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models accounting for the matched design and block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case-control status. These variables can then be investigated in terms of functional interpretation or validation in further, more targeted studies.
Affiliation(s)
- Vera Djordjilović
  - Department of Economics, Ca' Foscari University of Venice, Venice, Italy
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Erica Ponzi
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Therese Haugdahl Nøst
  - Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Community Medicine, Faculty of Health Sciences, The Arctic University of Norway, Tromsø, Norway
- Magne Thoresen
  - Department of Biostatistics, University of Oslo, Oslo, Norway
3
Uematsu T, Kawakami Y, Nojiri S, Saito T, Irie Y, Kasai T, Hiratsuka Y, Ishijima M, Kuroki M, Daida H, Nishizaki Y. Association between number of medications and hip fractures in Japanese elderly using conditional logistic LASSO regression. Sci Rep 2023; 13:16831. [PMID: 37803071 PMCID: PMC10558461 DOI: 10.1038/s41598-023-43876-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 09/29/2023] [Indexed: 10/08/2023]
Abstract
To examine the association between hip fracture and associated factors, including polypharmacy, and develop an optimal predictive model, we conducted a population-based matched case-control study using health insurance claims data on hip fracture among Japanese patients. We included 34,717 hospitalized Japanese patients aged ≥ 65 years with hip fracture and 34,717 age- and sex-matched controls (1:1 matching), for a total of 69,434 participants. Overall, 16 comorbidity variables and 60 concomitant-medication variables were used as explanatory variables. The participants were divided into early elderly and late elderly categories for further analysis. The odds ratio of hip fracture increased with the number of medications only in the early elderly. AUC was highest for the early elderly (AUC, 0.74, 95% CI 0.72-0.76). Use of anti-Parkinson's drugs had the largest coefficient and was the most influential variable in many categories. This study confirmed the association between risk factors, including polypharmacy, and hip fracture. In the early elderly, the risk of hip fracture increased with the number of medications taken and the model showed good predictive accuracy, whereas there was no such association in the late elderly. Therefore, the early elderly in Japan should be an active target population for hip fracture prevention.
Affiliation(s)
- Takuya Uematsu
  - Clinical Translational Science, Juntendo University School of Medicine Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Department of Hospital Pharmacy, Juntendo University Hospital, Tokyo, Japan
- Yuta Kawakami
  - Clinical Research and Trial Center, Juntendo University, Tokyo, Japan
  - Graduate School of Engineering Science, Yokohama National University, Kanagawa, Japan
- Shuko Nojiri
  - Clinical Translational Science, Juntendo University School of Medicine Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Medical Technology Innovation Center, Juntendo University, Tokyo, Japan
- Tomoyuki Saito
  - Medical Technology Innovation Center, Juntendo University, Tokyo, Japan
- Yoshiki Irie
  - Clinical Research and Trial Center, Juntendo University, Tokyo, Japan
  - Graduate School of Engineering Science, Tokyo University of Science, Tokyo, Japan
- Takatoshi Kasai
  - Department of Cardiology, Juntendo University School of Medicine Graduate School of Medicine, Tokyo, Japan
- Yoshimune Hiratsuka
  - Department of Ophthalmology, Juntendo University School of Medicine Graduate School of Medicine, Tokyo, Japan
- Muneaki Ishijima
  - Department of Medicine for Orthopedics and Motor Organ, Juntendo University School of Medicine Graduate School of Medicine, Tokyo, Japan
- Manabu Kuroki
  - Graduate School of Engineering Science, Yokohama National University, Kanagawa, Japan
- Hiroyuki Daida
  - Department of Cardiology, Juntendo University School of Medicine Graduate School of Medicine, Tokyo, Japan
- Yuji Nishizaki
  - Clinical Translational Science, Juntendo University School of Medicine Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Medical Technology Innovation Center, Juntendo University, Tokyo, Japan
4
Nakayasu ES, Bramer LM, Ansong C, Schepmoes AA, Fillmore TL, Gritsenko MA, Clauss TR, Gao Y, Piehowski PD, Stanfill BA, Engel DW, Orton DJ, Moore RJ, Qian WJ, Sechi S, Frohnert BI, Toppari J, Ziegler AG, Lernmark Å, Hagopian W, Akolkar B, Smith RD, Rewers MJ, Webb-Robertson BJM, Metz TO. Plasma protein biomarkers predict the development of persistent autoantibodies and type 1 diabetes 6 months prior to the onset of autoimmunity. Cell Rep Med 2023; 4:101093. [PMID: 37390828 PMCID: PMC10394168 DOI: 10.1016/j.xcrm.2023.101093] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 12/27/2022] [Revised: 04/14/2023] [Accepted: 06/01/2023] [Indexed: 07/02/2023]
Abstract
Type 1 diabetes (T1D) results from autoimmune destruction of β cells. Insufficient availability of biomarkers represents a significant gap in understanding the disease cause and progression. We conduct blinded, two-phase case-control plasma proteomics on the TEDDY study to identify biomarkers predictive of T1D development. Untargeted proteomics of 2,252 samples from 184 individuals identify 376 regulated proteins, showing alteration of complement, inflammatory signaling, and metabolic proteins even prior to autoimmunity onset. Extracellular matrix and antigen presentation proteins are differentially regulated in individuals who progress to T1D vs. those that remain in autoimmunity. Targeted proteomics measurements of 167 proteins in 6,426 samples from 990 individuals validate 83 biomarkers. A machine learning analysis predicts if individuals would remain in autoimmunity or develop T1D 6 months before autoantibody appearance, with areas under receiver operating characteristic curves of 0.871 and 0.918, respectively. Our study identifies and validates biomarkers, highlighting pathways affected during T1D development.
Affiliation(s)
- Ernesto S Nakayasu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Lisa M Bramer
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Charles Ansong
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Athena A Schepmoes
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Thomas L Fillmore
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Marina A Gritsenko
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Therese R Clauss
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Yuqian Gao
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Paul D Piehowski
  - Environmental and Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Bryan A Stanfill
  - Computational Analytics Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Dave W Engel
  - Computational Analytics Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Daniel J Orton
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Ronald J Moore
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Salvatore Sechi
  - National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Jorma Toppari
  - Department of Pediatrics, Turku University Hospital, Turku, Finland; Institute of Biomedicine, Research Centre for Integrative Physiology and Pharmacology and Centre for Population Health Research, University of Turku, Turku, Finland
- Anette-G Ziegler
  - Institute of Diabetes Research, Helmholtz Zentrum München, Munich, Germany; Forschergruppe Diabetes, Technical University of Munich, Klinikum Rechts der Isar, Munich, Germany; Forschergruppe Diabetes e.V. at Helmholtz Zentrum München, Munich, Germany
- Åke Lernmark
  - Unit for Diabetes and Celiac Disease, Wallenberg/CRC, Department of Clinical Sciences, Lund University/CRC, Skåne University Hospital SUS, 21428 Malmö, Sweden
- Beena Akolkar
  - National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Richard D Smith
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Marian J Rewers
  - Barbara Davis Center for Diabetes, University of Colorado, Aurora, CO, USA
- Thomas O Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
5
Schauberger G, Tanaka LF, Berger M. A tree-based modeling approach for matched case-control studies. Stat Med 2023; 42:676-692. [PMID: 36631256 DOI: 10.1002/sim.9637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 10/10/2022] [Accepted: 12/14/2022] [Indexed: 01/13/2023]
Abstract
Conditional logistic regression (CLR) is the indisputable standard method for the analysis of matched case-control studies. However, CLR is strongly restricted with respect to the inclusion of non-linear effects and interactions of confounding variables. A novel tree-based modeling method is proposed which addresses this issue and provides a flexible framework allowing for a more complex confounding structure. The proposed machine learning model is fitted within the framework of CLR and can therefore account for the matched strata in the data. A simulation study demonstrates the efficacy of the method. Furthermore, for illustration, the method is applied to a matched case-control study on cervical cancer.
Affiliation(s)
- Gunther Schauberger
  - Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich, Munich, Germany
- Luana Fiengo Tanaka
  - Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich, Munich, Germany
- Moritz Berger
  - Institute of Biomedical Statistics, Computer Science and Epidemiology, University of Bonn, Bonn, Germany
6
Ballout N, Garcia C, Viallon V. Sparse estimation for case-control studies with multiple disease subtypes. Biostatistics 2021; 22:738-755. [PMID: 31977036 DOI: 10.1093/biostatistics/kxz063] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/06/2019] [Revised: 12/13/2019] [Accepted: 12/16/2019] [Indexed: 11/15/2022]
Abstract
The analysis of case-control studies with several disease subtypes is increasingly common, e.g. in cancer epidemiology. For matched designs, a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among disease subtypes, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of stratified regression models. For unmatched designs, we compare two standard methods based on $L_1$-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among disease subtypes: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs, in terms of estimation and prediction accuracy, variable selection and identification of heterogeneities. We also present preliminary results from the analysis of a case-control study nested within the EPIC (European Prospective Investigation into Cancer and nutrition) cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.
Affiliation(s)
- Nadim Ballout
  - IFSTTAR, TS2, UMRESTTE, Université Claude Bernard Lyon 1, 25, avenue François Mitterrand, Case24, Cité des mobilités, 69675 Bron Cedex, France
- Cedric Garcia
  - IFSTTAR, AME, DEST, 14-20 Boulevard Newton, Cité Descartes, Champs sur Marne, 77447 Marne la Vallée Cedex 2, France
- Vivian Viallon
  - Nutritional Methodology and Biostatistics Group, International Agency for Research on Cancer, World Health Organization, 150, Cours Albert Thomas, 69372 Lyon Cedex 08, France
7
Zeleznik OA, Balasubramanian R, Zhao Y, Frueh L, Jeanfavre S, Avila-Pacheco J, Clish CB, Tworoger SS, Eliassen AH. Circulating amino acids and amino acid-related metabolites and risk of breast cancer among predominantly premenopausal women. NPJ Breast Cancer 2021; 7:54. [PMID: 34006878 PMCID: PMC8131633 DOI: 10.1038/s41523-021-00262-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/02/2020] [Accepted: 04/15/2021] [Indexed: 02/03/2023]
Abstract
Known modifiable risk factors account for a small fraction of premenopausal breast cancers. We investigated associations between pre-diagnostic circulating amino acid and amino acid-related metabolites (N = 207) and risk of breast cancer among predominantly premenopausal women of the Nurses' Health Study II using conditional logistic regression (1057 cases, 1057 controls) and multivariable analyses evaluating all metabolites jointly. Eleven metabolites were associated with breast cancer risk (q-value < 0.2). Seven metabolites remained associated after adjustment for established risk factors (p-value < 0.05) and were selected by at least one multivariable modeling approach: higher levels of 2-aminohippuric acid, kynurenic acid, piperine (all three with q-value < 0.2), DMGV and phenylacetylglutamine were associated with lower breast cancer risk (e.g., piperine: ORadjusted (95%CI) = 0.84 (0.77-0.92)) while higher levels of creatine and C40:7 phosphatidylethanolamine (PE) plasmalogen were associated with increased breast cancer risk (e.g., C40:7 PE plasmalogen: ORadjusted (95%CI) = 1.11 (1.01-1.22)). Five amino acids and amino acid-related metabolites (2-aminohippuric acid, DMGV, kynurenic acid, phenylacetylglutamine, and piperine) were inversely associated, while one amino acid and a phospholipid (creatine and C40:7 PE plasmalogen) were positively associated with breast cancer risk among predominately premenopausal women, independent of established breast cancer risk factors.
Affiliation(s)
- Oana A Zeleznik
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Raji Balasubramanian
  - Department of Biostatistics & Epidemiology, University of Massachusetts - Amherst, Amherst, MA, USA
- Yibai Zhao
  - Department of Biostatistics & Epidemiology, University of Massachusetts - Amherst, Amherst, MA, USA
- Lisa Frueh
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Sarah Jeanfavre
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Julian Avila-Pacheco
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Clary B Clish
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Shelley S Tworoger
  - Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, USA
- A Heather Eliassen
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
8
Kalina J, Matonoha C. A sparse pair-preserving centroid-based supervised learning method for high-dimensional biomedical data or images. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.03.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
9
Lei Y, Shu HK, Tian S, Wang T, Liu T, Mao H, Shim H, Curran WJ, Yang X. Pseudo CT Estimation using Patch-based Joint Dictionary Learning. Annu Int Conf IEEE Eng Med Biol Soc 2019; 2018:5150-5153. [PMID: 30441499 DOI: 10.1109/embc.2018.8513475] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/07/2022]
Abstract
Magnetic resonance (MR) simulators have recently gained popularity because they avoid the unnecessary radiation exposure associated with Computed Tomography (CT) when used for radiation therapy planning. We propose a method for pseudo CT estimation from MR images based on joint dictionary learning. Patient-specific anatomical features were extracted from the aligned training images and adopted as signatures for each voxel. The most relevant and informative features were identified to train the joint dictionary learning-based model. The well-trained dictionary was used to predict the pseudo CT of a new patient. This prediction technique was validated with a clinical study of 12 patients with MR and CT images of the brain. The mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and normalized cross correlation (NCC) indexes were used to quantify the prediction accuracy. We compared our proposed method with a state-of-the-art dictionary learning method. Overall, our proposed method significantly improves the prediction accuracy over the state-of-the-art dictionary learning method. We have investigated a novel joint dictionary learning-based approach to predict CT images from routine MRIs and demonstrated its reliability. This CT prediction technique could be a useful tool for MRI-based radiation treatment planning or attenuation correction for quantifying PET images for PET/MR imaging.
10
Stanfill B, Reehl S, Bramer L, Nakayasu ES, Rich SS, Metz TO, Rewers M, Webb-Robertson BJ. Extending Classification Algorithms to Case-Control Studies. Biomed Eng Comput Biol 2019; 10:1179597219858954. [PMID: 31320812 PMCID: PMC6630079 DOI: 10.1177/1179597219858954] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/21/2019] [Accepted: 04/26/2019] [Indexed: 12/16/2022]
Abstract
Classification is a common technique applied to 'omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated 'omic signatures. We propose a data preprocessing step which generalizes and makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method is improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.
Affiliation(s)
- Bryan Stanfill
  - Computing and Analytics Division, National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Sarah Reehl
  - Computing and Analytics Division, National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Lisa Bramer
  - Computing and Analytics Division, National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Ernesto S Nakayasu
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Thomas O Metz
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Marian Rewers
  - Barbara Davis Center for Childhood Diabetes, University of Colorado Denver, Aurora, CO, USA
- Bobbie-Jo Webb-Robertson
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
11
Lei Y, Tang X, Higgins K, Lin J, Jeong J, Liu T, Dhabaan A, Wang T, Dong X, Press R, Curran WJ, Yang X. Learning-based CBCT correction using alternating random forest based on auto-context model. Med Phys 2018; 46:601-618. [PMID: 30471129 DOI: 10.1002/mp.13295] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 12/27/2017] [Revised: 10/17/2018] [Accepted: 11/12/2018] [Indexed: 11/11/2022]
Abstract
PURPOSE Quantitative Cone Beam CT (CBCT) imaging is increasing in demand for precise image-guided radiotherapy because it provides a foundation for advanced image-guided techniques, including accurate treatment setup, online tumor delineation, and patient dose calculation. However, CBCT is currently limited only to patient setup in the clinic because of the severe issues in its image quality. In this study, we develop a learning-based approach to improve CBCT's image quality for extended clinical applications.
MATERIALS AND METHODS An auto-context model is integrated into a machine learning framework to iteratively generate corrected CBCT (CCBCT) with high image quality. The first step is data preprocessing for the built training dataset, in which uninformative image regions are removed, noise is reduced, and CT and CBCT images are aligned. After a CBCT image is divided into a set of patches, the most informative and salient anatomical features are extracted to train random forests. Within each patch, alternating RF is applied to create a CCBCT patch as the output. Moreover, an iterative refinement strategy is exercised to enhance the image quality of CCBCT. Then, all the CCBCT patches are integrated to reconstruct final CCBCT images.
RESULTS The learning-based CBCT correction algorithm was evaluated using the leave-one-out cross-validation method applied on a cohort of 12 patients' brain data and 14 patients' pelvis data. The mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized cross-correlation (NCC) indexes, and spatial nonuniformity (SNU) in the selected regions of interest (ROIs) were used to quantify the proposed algorithm's correction accuracy and generated the following results: mean MAE = 12.81 ± 2.04 and 19.94 ± 5.44 HU, mean PSNR = 40.22 ± 3.70 and 31.31 ± 2.85 dB, mean NCC = 0.98 ± 0.02 and 0.95 ± 0.01, and SNU = 2.07 ± 3.36% and 2.07 ± 3.36% for brain and pelvis data.
CONCLUSION Preliminary results demonstrated that the novel learning-based correction method can significantly improve CBCT image quality. Hence, the proposed algorithm is of great potential in improving CBCT's image quality to support its clinical utility in CBCT-guided adaptive radiotherapy.
Affiliation(s)
- Yang Lei
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Xiangyang Tang
  - Department of Radiology and Imaging Sciences and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Kristin Higgins
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Jolinta Lin
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Jiwoong Jeong
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
  - Department of Medical Physics, Georgia Institute of Technology, Atlanta, GA, 30322, USA
- Tian Liu
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Anees Dhabaan
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Tonghe Wang
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Xue Dong
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Robert Press
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Walter J Curran
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
- Xiaofeng Yang
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, 30322, USA
12
Lei Y, Shu HK, Tian S, Jeong JJ, Liu T, Shim H, Mao H, Wang T, Jani AB, Curran WJ, Yang X. Magnetic resonance imaging-based pseudo computed tomography using anatomic signature and joint dictionary learning. J Med Imaging (Bellingham) 2018; 5:034001. [PMID: 30155512 DOI: 10.1117/1.jmi.5.3.034001] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/12/2018] [Accepted: 08/06/2018] [Indexed: 12/30/2022]
Abstract
Magnetic resonance imaging (MRI) provides a number of advantages over computed tomography (CT) for radiation therapy treatment planning; however, MRI lacks the key electron density information necessary for accurate dose calculation. We propose a dictionary-learning-based method to derive electron density information from MRIs. Specifically, we first partition a given MR image into a set of patches, for which we used a joint dictionary learning method to directly predict a CT patch as a structured output. Then a feature selection method is used to ensure prediction robustness. Finally, we combine all the predicted CT patches to obtain the final prediction for the given MR image. This prediction technique was validated for a clinical application using 14 patients with brain MR and CT images. The peak signal-to-noise ratio (PSNR), mean absolute error (MAE), normalized cross-correlation (NCC) indices and similarity index (SI) for air, soft-tissue and bone regions were used to quantify the prediction accuracy. The mean ± std of PSNR, MAE, and NCC were 22.4±1.9 dB, 82.6±26.1 HU, and 0.91±0.03 for the 14 patients. The SIs for air, soft-tissue, and bone regions are 0.98±0.01, 0.88±0.03, and 0.69±0.08. These indices demonstrate the CT prediction accuracy of the proposed learning-based method. This CT image prediction technique could be used as a tool for MRI-based radiation treatment planning, or for PET attenuation correction in a PET/MRI scanner.
Affiliation(s)
- Yang Lei
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Hui-Kuo Shu
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Sibo Tian
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Jiwoong Jason Jeong
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Tian Liu
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Hyunsuk Shim
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States; Emory University, Winship Cancer Institute, Department of Radiology and Imaging Sciences, Atlanta, Georgia, United States
- Hui Mao
- Emory University, Winship Cancer Institute, Department of Radiology and Imaging Sciences, Atlanta, Georgia, United States
- Tonghe Wang
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Ashesh B Jani
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Walter J Curran
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
- Xiaofeng Yang
- Emory University, Winship Cancer Institute, Department of Radiation Oncology, Atlanta, Georgia, United States
|
13
|
Malede A, Alemu K, Aemero M, Robele S, Kloos H. Travel to farms in the lowlands and inadequate malaria information significantly predict malaria in villages around Lake Tana, northwest Ethiopia: a matched case-control study. Malar J 2018; 17:290. [PMID: 30097037 PMCID: PMC6086053 DOI: 10.1186/s12936-018-2434-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/05/2018] [Accepted: 07/30/2018] [Indexed: 11/25/2022]
Abstract
Background In Ethiopia, malaria has declined in the last decade; only a small number of cases have been reported, primarily from hotspots. The contribution of house proximity to water bodies and the role of migration in malaria transmission have not yet been examined in detail in northwest Ethiopia. Individual and household-level environmental and socio-demographic drivers of malaria heterogeneity were explored contextually in meso-endemic villages around Lake Tana, northwest Ethiopia. Methods A health facility-based, paired, age- and sex-matched case–control study involving 303 matched pairs was undertaken from 10 October 2016 to 30 June 2017. Geo-referencing of case households, control households, proximate water bodies, and health centres was carried out. A pretested and structured questionnaire was used to collect data on socio-demography, household assets, housing, travel history, and malaria intervention measures. Medians (interquartile range) were computed for continuous variables. Pearson's chi-square or Fisher's exact test was used to detect significant differences in proportions. Principal component analysis was performed to estimate household wealth. Stratified analysis was used to confirm confounding and interaction. A multivariable conditional logistic regression model was used to detect risk factors for malaria. Results Of 303 malaria cases, 59 (19.5% [15.4–24.3]) were imported malaria cases whereas 244 (80.5% [75.7–84.6]) were locally acquired malaria cases. In bivariate analysis, marital status, educational status, and bed net ownership were significantly associated with malaria cases. In multivariable adjustment, travel to malarious lowlands in the preceding month (adjusted mOR = 7.32; 95% CI 2.40–22.34), household member's travel to malarious lowlands (adjusted mOR = 2.75; 95% CI 1.02–7.44), and inadequate health information on malaria (adjusted mOR = 1.57; 95% CI 1.03–2.41) were predictors of malaria. Stratified analysis confirmed that elevation of households and travel to malarious lowlands were not effect modifiers. Travel to malarious lowlands had a confounding effect on malaria but elevation of households did not. Conclusions In this study, travel to farms in the lowlands and inadequate health information on malaria were risk factors for malaria in villages around Lake Tana. This evidence is critical for the design of improved strategic interventions that consider imported malaria cases and approaches for accessing health information on malaria control in northwest Ethiopia.
Affiliation(s)
- Asmamaw Malede
- Ethiopian Institute of Water Resources, Addis Ababa University, Addis Ababa, Ethiopia
- Kassahun Alemu
- Department of Epidemiology and Biostatistics, Institute of Public Health, University of Gondar, Gondar, Ethiopia
- Mulugeta Aemero
- Department of Medical Parasitology, School of Biomedical & Laboratory Sciences, University of Gondar, Gondar, Ethiopia
- Sirak Robele
- Ethiopian Institute of Water Resources, Addis Ababa University, Addis Ababa, Ethiopia
- Helmut Kloos
- Department of Epidemiology and Biostatistics, University of California, San Francisco, USA
|
14
|
Mougin F, Auber D, Bourqui R, Diallo G, Dutour I, Jouhet V, Thiessard F, Thiébaut R, Thébault P. Visualizing omics and clinical data: Which challenges for dealing with their variety? Methods 2017; 132:3-18. [PMID: 28887085 DOI: 10.1016/j.ymeth.2017.08.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/30/2017] [Revised: 08/22/2017] [Accepted: 08/23/2017] [Indexed: 12/20/2022]
Abstract
The life sciences are currently going through a great number of transformations driven by the ongoing revolution in high-throughput technologies for data acquisition. Integrating these high-dimensional data, ranging from omics to clinical data, is becoming one of the most challenging stages. It involves interdisciplinary developments that aim at an enhanced understanding of human physiology for healthcare purposes. Biologists, bioinformaticians, physicians, and other experts in the healthcare domain have to accompany each step of the analysis process in order to investigate and appraise these varied data. In this context, information visualization methods are gaining increasing attention within the life sciences. Software based on these methods is now well recognized as helping expert users carry out their data analysis tasks. This article reviews the current methods and techniques dedicated to information visualization and their current use in software development related to omics and/or clinical data.
Affiliation(s)
- Fleur Mougin
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
- David Auber
- Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
- Romain Bourqui
- Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
- Gayo Diallo
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
- Isabelle Dutour
- Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
- Vianney Jouhet
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; CHU de Bordeaux, Pole de sante publique, Service d'information medicale, F-33000 Bordeaux, France
- Frantz Thiessard
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; CHU de Bordeaux, Pole de sante publique, Service d'information medicale, F-33000 Bordeaux, France
- Rodolphe Thiébaut
- Univ. Bordeaux, Inserm UMR 1219, INRIA SISTM, F-33000 Bordeaux, France; CHU de Bordeaux, Pole de sante publique, Service d'information medicale, F-33000 Bordeaux, France
|
15
|
Bezin J, Duong M, Lassalle R, Droz C, Pariente A, Blin P, Moore N. The national healthcare system claims databases in France, SNIIRAM and EGB: Powerful tools for pharmacoepidemiology. Pharmacoepidemiol Drug Saf 2017; 26:954-962. [PMID: 28544284 DOI: 10.1002/pds.4233] [Citation(s) in RCA: 372] [Impact Index Per Article: 53.1] [Received: 09/20/2016] [Revised: 03/30/2017] [Accepted: 04/23/2017] [Indexed: 12/11/2022]
Affiliation(s)
- Julien Bezin
- Department of Medical Pharmacology, CHU de Bordeaux; Université de Bordeaux; 33076 Bordeaux France
- INSERM U1219; 33076 Bordeaux France
- Mai Duong
- INSERM U1219; 33076 Bordeaux France
- Bordeaux PharmacoEpi; INSERM CIC1401; 33076 Bordeaux France
- Régis Lassalle
- Bordeaux PharmacoEpi; INSERM CIC1401; 33076 Bordeaux France
- Cécile Droz
- Bordeaux PharmacoEpi; INSERM CIC1401; 33076 Bordeaux France
- Antoine Pariente
- Department of Medical Pharmacology, CHU de Bordeaux; Université de Bordeaux; 33076 Bordeaux France
- INSERM U1219; 33076 Bordeaux France
- Patrick Blin
- Bordeaux PharmacoEpi; INSERM CIC1401; 33076 Bordeaux France
- Nicholas Moore
- Department of Medical Pharmacology, CHU de Bordeaux; Université de Bordeaux; 33076 Bordeaux France
- INSERM U1219; 33076 Bordeaux France
- Bordeaux PharmacoEpi; INSERM CIC1401; 33076 Bordeaux France
|
16
|
Li Y, Morrow J, Raby B, Tantisira K, Weiss ST, Huang W, Qiu W. Detecting disease-associated genomic outcomes using constrained mixture of Bayesian hierarchical models for paired data. PLoS One 2017; 12:e0174602. [PMID: 28358896 PMCID: PMC5373614 DOI: 10.1371/journal.pone.0174602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/14/2016] [Accepted: 03/10/2017] [Indexed: 12/02/2022]
Abstract
Detecting disease-associated genomic outcomes is one of the key steps in precision medicine research. Cutting-edge high-throughput technologies enable researchers to test, without bias, whether genomic outcomes are associated with a disease of interest. However, these technologies also bring challenges associated with the analysis of genome-wide data. Two major challenges are (1) how to reduce the effects of technical noise; and (2) how to handle the curse of dimensionality (i.e., the number of variables is far larger than the number of samples). To tackle these challenges, we propose a constrained mixture of Bayesian hierarchical models (MBHM) for detecting disease-associated genomic outcomes in data obtained from paired/matched designs. Paired/matched designs can effectively reduce the effects of confounding factors. MBHM does not involve multiple testing and hence does not suffer from the curse of dimensionality. It can also borrow information across genes, so it can be used for whole-genome data with small sample sizes.
Affiliation(s)
- Yunfeng Li
- School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jarrett Morrow
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Benjamin Raby
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Kelan Tantisira
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Scott T. Weiss
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Wei Huang
- School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Weiliang Qiu
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
|
17
|
Doerken S, Mockenhaupt M, Naldi L, Schumacher M, Sekula P. The case-crossover design via penalized regression. BMC Med Res Methodol 2016; 16:103. [PMID: 27549803 PMCID: PMC4994302 DOI: 10.1186/s12874-016-0197-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/14/2016] [Accepted: 07/28/2016] [Indexed: 11/12/2022]
Abstract
BACKGROUND The case-crossover design is an attractive alternative to the classical case-control design that can be used to study the onset of acute events if the risk factors of interest vary in time. By comparing exposures within cases at different time periods, the case-crossover design does not rely on control subjects, who can be difficult to recruit. However, with the standard method of maximum likelihood, the resulting risk estimates can be heavily biased when the prevalence of a risk factor is very low (or very high). METHODS To overcome the problem of low risk factor prevalence, penalized conditional logistic regression via the lasso (least absolute shrinkage and selection operator) has been proposed in the literature, as well as related methods such as the Firth correction. We apply and compare several penalized regression approaches in the context of a case-crossover analysis of the European Study of Severe Cutaneous Adverse Reactions (EuroSCAR; 1997-2001). RESULTS Out of 30 drugs, standard methods correctly classified only 17 drugs (and produced some highly implausible risk estimates), while penalized methods correctly classified 22 drugs. CONCLUSION Penalized methods generally yield better risk classifications and much more plausible risk estimates for the EuroSCAR study than standard methods. As these novel techniques can be easily implemented using available R packages, we encourage routine use of penalized conditional logistic regression for case-crossover data.
Affiliation(s)
- Sam Doerken
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Maja Mockenhaupt
- Dokumentationszentrum schwerer Hautreaktionen (dZh), Medical Center, University of Freiburg, Freiburg, Germany
- Luigi Naldi
- USC di Dermatologia, Azienda Ospedaliero Papa Giovanni XXIII, Bergamo, Italy
- Martin Schumacher
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Peggy Sekula
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
|