1
|
Vinkenoog M, Toivonen J, van Leeuwen M, Janssen MP, Arvas M. The added value of ferritin levels and genetic markers for the prediction of haemoglobin deferral. Vox Sang 2023; 118:825-834. [PMID: 37649369 DOI: 10.1111/vox.13517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 07/28/2023] [Accepted: 08/07/2023] [Indexed: 09/01/2023]
Abstract
BACKGROUND AND OBJECTIVES On-site haemoglobin deferral for blood donors is sometimes necessary for donor health but demotivating for donors and inefficient for the blood bank. Deferral rates could be reduced by accurately predicting donors' haemoglobin status before they visit the blood bank. Although such predictive models have been published, there is ample room for improvement in predictive performance. We aim to assess the added value of ferritin levels or genetic markers as predictor variables in haemoglobin deferral prediction models. MATERIALS AND METHODS Support vector machines with and without this information (the full and reduced model, respectively) are compared in Finland and the Netherlands. Genetic markers are available in the Finnish data and ferritin levels in the Dutch data. RESULTS Although there is a clear association between haemoglobin deferral and both ferritin levels and several genetic markers, predictive performance increases only marginally with their inclusion as predictors. The recall of deferrals increases from 68.6% to 69.9% with genetic markers and from 79.7% to 80.0% with ferritin levels included. Subgroup analyses show that the added value of these predictors is higher in specific subgroups, for example, for donors with minor alleles on single-nucleotide polymorphism 17:58358769, recall of deferral increases from 73.3% to 93.3%. CONCLUSION Including ferritin levels or genetic markers in haemoglobin deferral prediction models improves predictive performance. The increase in overall performance is small but may be substantial for specific subgroups. We recommend including this information as predictor variables when available, but not to collect it for this purpose only.
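The recall figures quoted in this abstract can be made concrete with a small sketch of how recall of deferrals is computed from true and predicted labels. The labels below are hypothetical toy data, not the study data, and the function name `recall` is an illustrative choice rather than anything taken from the paper.

```python
def recall(true_labels, predicted_labels, positive=1):
    """Fraction of actual positives that were also predicted positive."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == positive and p == positive)
    actual_positives = sum(1 for t, _ in pairs if t == positive)
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no positive cases")
    return true_positives / actual_positives


# Hypothetical toy labels: 1 = deferred, 0 = not deferred.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_reduced = [1, 1, 0, 0, 0, 0, 1, 0]  # predictions of a reduced model (assumed)
y_full = [1, 1, 1, 0, 0, 0, 1, 0]     # predictions of a full model (assumed)

print(recall(y_true, y_reduced))
print(recall(y_true, y_full))
```

Recall only counts how many true deferrals are caught, so it should be read alongside precision, which the toy example does not compute.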
Affiliation(s)
- Marieke Vinkenoog
- Donor Medicine Research, Sanquin Research, Amsterdam, The Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
| | - Jarkko Toivonen
- Research and Development, Finnish Red Cross Blood Service, Helsinki, Finland
| | - Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
| | - Mart P Janssen
- Donor Medicine Research, Sanquin Research, Amsterdam, The Netherlands
| | - Mikko Arvas
- Research and Development, Finnish Red Cross Blood Service, Helsinki, Finland
| |
|
2
|
van der Arend BWH, Verhagen IE, van Leeuwen M, van der Arend MQTP, van Casteren DS, Terwindt GM. Defining migraine days, based on longitudinal E-diary data. Cephalalgia 2023; 43:3331024231166625. [PMID: 37021643 DOI: 10.1177/03331024231166625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]
Abstract
BACKGROUND There is a need for standardization of the definition of a migraine day for clinical and research purposes. METHODS We prospectively compared different definitions of a migraine day with E-diary data of n = 1494 patients with migraine. We used a baseline definition based on migraine characteristics with a duration of ≥4 hours OR triptan intake (independently from its effect) OR (visual) aura lasting 5-60 minutes. RESULTS Of all migraine days defined by triptan intake only, 66.2% had a duration <4 hours. Adjusting the headache duration criterion to ≥30 minutes led to a decrease in days defined by triptan intake only and resulted in a 5.4% increase in total migraine days (equals 0.45 migraine day increase in monthly migraine days). These additional migraine days had a median duration of 2.5 hours. CONCLUSION We propose to define a migraine day as follows: 1) (a) headache duration ≥30 minutes; (b) matching ≥2 of four characteristics: unilateral, pulsating, moderate to severe pain, aggravation by or causing avoidance of routine physical activity; and (c) during headache ≥1 of the following: nausea and/or vomiting, photophobia and phonophobia or 2) (visual) aura duration 5-60 minutes or 3) a day with headache for which acute migraine-specific medication is used irrespective of its effect.
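The proposed definition is a rule over diary fields, so it can be sketched as a predicate. This is only an illustrative reading of the three criteria. The `DiaryDay` record and its field names are hypothetical, and treating "a day with headache" as any headache duration above zero is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DiaryDay:
    """Hypothetical one-day E-diary record; field names are illustrative."""
    headache_minutes: float = 0.0
    unilateral: bool = False
    pulsating: bool = False
    moderate_to_severe: bool = False
    aggravated_by_activity: bool = False
    nausea_or_vomiting: bool = False
    photophobia: bool = False
    phonophobia: bool = False
    aura_minutes: float = 0.0
    migraine_specific_medication: bool = False


def is_migraine_day(day: DiaryDay) -> bool:
    # Criterion 1: duration >= 30 min, >= 2 of 4 characteristics, and
    # nausea/vomiting or (photophobia and phonophobia).
    characteristics = sum([
        day.unilateral,
        day.pulsating,
        day.moderate_to_severe,
        day.aggravated_by_activity,
    ])
    accompanying = day.nausea_or_vomiting or (day.photophobia and day.phonophobia)
    headache_criterion = (
        day.headache_minutes >= 30 and characteristics >= 2 and accompanying
    )
    # Criterion 2: (visual) aura lasting 5-60 minutes.
    aura_criterion = 5 <= day.aura_minutes <= 60
    # Criterion 3: headache day with acute migraine-specific medication,
    # irrespective of effect (assumption: any headache duration > 0 counts).
    medication_criterion = (
        day.headache_minutes > 0 and day.migraine_specific_medication
    )
    return headache_criterion or aura_criterion or medication_criterion
```

For example, a 45-minute unilateral, pulsating headache with nausea satisfies the first criterion, while the same headache lasting 20 minutes does not unless medication was taken or an aura occurred.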
Affiliation(s)
| | - Iris E Verhagen
- Department of Neurology, Leiden University Medical Center, Leiden, The Netherlands
| | - Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
| | | | | | - Gisela M Terwindt
- Department of Neurology, Leiden University Medical Center, Leiden, The Netherlands
| |
|
3
|
Yang L, Baratchi M, van Leeuwen M. Unsupervised discretization by two-dimensional MDL-based histogram. Mach Learn 2023. [DOI: 10.1007/s10994-022-06294-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which results in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalized maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which partitions each dimension alternately and then merges neighboring regions, all using the MDL principle. Experiments on synthetic data show that PALM (1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; (2) approximates well a wide range of partitions outside the model class; (3) converges, in contrast to the state-of-the-art multivariate discretization method IPD. Finally, we apply our algorithm to three spatial datasets, and we demonstrate that, compared to kernel density estimation (KDE), our algorithm not only reveals more detailed density changes, but also fits unseen data better, as measured by the log-likelihood.
|
4
|
Kroes SKS, van Leeuwen M, Groenwold RHH, Janssen MP. Generating synthetic mixed discrete-continuous health records with mixed sum-product networks. J Am Med Inform Assoc 2022; 30:16-25. [PMID: 36228120 PMCID: PMC9748584 DOI: 10.1093/jamia/ocac184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 09/09/2022] [Accepted: 10/01/2022] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE Privacy is a concern whenever individual patient health data is exchanged for scientific research. We propose using mixed sum-product networks (MSPNs) as private representations of data and take samples from the network to generate synthetic data that can be shared for subsequent statistical analysis. This anonymization method was evaluated with respect to privacy and information loss. MATERIALS AND METHODS Using a simulation study, information loss was quantified by assessing whether synthetic data could reproduce regression parameters obtained from the original data. Predictor variable types were varied between continuous, count, categorical, and mixed discrete-continuous. Additionally, we measured whether the MSPN approach successfully anonymizes the data by removing associations between background and sensitive information for these datasets. RESULTS The synthetic data generated with MSPNs yielded regression results highly similar to those generated with original data, differing less than 5% in most simulation scenarios. Standard errors increased compared to the original data. Particularly for smaller datasets (1000 records), this resulted in a discrepancy between the estimated and empirical standard errors. Sensitive values could no longer be inferred from background information for at least 99% of tested individuals. DISCUSSION The proposed anonymization approach yields very promising results. Further research is required to evaluate its performance with other types of data and analyses, and to predict how user parameter choices affect a bias-privacy trade-off. CONCLUSION Generating synthetic data from MSPNs is a promising, easy-to-use approach for anonymization of sensitive individual health data that yields informative and private data.
Affiliation(s)
- Shannon K S Kroes
- Transfusion Technology Assessment Group, Donor Medicine Research Department, Sanquin Research, Amsterdam, The Netherlands
- Leiden Institute of Advanced Computer Science, Computer Science, Leiden University, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Computer Science, Leiden University, Leiden, The Netherlands
| | - Rolf H H Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Mart P Janssen
- Transfusion Technology Assessment Group, Donor Medicine Research Department, Sanquin Research, Amsterdam, The Netherlands
- Leiden Institute of Advanced Computer Science, Computer Science, Leiden University, Leiden, The Netherlands
| |
|
5
|
Vinkenoog M, van Leeuwen M, Janssen MP. Explainable haemoglobin deferral predictions using machine learning models: Interpretation and consequences for the blood supply. Vox Sang 2022; 117:1262-1270. [PMID: 36102148 PMCID: PMC9826045 DOI: 10.1111/vox.13350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/20/2022] [Revised: 07/29/2022] [Accepted: 08/09/2022] [Indexed: 01/18/2023]
Abstract
BACKGROUND AND OBJECTIVES Accurate predictions of haemoglobin (Hb) deferral for whole-blood donors could aid blood banks in reducing deferral rates and increasing efficiency and donor motivation. Complex models are needed to make accurate predictions, but predictions must also be explainable. Before the implementation of a prediction model, its impact on the blood supply should be estimated to avoid shortages. MATERIALS AND METHODS Donation visits between October 2017 and December 2021 were selected from Sanquin's database system. The following variables were available for each visit: donor sex, age, donation start time, month, number of donations in the last 24 months, most recent ferritin level, days since last ferritin measurement, Hb at nth previous visit (n between 1 and 5), days since the nth previous visit. Outcome Hb deferral has two classes: deferred and not deferred. Support vector machines were used as prediction models, and SHapley Additive exPlanations values were used to quantify the contribution of each variable to the model predictions. Performance was assessed using precision and recall. The potential impact on blood supply was estimated by predicting deferral at earlier or later donation dates. RESULTS We present a model that predicts Hb deferral in an explainable way. If used in practice, 64% of non-deferred donors would be invited on or before their original donation date, while 80% of deferred donors would be invited later. CONCLUSION By using this model to invite donors, the number of blood bank visits would increase by 15%, while deferral rates would decrease by 60% (currently 3% for women and 1% for men).
Affiliation(s)
- Marieke Vinkenoog
- Department of Donor Medicine Research, Sanquin Research, Amsterdam, the Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands
| | - Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands
| | - Mart P. Janssen
- Department of Donor Medicine Research, Sanquin Research, Amsterdam, the Netherlands
| |
|
6
|
Vinkenoog M, Steenhuis M, Brinke AT, van Hasselt JGC, Janssen MP, van Leeuwen M, Swaneveld FH, Vrielink H, van de Watering L, Quee F, van den Hurk K, Rispens T, Hogema B, van der Schoot CE. Associations Between Symptoms, Donor Characteristics and IgG Antibody Response in 2082 COVID-19 Convalescent Plasma Donors. Front Immunol 2022; 13:821721. [PMID: 35296077 PMCID: PMC8918483 DOI: 10.3389/fimmu.2022.821721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/24/2021] [Accepted: 02/03/2022] [Indexed: 12/13/2022] Open
Abstract
Many studies have already reported on the association between patient characteristics and the severity of COVID-19 disease outcome, but the relation with SARS-CoV-2 antibody levels is less clear. To investigate this in more detail, we performed a retrospective observational study in which we used the IgG antibody response from 11,118 longitudinal antibody measurements of 2,082 unique COVID convalescent plasma donors. COVID-19 symptoms and donor characteristics were obtained by a questionnaire. Antibody responses were modelled using a linear mixed-effects model. Our study confirms that the SARS-CoV-2 antibody response is associated with patient characteristics like body mass index and age. Antibody decay was faster in male than in female donors (average half-life of 62 versus 72 days). Most interestingly, we also found that three symptoms (headache, anosmia, nasal cold) were associated with lower peak IgG, while six other symptoms (dry cough, fatigue, diarrhoea, fever, dyspnoea, muscle weakness) were associated with higher IgG concentrations.
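To illustrate what the two half-lives imply, the sketch below converts a half-life into a remaining fraction and a daily decay rate. It assumes simple exponential decay, which is a simplification for illustration and not the linear mixed-effects model used in the study. The function names are hypothetical.

```python
import math


def fraction_remaining(days, half_life_days):
    """Fraction of the starting level left after `days`, assuming exponential decay."""
    return 0.5 ** (days / half_life_days)


def decay_rate_per_day(half_life_days):
    """Exponential decay constant (per day) corresponding to a half-life."""
    return math.log(2) / half_life_days


# Half-lives reported in the abstract: 62 days (male), 72 days (female).
for label, half_life in (("male", 62), ("female", 72)):
    remaining = fraction_remaining(90, half_life)
    print(f"{label}: {remaining:.2f} of the starting level after 90 days")
```

Under this assumption a shorter half-life always leaves a smaller fraction at any given time point, which matches the faster decay reported for male donors.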
Affiliation(s)
- Marieke Vinkenoog
- Department of Donor Medicine Research, Sanquin Research, Amsterdam, Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, Netherlands
| | - Maurice Steenhuis
- Department of Immunopathology, Sanquin Research, Amsterdam, Netherlands
- Landsteiner Laboratory, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
| | - Anja ten Brinke
- Department of Immunopathology, Sanquin Research, Amsterdam, Netherlands
- Landsteiner Laboratory, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
| | - J. G. Coen van Hasselt
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, Netherlands
| | - Mart P. Janssen
- Department of Donor Medicine Research, Sanquin Research, Amsterdam, Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, Netherlands
| | - Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, Netherlands
| | - Francis H. Swaneveld
- Department of Transfusion Medicine, Sanquin Blood Supply, Amsterdam, Netherlands
| | - Hans Vrielink
- Department of Transfusion Medicine, Sanquin Blood Supply, Amsterdam, Netherlands
| | - Leo van de Watering
- Department of Transfusion Medicine, Sanquin Blood Supply, Amsterdam, Netherlands
| | - Franke Quee
- Department of Donor Medicine Research, Sanquin Research, Amsterdam, Netherlands
| | - Katja van den Hurk
- Department of Donor Medicine Research, Sanquin Research, Amsterdam, Netherlands
| | - Theo Rispens
- Department of Immunopathology, Sanquin Research, Amsterdam, Netherlands
- Landsteiner Laboratory, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
| | - Boris Hogema
- Department of Virology, Sanquin Diagnostic Services, Amsterdam, Netherlands
| | - C. Ellen van der Schoot
- Department of Experimental Immunohematology, Sanquin Research and Landsteiner Laboratory Amsterdam University Medical Centre, Amsterdam, Netherlands
- *Correspondence: C. Ellen van der Schoot,
| |
|
7
|
Abstract
Although data protection is compulsory when personal data is shared, there is no systematic method available to evaluate to what extent each individual is at risk of a privacy breach. We use a collection of measures that quantify how much information is needed to uncover sensitive information. Combined with visualization techniques, our approach can be used to perform a detailed privacy analysis of medical data. Because privacy is evaluated per variable, these adjustments can be made while incorporating how likely it is that these variables will be exploited to uncover sensitive information in practice, as is mandatory in the European Union. Additionally, the analysis of privacy can be used to evaluate to what extent knowledge on specific variables in the data can contribute to privacy breaches, which can subsequently guide the use of anonymization techniques, such as generalization.
Affiliation(s)
- Shannon Ks Kroes
- Sanquin Research, the Netherlands
- Leiden University, the Netherlands
- Leiden University Medical Center, the Netherlands
| | | | | | | |
|
8
|
Kapoor S, Saxena DK, van Leeuwen M. Online summarization of dynamic graphs using subjective interestingness for sequential data. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00714-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Many real-world phenomena can be represented as dynamic graphs, i.e., networks that change over time. The problem of dynamic graph summarization, i.e., to succinctly describe the evolution of a dynamic graph, has been widely studied. Existing methods typically use objective measures to find fixed structures such as cliques, stars, and cores. Most of the methods, however, do not consider the problem of online summarization, where the summary is incrementally conveyed to the analyst as the graph evolves, and (thus) do not take into account the knowledge of the analyst at a specific moment in time. We address this gap in the literature through a novel, generic framework for subjective interestingness for sequential data. Specifically, we iteratively identify atomic changes, called ‘actions’, that provide most information relative to the current knowledge of the analyst. For this, we introduce a novel information gain measure, which is motivated by the minimum description length (MDL) principle. With this measure, our approach discovers compact summaries without having to decide on the number of patterns. As such, we are the first to combine approaches for data mining based on subjective interestingness (using the maximum entropy principle) with pattern-based summarization (using the MDL principle). We instantiate this framework for dynamic graphs and dense subgraph patterns, and present DSSG, a heuristic algorithm for the online summarization of dynamic graphs by means of informative actions, each of which represents an interpretable change to the connectivity structure of the graph. The experiments on real-world data demonstrate that our approach effectively discovers informative summaries. We conclude with a case study on data from an airline network to show its potential for real-world applications.
|
9
|
Vinkenoog M, van den Hurk K, van Kraaij M, van Leeuwen M, Janssen MP. First results of a ferritin-based blood donor deferral policy in the Netherlands. Transfusion 2020; 60:1785-1792. [PMID: 32533600 PMCID: PMC7496980 DOI: 10.1111/trf.15906] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/17/2020] [Revised: 04/22/2020] [Accepted: 04/22/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Whole blood donors are at risk of becoming iron deficient. To monitor iron stores, Sanquin implemented a new deferral policy based on ferritin levels, in addition to the traditional hemoglobin measurements. METHODS Ferritin levels are determined in every fifth donation, as well as in all first-time donors. Donors with ferritin levels <15 ng/mL (WHO threshold) are deferred for 12 months; those ≥15 and ≤30 ng/mL for 6 months. The first results were analyzed and are presented here. RESULTS The results show that 25% of women (N = 20151, 95% CI 24%-25%) and 1.6% of men (N = 10391, 95% CI 1.4%-1.8%) have ferritin levels ≤30 ng/mL at their first blood center visit. For repeat (non-first-time) donors, these proportions are higher: 53% of women (N = 28329, 95% CI 52%-54%) and 42% of men (N = 31089, 95% CI 41%-43%). After a 6-month deferral, in 88% of returning women (N = 3059, 95% CI 87%-89%) and 99% of returning men (N = 3736, 95% CI 98%-99%) ferritin levels were ≥15 ng/mL. After a 12-month deferral, in 74% of returning women (N = 486, 95% CI 70%-78%) and 95% of returning men (N = 479, 95% CI 94%-97%) ferritin levels increased to ≥15 ng/mL. CONCLUSION Deferral of donors whose pre-donation ferritin levels were ≤30 ng/mL might prevent donors from returning with ferritin levels <15 ng/mL. This policy is promising to mitigate effects of repeated donations on iron stores.
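The deferral rule in the METHODS part of this abstract maps a ferritin value to a deferral period, which can be sketched as a small function. The thresholds are those stated in the abstract. The function name `deferral_months` and the choice to return 0 for "no ferritin-based deferral" are illustrative assumptions.

```python
def deferral_months(ferritin_ng_ml: float) -> int:
    """Ferritin-based deferral period in months, per the thresholds in the abstract.

    Below 15 ng/mL: 12 months. From 15 up to and including 30 ng/mL: 6 months.
    Above 30 ng/mL: no ferritin-based deferral (returned as 0, an assumption).
    """
    if ferritin_ng_ml < 0:
        raise ValueError("ferritin level cannot be negative")
    if ferritin_ng_ml < 15:
        return 12
    if ferritin_ng_ml <= 30:
        return 6
    return 0


for level in (10, 15, 30, 45):
    print(level, "ng/mL ->", deferral_months(level), "months")
```

The sketch covers only the ferritin rule; the haemoglobin-based deferral that the policy supplements is not modelled here.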
Affiliation(s)
- Marieke Vinkenoog
- Donor Medicine Research, Sanquin Research, Amsterdam, The Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
| | | | | | - Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
| | - Mart P. Janssen
- Donor Medicine Research, Sanquin Research, Amsterdam, The Netherlands
| |
|
10
|
Kapoor S, Saxena DK, van Leeuwen M. Discovering subjectively interesting multigraph patterns. Mach Learn 2020. [DOI: 10.1007/s10994-020-05873-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
11
|
|
12
|
van Os HJA, Ramos LA, Hilbert A, van Leeuwen M, van Walderveen MAA, Kruyt ND, Dippel DWJ, Steyerberg EW, van der Schaaf IC, Lingsma HF, Schonewille WJ, Majoie CBLM, Olabarriaga SD, Zwinderman KH, Venema E, Marquering HA, Wermer MJH. Predicting Outcome of Endovascular Treatment for Acute Ischemic Stroke: Potential Value of Machine Learning Algorithms. Front Neurol 2018; 9:784. [PMID: 30319525 PMCID: PMC6167479 DOI: 10.3389/fneur.2018.00784] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 05/15/2018] [Accepted: 08/30/2018] [Indexed: 11/24/2022] Open
Abstract
Background: Endovascular treatment (EVT) is effective for stroke patients with a large vessel occlusion (LVO) of the anterior circulation. To further improve personalized stroke care, it is essential to accurately predict outcome after EVT. Machine learning might outperform classical prediction methods as it is capable of addressing complex interactions and non-linear relations between variables. Methods: We included patients from the Multicenter Randomized Clinical Trial of Endovascular Treatment for Acute Ischemic Stroke in the Netherlands (MR CLEAN) Registry, an observational cohort of LVO patients treated with EVT. We applied the following machine learning algorithms: Random Forests, Support Vector Machine, Neural Network, and Super Learner and compared their predictive value with classic logistic regression models using various variable selection methodologies. Outcome variables were good reperfusion (post-mTICI ≥ 2b) and functional independence (modified Rankin Scale ≤2) at 3 months using (1) only baseline variables and (2) baseline and treatment variables. Area under the ROC-curves (AUC) and difference of mean AUC between the models were assessed. Results: We included 1,383 EVT patients, with good reperfusion in 531 (38%) and functional independence in 525 (38%) patients. Machine learning and logistic regression models all performed poorly in predicting good reperfusion (range mean AUC: 0.53–0.57), and moderately in predicting 3-months functional independence (range mean AUC: 0.77–0.79) using only baseline variables. All models performed well in predicting 3-months functional independence using both baseline and treatment variables (range mean AUC: 0.88–0.91) with a negligible difference of mean AUC (0.01; 95%CI: 0.00–0.01) between best performing machine learning algorithm (Random Forests) and best performing logistic regression model (based on prior knowledge). 
Conclusion: In patients with LVO machine learning algorithms did not outperform logistic regression models in predicting reperfusion and 3-months functional independence after endovascular treatment. For all models at time of admission radiological outcome was more difficult to predict than clinical outcome.
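The models in this abstract are compared by area under the ROC curve (AUC). As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical toy values, not study data, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = functional independence, 0 = not.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

An AUC near 0.5, as reported for reperfusion, means the scores barely separate the two groups, while values near 0.9 indicate that most positive cases are ranked above most negative ones.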
Affiliation(s)
| | - Lucas A Ramos
- Department of Biomedical Engineering and Physics, University of Amsterdam, Amsterdam, Netherlands
- Department of Clinical Epidemiology and Biostatistics, University of Amsterdam, Amsterdam, Netherlands
| | - Adam Hilbert
- Department of Biomedical Engineering and Physics, University of Amsterdam, Amsterdam, Netherlands
| | - Matthijs van Leeuwen
- Leiden Institute for Advanced Computer Sciences, Leiden University, Leiden, Netherlands
| | | | - Nyika D Kruyt
- Department of Neurology, Leiden University Medical Center, Leiden, Netherlands
| | | | - Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
- Department of Public Health, Erasmus Medical Center, Rotterdam, Netherlands
| | | | - Hester F Lingsma
- Department of Public Health, Erasmus Medical Center, Rotterdam, Netherlands
| | | | - Charles B L M Majoie
- Department of Radiology and Nuclear Medicine, University of Amsterdam, Amsterdam, Netherlands
| | - Silvia D Olabarriaga
- Department of Clinical Epidemiology and Biostatistics, University of Amsterdam, Amsterdam, Netherlands
| | - Koos H Zwinderman
- Department of Clinical Epidemiology and Biostatistics, University of Amsterdam, Amsterdam, Netherlands
| | - Esmee Venema
- Department of Neurology, Erasmus Medical Center, Rotterdam, Netherlands
- Department of Public Health, Erasmus Medical Center, Rotterdam, Netherlands
| | - Henk A Marquering
- Department of Biomedical Engineering and Physics, University of Amsterdam, Amsterdam, Netherlands
| | - Marieke J H Wermer
- Department of Neurology, Leiden University Medical Center, Leiden, Netherlands
| | | |
|
13
|
Paramonov S, van Leeuwen M, De Raedt L. Relational data factorization. Mach Learn 2017. [DOI: 10.1007/s10994-017-5660-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
14
|
Le Van T, van Leeuwen M, Carolina Fierro A, De Maeyer D, Van den Eynden J, Verbeke L, De Raedt L, Marchal K, Nijssen S. Simultaneous discovery of cancer subtypes and subtype features by molecular data integration. Bioinformatics 2017; 32:i445-i454. [PMID: 27587661 DOI: 10.1093/bioinformatics/btw434] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Subtyping cancer is key to an improved and more personalized prognosis/treatment. The increasing availability of tumor related molecular data provides the opportunity to identify molecular subtypes in a data-driven way. Molecular subtypes are defined as groups of samples that have a similar molecular mechanism at the origin of the carcinogenesis. The molecular mechanisms are reflected by subtype-specific mutational and expression features. Data-driven subtyping is a complex problem as subtyping and identifying the molecular mechanisms that drive carcinogenesis are confounded problems. Many current integrative subtyping methods use global mutational and/or expression tumor profiles to group tumor samples in subtypes but do not explicitly extract the subtype-specific features. We therefore present a method that solves both tasks of subtyping and identification of subtype-specific features simultaneously. Hereto our method integrates mutational and expression data while taking into account the clonal properties of carcinogenesis. Key to our method is a formalization of the problem as a rank matrix factorization of ranked data that approaches the subtyping problem as multi-view bi-clustering. RESULTS We introduce a novel integrative framework to identify subtypes by combining mutational and expression features. The incomparable measurement data is integrated by transformation into ranked data and subtypes are defined as multi-view bi-clusters. We formalize the model using rank matrix factorization, resulting in the SRF algorithm. Experiments on simulated data and the TCGA breast cancer data demonstrate that SRF is able to capture subtle differences that existing methods may miss. AVAILABILITY AND IMPLEMENTATION The implementation is available at: https://github.com/rankmatrixfactorisation/SRF CONTACT: kathleen.marchal@intec.ugent.be, siegfried.nijssen@cs.kuleuven.be SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thanh Le Van
- Department of Computer Science, KULeuven, Leuven, Belgium
- Matthijs van Leeuwen
- Leiden Institute for Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands
- Ana Carolina Fierro
- Department of Information Technology, iMinds, Ghent University, Gent, Belgium, Bioinformatics Institute Ghent, 9052 Gent, Belgium, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium
- Dries De Maeyer
- Department of Information Technology, iMinds, Ghent University, Gent, Belgium, Bioinformatics Institute Ghent, 9052 Gent, Belgium, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium
- Jimmy Van den Eynden
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Lieven Verbeke
- Department of Information Technology, iMinds, Ghent University, Gent, Belgium, Bioinformatics Institute Ghent, 9052 Gent, Belgium, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium
- Luc De Raedt
- Department of Computer Science, KULeuven, Leuven, Belgium
- Kathleen Marchal
- Department of Information Technology, iMinds, Ghent University, Gent, Belgium, Bioinformatics Institute Ghent, 9052 Gent, Belgium, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium, Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa
- Siegfried Nijssen
- Department of Computer Science, KULeuven, Leuven, Belgium, Leiden Institute for Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands
|
15
|
|
16
|
|
17
|
Copmans D, Meinl T, Dietz C, van Leeuwen M, Ortmann J, Berthold MR, de Witte PAM. A KNIME-Based Analysis of the Zebrafish Photomotor Response Clusters the Phenotypes of 14 Classes of Neuroactive Molecules. J Biomol Screen 2015; 21:427-36. [PMID: 26637551 DOI: 10.1177/1087057115618348] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/14/2015] [Accepted: 10/29/2015] [Indexed: 11/16/2022]
Abstract
Recently, the photomotor response (PMR) of zebrafish embryos was reported as a robust behavior that is useful for high-throughput neuroactive drug discovery and mechanism prediction. Given the complexity of the PMR, there is a need for rapid and easy analysis of the behavioral data. In this study, we developed an automated analysis workflow using the KNIME Analytics Platform and made it freely accessible. This workflow allows us to simultaneously calculate a behavioral fingerprint for all analyzed compounds and to further process the data. Furthermore, to further characterize the potential of PMR for mechanism prediction, we performed PMR analysis of 767 neuroactive compounds covering 14 different receptor classes using the KNIME workflow. We observed a true positive rate of 25% and a false negative rate of 75% in our screening conditions. Among the true positives, all receptor classes were represented, thereby confirming the utility of the PMR assay to identify a broad range of neuroactive molecules. By hierarchical clustering of the behavioral fingerprints, different phenotypical clusters were observed that suggest the utility of PMR for mechanism prediction for adrenergics, dopaminergics, serotonergics, metabotropic glutamatergics, opioids, and ion channel ligands.
Affiliation(s)
- Daniëlle Copmans
- Laboratory for Molecular Biodiscovery, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Christian Dietz
- Chair for Bioinformatics and Information Mining, Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
- Matthijs van Leeuwen
- Machine Learning Group, Department of Computer Science, KU Leuven, Leuven, Belgium
- Julia Ortmann
- Department of Bioanalytical Ecotoxicology, Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
- Michael R Berthold
- Chair for Bioinformatics and Information Mining, Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
- Peter A M de Witte
- Laboratory for Molecular Biodiscovery, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
|
18
|
Abstract
Pattern mining provides useful tools for exploratory data analysis. Numerous efficient algorithms exist that are able to discover various types of patterns in large datasets. Unfortunately, the problem of identifying patterns that are genuinely interesting to a particular user remains challenging. Current approaches generally require considerable data mining expertise or effort from the data analyst, and hence cannot be used by typical domain experts. To address this, we introduce a generic framework for interactive learning of user-specific pattern ranking functions. The user is only asked to rank small sets of patterns, while a ranking function is inferred from this feedback by preference learning techniques. Moreover, we propose a number of active learning heuristics to minimize the effort required from the user, while ensuring that accurate rankings are obtained. We show how the learned ranking functions can be used to mine new, more interesting patterns. We demonstrate two concrete instances of our framework for two different pattern mining tasks, frequent itemset mining and subgroup discovery. We empirically evaluate the capacity of the algorithm to learn pattern rankings by emulating users. Experiments demonstrate that the system is able to learn accurate rankings, and that the active learning heuristics help reduce the required user effort. Furthermore, using the learned ranking functions as search heuristics allows discovering patterns of higher quality than those in the initial set. This shows that machine learning techniques in general, and active preference learning in particular, are promising building blocks for interactive data mining systems.
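To make the idea of inferring a ranking function from ranked pattern sets concrete, here is a minimal sketch of pairwise preference learning. It uses a simple perceptron-style update on feature differences, which is a stand-in chosen for brevity and not necessarily the preference learner used by the authors. The pattern features (relative support, pattern length), the function names, and the emulated user ordering are all hypothetical.

```python
def learn_ranking(pairs, n_features, epochs=100, lr=0.1):
    """Learn linear weights w such that w . better > w . worse for each pair.

    Perceptron-style sketch; `pairs` holds (better, worse) feature vectors
    derived from a user's ranking of a small pattern set.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        updated = False
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # Update only when the current weights order the pair incorrectly.
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
                updated = True
        if not updated:
            break  # every pair is ordered correctly
    return w


def score(w, pattern):
    """Score a pattern with the learned linear ranking function."""
    return sum(wi * xi for wi, xi in zip(w, pattern))


# Hypothetical patterns described by (relative support, pattern length).
a, b, c = (0.9, 2.0), (0.5, 3.0), (0.2, 5.0)

# Emulated user feedback: the user ranked a above b above c.
feedback = [(a, b), (b, c), (a, c)]
weights = learn_ranking(feedback, n_features=2)

# The learned function reproduces the user's ordering on the ranked patterns.
ranked = sorted([c, a, b], key=lambda p: score(weights, p), reverse=True)
```

The learned `score` function could then serve as a search heuristic for mining further patterns, in the spirit of the abstract; the active learning heuristics for choosing which patterns to show the user are not covered by this sketch.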
Affiliation(s)
- Vladimir Dzyuba
- Department of Computer Science, KU Leuven, Celestijnenlaan 200A – bus 2402, Leuven, 3000, Belgium
- Matthijs van Leeuwen
- Department of Computer Science, KU Leuven, Celestijnenlaan 200A – bus 2402, Leuven, 3000, Belgium
- Siegfried Nijssen
- Department of Computer Science, KU Leuven, Celestijnenlaan 200A – bus 2402, Leuven, 3000, Belgium
- Luc De Raedt
- Department of Computer Science, KU Leuven, Celestijnenlaan 200A – bus 2402, Leuven, 3000, Belgium
|
19
|
Abstract
Traditional approaches to community detection, as studied by physicists, sociologists, and more recently computer scientists, aim at simply partitioning the social network graph. However, with the advent of online social networking sites, richer data has become available: beyond the link information, each user in the network is annotated with additional information, for example, demographics, shopping behavior, or interests. In this context, it is therefore important to develop mining methods which can take advantage of all available information. In the case of community detection, this means finding good communities (a set of nodes cohesive in the social graph) which are associated with good descriptions in terms of user information (node attributes).
Having good descriptions associated with our models makes them understandable by domain experts and thus more useful in real-world applications. Another requirement dictated by real-world applications is to develop methods that can use, when available, any domain-specific background knowledge. In the case of community detection, the background knowledge could be a vague description of the communities sought in a specific application, or some prototypical nodes (e.g., good customers in the past) that represent what the analyst is looking for (a community of similar users).
Towards this goal, in this article, we define and study the problem of finding a diverse set of cohesive communities with concise descriptions. We propose an effective algorithm that alternates between two phases: a hill-climbing phase producing (possibly overlapping) communities, and a description induction phase which uses techniques from supervised pattern set mining. Our framework has the nice feature of being able to build well-described cohesive communities starting from any given description or seed set of nodes, which makes it very flexible and easily applicable in real-world applications.
Our experimental evaluation confirms that the proposed method discovers cohesive communities with concise descriptions in realistic and large online social networks such as Delicious, Flickr, and LastFM.
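The hill-climbing phase described in the abstract can be sketched as a greedy expansion from a seed set of nodes. This is a simplified illustration under stated assumptions, not the authors' algorithm: the cohesion measure (internal edges divided by internal plus boundary edges) is chosen here only for illustration, the sketch only adds nodes and never removes them, and the description induction phase is omitted entirely. The names `cohesion` and `hill_climb` and the toy graph are hypothetical.

```python
def cohesion(graph, community):
    """Fraction of edges touching the community that stay inside it.

    Assumed measure for illustration; `graph` maps each node to its set of
    neighbours in an undirected graph.
    """
    internal = 0
    boundary = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                internal += 1  # each internal edge is seen from both ends
            else:
                boundary += 1
    internal //= 2
    total = internal + boundary
    return internal / total if total else 0.0


def hill_climb(graph, seed):
    """Greedily add the neighbouring node that most improves cohesion."""
    community = set(seed)
    best = cohesion(graph, community)
    while True:
        candidates = sorted({v for u in community for v in graph[u]} - community)
        best_node = None
        for v in candidates:
            s = cohesion(graph, community | {v})
            if s > best:
                best, best_node = s, v
        if best_node is None:
            return community  # local optimum: no single addition helps
        community.add(best_node)


# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
community = hill_climb(graph, seed={1})  # {1, 2, 3}
```

Starting from a single seed node, the search stops at the first triangle because adding node 4 would lower the cohesion score. Starting from a seed set is what allows an analyst to steer the search with prototypical nodes, as the abstract describes.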
|
20
|
Brauer M, van Leeuwen M, Janssen E, Newhouse SK, Heiman JR, Laan E. Attentional and affective processing of sexual stimuli in women with hypoactive sexual desire disorder. Arch Sex Behav 2012; 41:891-905. [PMID: 21892693 DOI: 10.1007/s10508-011-9820-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 07/03/2010] [Revised: 06/06/2011] [Accepted: 06/06/2011] [Indexed: 05/31/2023]
Abstract
Hypoactive sexual desire disorder (HSDD) is the most common sexual problem in women. From an incentive motivation perspective, HSDD may be the result of a weak association between sexual stimuli and rewarding experiences. As a consequence, these stimuli may either lose or fail to acquire a positive meaning, resulting in a limited number of incentives that have the capacity to elicit a sexual response. According to current information processing models of sexual arousal, sexual stimuli automatically activate meanings and if these are not predominantly positive, processes relevant to the activation of sexual arousal and desire may be interrupted. Premenopausal U.S. and Dutch women with acquired HSDD (n = 42) and a control group of sexually functional women (n = 42) completed a single target Implicit Association Task and a Picture Association Task assessing automatic affective associations with sexual stimuli and a dot detection task measuring attentional capture by sexual stimuli. Results showed that women with acquired HSDD displayed less positive (but not more negative) automatic associations with sexual stimuli than sexually functional women. The same pattern was found for self-reported affective sex-related associations. Participants were slower to detect targets in the dot detection task that replaced sexual images, irrespective of sexual function status. As such, the findings point to the relevance of affective processing of sexual stimuli in women with HSDD, and imply that the treatment of HSDD might benefit from a stronger emphasis on the strengthening of the association between sexual stimuli and positive meaning and sexual reward.
Affiliation(s)
- Marieke Brauer
- Department of Sexology and Psychosomatic Obstetrics and Gynecology, University of Amsterdam, Amsterdam, The Netherlands.
|
21
|
|
22
|
|