1. Eckle K, Schmidt-Hieber J. A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw 2018; 110:232-242. PMID: 30616095; DOI: 10.1016/j.neunet.2018.11.005. Citations in RCA: 89.
Abstract
Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Although the function spaces induced by shallow networks have several approximation-theoretic drawbacks, this does not by itself explain the success of deep networks. In this article we take another route and compare the expressive power of DNNs with ReLU activation function to that of linear spline methods. We show that MARS (multivariate adaptive regression splines) is improperly learnable by DNNs in the sense that for any given function that can be expressed as a function in MARS with M parameters there exists a multilayer neural network with O(M log(M/ε)) parameters that approximates this function up to sup-norm error ε. We show a similar result for expansions with respect to the Faber-Schauder system. Based on this, we derive risk comparison inequalities that bound the statistical risk of fitting a neural network by the statistical risk of spline-based methods. This shows that deep networks perform better than, or only slightly worse than, the considered spline methods. We provide a constructive proof for the function approximations.
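The connection to splines is concrete: a MARS hinge basis function max(0, x - t) is exactly a single ReLU unit, so additive MARS components are one-hidden-layer ReLU networks; depth is only needed for products of hinges. A minimal numpy sketch of this identity (illustrative only, not the paper's construction):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# A MARS hinge basis function max(0, x - t) is exactly a single
# ReLU unit with weight 1 and bias -t.
t = 0.4
x = np.linspace(-1.0, 1.0, 7)
hinge = np.maximum(0.0, x - t)      # spline-side definition
relu_unit = relu(1.0 * x + (-t))    # network-side definition
assert np.allclose(hinge, relu_unit)

# Sums of hinges (additive MARS components) are one-hidden-layer
# ReLU networks; products of hinges need depth, which is where the
# paper's O(M log(M/eps)) approximation result comes in.
f = 2.0 * np.maximum(0.0, x - 0.1) - 0.5 * np.maximum(0.0, -x)
W, b = np.array([[1.0], [-1.0]]), np.array([-0.1, 0.0])
a = np.array([2.0, -0.5])
net = relu(x[:, None] @ W.T + b) @ a
assert np.allclose(f, net)
```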
2. Becker W, Saisana M, Paruolo P, Vandecasteele I. Weights and importance in composite indicators: Closing the gap. Ecological Indicators 2017; 80:12-22. PMID: 28867964; PMCID: PMC5473177; DOI: 10.1016/j.ecolind.2017.03.056. Citations in RCA: 63.
Abstract
Composite indicators are very popular tools for assessing and ranking countries and institutions in terms of environmental performance, sustainability, and other complex concepts that are not directly measurable. Because of the stakes that come with the media attention these tools receive, a word of caution is warranted. One common misconception relates to the effect of the weights assigned to indicators during the aggregation process. This work presents a novel series of tools that allow developers and users of composite indicators to explore the effects of these weights. First, the importance of each indicator to the composite is measured by the nonlinear Pearson correlation ratio, estimated by Bayesian Gaussian processes. Second, the effect of each indicator is isolated from that of other indicators using regression analysis and examined in detail. Finally, an optimisation procedure is proposed which allows weights to be fitted to agree with pre-specified values of importance. Together, these three tools give developers considerable insight into the effects of weights and suggest possibilities for refining and simplifying the aggregation. The added value of these tools is demonstrated on three case studies: the Resource Governance Index, the Good Country Index, and the Water Retention Index.
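The importance measure here is the nonlinear Pearson correlation ratio S_i = Var(E[y|x_i]) / Var(y). The paper estimates the conditional mean with Bayesian Gaussian processes; the sketch below substitutes a simple Nadaraya-Watson kernel smoother (data, bandwidth, and the 0.7/0.3 weights are hypothetical) just to make the quantity concrete:

```python
import numpy as np

def correlation_ratio(x, y, bandwidth=0.1):
    """Estimate S = Var(E[y|x]) / Var(y), the first-order importance
    of indicator x for composite score y, via a Nadaraya-Watson
    smoother (stand-in for the Gaussian-process fit in the paper)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    cond_mean = (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
    return cond_mean.var() / y.var()

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(size=(2, 500))
score = 0.7 * x1 + 0.3 * x2           # composite with nominal weights
print(correlation_ratio(x1, score))   # importance of indicator 1
print(correlation_ratio(x2, score))   # importance of indicator 2
```

On correlated or differently scaled indicators these importance values need not match the nominal weights, which is exactly the gap the paper's optimisation procedure is designed to close.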
3. Jacobson NC, Chow SM, Newman MG. The Differential Time-Varying Effect Model (DTVEM): A tool for diagnosing and modeling time lags in intensive longitudinal data. Behav Res Methods 2019; 51:295-315. PMID: 30120682; PMCID: PMC6395514; DOI: 10.3758/s13428-018-1101-0. Citations in RCA: 30.
Abstract
With the recent growth in intensive longitudinal designs and the corresponding demand for methods to analyze such data, there has never been a more pressing need for user-friendly analytic tools that can identify and estimate optimal time lags in intensive longitudinal data. The available standard exploratory methods to identify optimal time lags within univariate and multivariate multiple-subject time series are greatly underpowered at the group (i.e., population) level. We describe a hybrid exploratory-confirmatory tool, referred to herein as the Differential Time-Varying Effect Model (DTVEM), which features a convenient user-accessible function to identify optimal time lags and estimate these lags within a state-space framework. Data from an empirical ecological momentary assessment study are then used to demonstrate the utility of the proposed tool in identifying the optimal time lag for studying the linkages between nervousness and heart rate in a group of undergraduate students. Using a simulation study, we illustrate the effectiveness of DTVEM in identifying optimal lag structures in multiple-subject time-series data with missingness, as well as its strengths and limitations as a hybrid exploratory-confirmatory approach, relative to other existing approaches.
4. Chen Y, Goldsmith J, Ogden T. Variable Selection in Function-on-Scalar Regression. Stat (Int Stat Inst) 2016; 5:88-101. PMID: 27429751; DOI: 10.1002/sta4.106. Citations in RCA: 26.
Abstract
For regression models with functional responses and scalar predictors, it is common for the number of predictors to be large. Despite this, few methods for variable selection exist for function-on-scalar models, and none account for the inherent correlation of residual curves in such models. By expanding the coefficient functions using a B-spline basis, we pose the function-on-scalar model as a multivariate regression problem. Spline coefficients are grouped within each coefficient function, and a group minimax concave penalty (MCP) is used for variable selection. We adapt techniques from generalized least squares to account for residual covariance by "pre-whitening" using an estimate of the covariance matrix, and establish theoretical properties for the resulting estimator. We further develop an iterative algorithm that alternately updates the spline coefficients and the covariance; simulation results indicate that this iterative algorithm often performs as well as pre-whitening using the true covariance, and substantially outperforms methods that neglect the covariance structure. We apply our method to two-dimensional planar reaching motions in a study of the effects of stroke severity on motor control, and find that our method provides lower prediction errors than competing methods.
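The "pre-whitening" step can be sketched concretely: with an estimated residual covariance Sigma across the evaluation grid, multiplying the response curves by the inverse Cholesky factor removes the correlation, after which (penalized) least squares is appropriate. A hypothetical-data sketch (the paper combines this with the B-spline expansion and group MCP, omitted here):

```python
import numpy as np

def prewhiten(Y, Sigma):
    """GLS-style pre-whitening for function-on-scalar regression.
    Y: (n, T) response curves on a common grid; Sigma: (T, T)
    estimated residual covariance across the grid. The design
    matrix X is untouched; the grouped-penalty fit then runs on
    the whitened response."""
    L = np.linalg.cholesky(Sigma)        # Sigma = L @ L.T
    return Y @ np.linalg.inv(L).T        # rows now have identity covariance

# Hypothetical: 500 curves on 30 grid points with AR-like covariance.
rng = np.random.default_rng(1)
T = 30
Sigma = np.exp(-np.abs(np.subtract.outer(range(T), range(T))) / 5.0)
Y = rng.multivariate_normal(np.zeros(T), Sigma, size=500)
Y_white = prewhiten(Y, Sigma)
print(np.cov(Y_white, rowvar=False).round(1))  # approximately identity
```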
5. Engineering Analysis of Tricuspid Annular Dynamics in the Beating Ovine Heart. Ann Biomed Eng 2017; 46:443-451. PMID: 29139013; DOI: 10.1007/s10439-017-1961-y. Citations in RCA: 25.
Abstract
Functional tricuspid regurgitation is a significant source of morbidity and mortality in the US. Furthermore, treatment of functional tricuspid regurgitation is suboptimal, with significant recurrence rates, which may, at least in part, be due to our limited knowledge of the relationship between valvular shape and function. Here we study the dynamics of the healthy in vivo ovine tricuspid annulus to improve our understanding of normal annular deformations throughout the cardiac cycle. To this end, we determine both clinical and engineering metrics of in vivo annular dynamics based on sonomicrometry crystals surgically attached to the annulus. We confirm that the tricuspid annulus undergoes large dynamic changes in area, perimeter, height, and eccentricity throughout the cardiac cycle. This deformation may be described as asymmetric in-plane motion of the annulus with minor out-of-plane motion. In addition, we employ strain and curvature to provide mechanistic insight into the origin of this deformation. Specifically, we find that strain and curvature vary considerably across the annulus, with highly localized minima and maxima resulting in the aforementioned configurational changes throughout the cardiac cycle. It is our hope that these data provide valuable information for clinicians and engineers alike and ultimately help us improve treatment of functional tricuspid regurgitation.
6. Li Y, Guan Y. Functional Principal Component Analysis of Spatio-Temporal Point Processes with Applications in Disease Surveillance. J Am Stat Assoc 2014; 109:1205-1215. PMID: 25368436; PMCID: PMC4215517; DOI: 10.1080/01621459.2014.885434. Citations in RCA: 21.
Abstract
In disease surveillance applications, disease events are modeled by spatio-temporal point processes. We propose a new class of semiparametric generalized linear mixed models for such data, where the event rate is related to some known risk factors and some unknown latent random effects. We model the latent spatio-temporal process as spatially correlated functional data, and propose Poisson maximum likelihood and composite likelihood methods based on spline approximations to estimate the mean and covariance functions of the latent process. By performing functional principal component analysis on the latent process, we can better understand the correlation structure in the point process. We also propose an empirical Bayes method to predict the latent spatial random effects, which can help highlight hot areas with unusually high event rates. Under an increasing-domain and increasing-knots asymptotic framework, we establish the asymptotic distribution for the parametric components in the model and the asymptotic convergence rates for the functional principal component estimators. We illustrate the methodology through a simulation study and an application to the Connecticut Tumor Registry data.
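On a common grid, the functional-PCA step reduces to an eigendecomposition of the sample covariance of the estimated latent curves. A discretized sketch with hypothetical curves (not the paper's spline-based composite-likelihood estimator):

```python
import numpy as np

def fpca(curves, n_components=2):
    """Discretized functional PCA: eigendecompose the sample
    covariance of curves observed on a common grid. Returns the
    mean function, leading eigenfunctions, and per-curve scores."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / (len(curves) - 1)
    vals, vecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    eigfuns = vecs[:, order]
    scores = centered @ eigfuns
    return mean, eigfuns, scores

# Hypothetical latent intensity curves for 40 regions over 52 weeks.
rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 52)
curves = (rng.normal(size=(40, 1)) * np.sin(2 * np.pi * grid)
          + rng.normal(scale=0.1, size=(40, 52)))
mean, eigfuns, scores = fpca(curves)
print(scores.shape)  # (40, 2): low-dimensional summary per region
```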
7. Singh N, Vialard FX, Niethammer M. Splines for diffeomorphisms. Med Image Anal 2015; 25:56-71. PMID: 25980676; DOI: 10.1016/j.media.2015.04.012. Citations in RCA: 16.
Abstract
This paper develops a method for higher-order parametric regression on diffeomorphisms for image regression. We present a principled way to define curves with nonzero acceleration and nonzero jerk. This work extends methods based on geodesics, which have been developed over the last decade for computational anatomy in the large deformation diffeomorphic image analysis framework. In contrast to previously proposed methods to capture image changes over time, such as geodesic regression, the proposed method can capture more complex spatio-temporal deformations. We take a variational approach that is governed by an underlying energy formulation which respects the non-flat geometry of diffeomorphisms. Such an approach to minimal-energy curve estimation also provides a physical analogy to particle motion under a varying force field. This gives rise to the notions of quadratic, cubic, and piecewise cubic splines on the manifold of diffeomorphisms. The variational formulation of splines also allows for the use of temporal control points to control spline behavior. This necessitates the development of a shooting formulation for splines. The initial conditions of our proposed shooting polynomial paths in diffeomorphisms are analogous to the Euclidean polynomial coefficients. We experimentally demonstrate the effectiveness of using the parametric curves both for synthesizing polynomial paths and for regression of imaging data. The performance of the method is compared to geodesic regression.
8. Lobachev O, Ulrich C, Steiniger BS, Wilhelmi V, Stachniss V, Guthe M. Feature-based multi-resolution registration of immunostained serial sections. Med Image Anal 2016; 35:288-302. PMID: 27494805; DOI: 10.1016/j.media.2016.07.010. Citations in RCA: 13.
Abstract
The form and exact function of the blood vessel network in some human organs, like the spleen and bone marrow, are still open research questions in medicine. In this paper, we propose a method to register the immunohistological stainings of serial sections of spleen and bone marrow specimens to enable the visualization and visual inspection of blood vessels. As these vary greatly in caliber, from mesoscopic (millimeter-range) to microscopic (a few micrometers, comparable to a single erythrocyte), we need to utilize a multi-resolution approach. Our method is fully automatic; it is based on feature detection and sparse matching. We utilize a rigid alignment and then a non-rigid deformation, iteratively dealing with increasingly smaller features. Our tool pipeline can already deal with series of complete scans at extremely high resolution, up to 620 megapixels. The improvement presented increases the range of represented details up to the smallest capillaries. This paper provides details on the multi-resolution non-rigid registration approach we use. Our application is novel in the way the alignment and subsequent deformations are computed (using features, i.e., "sparse"). The deformations are based on all images in the stack ("global"). We also present volume renderings and a 3D reconstruction of the vascular network in human spleen and bone marrow at a level not possible before. Our registration makes easy tracking of even the smallest blood vessels possible, thus granting experts a better comprehension. A quantitative evaluation of our method and related state-of-the-art approaches with seven different quality measures shows the efficiency of our method. We also provide z-profiles and enlarged volume renderings from three different registrations for visual inspection.
9. Zaitsu M, Kawachi I, Takeuchi T, Kobayashi Y. Alcohol consumption and risk of upper-tract urothelial cancer. Cancer Epidemiol 2017; 48:36-40. PMID: 28364670; DOI: 10.1016/j.canep.2017.03.002. Citations in RCA: 13.
Abstract
BACKGROUND Upper-tract urothelial cancer (UTUC), which includes renal pelvic cancer and ureter cancer, is a rare cancer and its prognosis is poor. Smoking and high-risk occupations (e.g., printing and dyestuff working, which involve exposure to aniline dyes) are well-known risk factors for UTUC. However, the risk of alcohol consumption in UTUC remains unclear. This study aimed to determine whether alcohol consumption is an independent risk factor for UTUC. METHODS The study was a case-control study which used the nationwide clinical inpatient database of the Rosai Hospital group in Japan. We identified 1569 cases and 506,797 controls between 1984 and 2014. We estimated the odds ratio (OR) and 95% confidence interval (95% CI) of alcohol consumption for UTUC - never, up to 15 g/day, >15-30 g/day, or >30 g/day - using unconditional logistic regression. We adjusted for the following covariates: age, sex, study period, hospital, history of smoking, and high-risk occupation. RESULTS The risk of UTUC was significantly higher in ever-drinkers compared with never-drinkers (OR = 1.23; 95% CI, 1.08-1.40; P = 0.001). Compared with never-drinkers, the risk threshold for UTUC was >15 g of alcohol consumption per day (equivalent to 6 ounces of Japanese sake, containing 23 g of alcohol). A dose-response was observed (P < 0.001). CONCLUSION Alcohol consumption may be an independent risk factor for UTUC, with a low-risk threshold of 15 g of alcohol per day.
10. Goldstein BA, Assimes T, Winkelmayer WC, Hastie T. Detecting clinically meaningful biomarkers with repeated measurements: An illustration with electronic health records. Biometrics 2015; 71:478-86. PMID: 25652566; DOI: 10.1111/biom.12283. Citations in RCA: 10.
Abstract
Data sources with repeated measurements are an appealing resource for understanding the relationship between changes in biological markers and risk of a clinical event. While longitudinal data present opportunities to observe changing risk over time, these analyses can be complicated if the measurement of clinical metrics is sparse and/or irregular, making typical statistical methods unsuitable. In this article, we use electronic health record (EHR) data as an example to present an analytic procedure to both create an analytic sample and analyze the data to detect clinically meaningful markers of acute myocardial infarction (MI). Using an EHR from a large national dialysis organization, we abstracted the records of 64,318 individuals and identified 4769 people who had an MI during the study period. We describe a nested case-control design to sample appropriate controls and an analytic approach using regression splines. Fitting a mixed model with truncated power splines, we perform a series of goodness-of-fit tests to determine whether any of 11 regularly collected laboratory markers are useful clinical predictors. We test the clinical utility of each marker using an independent test set. The results suggest that EHR data can be easily used to detect markers of clinically acute events. Special software or analytic tools are not needed, even with irregular EHR data.
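A truncated power spline basis of the kind fitted here is easy to construct directly; a sketch with hypothetical knots and measurement times:

```python
import numpy as np

def truncated_power_basis(t, knots, degree=3):
    """Design matrix for a truncated power spline:
    columns 1, t, ..., t^degree, then (t - k)_+^degree per knot."""
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.maximum(0.0, t - k) ** degree for k in knots]
    return np.column_stack(cols)

# Hypothetical: lab values observed on days -180..0 before the event,
# knots placed at the quartiles of the observation times.
t = np.linspace(-180, 0, 200)
X = truncated_power_basis(t, knots=[-135, -90, -45])
beta = np.linalg.lstsq(X, 3 + 0.01 * t + 5e-6 * t**3, rcond=None)[0]
print(X.shape, beta.shape)  # (200, 7) basis, 7 coefficients
```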
11. Cecchin D, Poggiali D, Riccardi L, Turco P, Bui F, De Marchi S. Analytical and experimental FWHM of a gamma camera: theoretical and practical issues. PeerJ 2015; 3:e722. PMID: 25674361; PMCID: PMC4319318; DOI: 10.7717/peerj.722. Citations in RCA: 8.
Abstract
Introduction. It is well known that resolution on a gamma camera varies as a function of distance, scatter, and the camera’s characteristics (collimator type, crystal thickness, intrinsic resolution, etc.). Manufacturers frequently provide only a few pre-calculated resolution values (using a line source in air, 10–15 cm from the collimator surface and without scattering). However, these are typically not obtained in situations resembling a clinical setting. From a diagnostic point of view, it is useful to know the expected resolution of a gamma camera at a given distance from the collimator surface for a particular setting in order to decide whether it is worth scanning patients with a “small lesion” or not. When dealing with absolute quantification it is also mandatory to know precisely the expected resolution and its uncertainty in order to make appropriate corrections. Aim. Our aims are: to test a novel mathematical approach, cubic spline interpolation, for the extraction of the full width at half maximum (FWHM) from the acquisition of a line source (experimental resolution), also considering measurement uncertainty; to compare it with the usually adopted methods, such as the Gaussian approach; to compare it with the theoretical resolution (analytical resolution) of a gamma camera at different distances; and to create a web-based educational program with which to test these theories. Methods. Three mathematical methods (direct calculation, global interpolation using Gaussians, and local interpolation using splines) for calculating FWHM from a line source (planar scintigraphy) were tested and compared. A NEMA Triple Line Source Phantom was used to obtain static images both in air and with different scattering levels. Advanced open-source software (MATLAB/Octave- and PHP-based) was created ad hoc to obtain and compare FWHM values and their relative uncertainty. Results and Conclusion. Local interpolation using splines proved faster and more reliable than the usually adopted Gaussian interpolation. The proposed freely available software proved effective in assessing both FWHM and its uncertainty.
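The spline-based FWHM extraction can be sketched in a few lines: interpolate the measured profile with a local cubic spline and solve for the two half-maximum crossings. A sketch using SciPy on a synthetic Gaussian line-spread profile (the paper's own tool is MATLAB/Octave based):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fwhm_spline(x, counts):
    """FWHM of a line-source profile via local cubic-spline
    interpolation: find where the spline crosses half the maximum."""
    cs = CubicSpline(x, counts)
    half = counts.max() / 2.0
    crossings = cs.solve(half, extrapolate=False)  # roots of cs - half
    return crossings.max() - crossings.min()

# Hypothetical profile: Gaussian line spread with sigma = 4 mm.
x = np.arange(-20.0, 20.5, 0.5)
sigma = 4.0
counts = np.exp(-0.5 * (x / sigma) ** 2)
print(fwhm_spline(x, counts))  # ~ 2.355 * sigma = 9.42 mm
```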
12. Nethery RC, Mealli F, Dominici F. Estimating population average causal effects in the presence of non-overlap: the effect of natural gas compressor station exposure on cancer mortality. Ann Appl Stat 2019; 13:1242-1267. PMID: 31346355; PMCID: PMC6658123; DOI: 10.1214/18-aoas1231. Citations in RCA: 8.
Abstract
Most causal inference studies rely on the assumption of overlap to estimate population or sample average causal effects. When data suffer from non-overlap, estimation of these estimands requires reliance on model specifications, due to poor data support. All existing methods to address non-overlap, such as trimming or down-weighting data in regions of poor data support, change the estimand so that inference cannot be made on the sample or the underlying population. In environmental health research settings, where study results are often intended to influence policy, population-level inference may be critical, and changes in the estimand can diminish the impact of the study results, because estimates may not be representative of effects in the population of interest to policymakers. Researchers may be willing to make additional, minimal modeling assumptions in order to preserve the ability to estimate population average causal effects. We seek to make two contributions on this topic. First, we propose a flexible, data-driven definition of propensity score overlap and non-overlap regions. Second, we develop a novel Bayesian framework to estimate population average causal effects with minor model dependence and appropriately large uncertainties in the presence of non-overlap and causal effect heterogeneity. In this approach, the tasks of estimating causal effects in the overlap and non-overlap regions are delegated to two distinct models, suited to the degree of data support in each region. Tree ensembles are used to non-parametrically estimate individual causal effects in the overlap region, where the data can speak for themselves. In the non-overlap region, where insufficient data support means reliance on model specification is necessary, individual causal effects are estimated by extrapolating trends from the overlap region via a spline model. The promising performance of our method is demonstrated in simulations. Finally, we utilize our method to perform a novel investigation of the causal effect of natural gas compressor station exposure on cancer outcomes. Code and data to implement the method and reproduce all simulations and analyses are available on GitHub (https://github.com/rachelnethery/overlap).
13. Goldstein BA, Chang TI, Winkelmayer WC. Classifying individuals based on a densely captured sequence of vital signs: An example using repeated blood pressure measurements during hemodialysis treatment. J Biomed Inform 2015; 57:219-24. PMID: 26277118; DOI: 10.1016/j.jbi.2015.08.010. Citations in RCA: 8.
Abstract
Electronic Health Records (EHRs) present the opportunity to observe serial measurements on patients. While potentially informative, analyzing these data can be challenging. In this work we present a means to classify individuals based on a series of measurements collected by an EHR. Using patients undergoing hemodialysis, we categorized people based on their intradialytic blood pressure. Our primary criteria were that the classifications be time-dependent and independent of other subjects. We fit a curve of intradialytic blood pressure using regression splines and then calculated first and second derivatives to derive four mutually exclusive classifications at different time points. We show that these classifications relate to near-term risk of cardiac events and are moderately stable over a succeeding two-week period. This work has general application for analyzing dense EHR data.
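A sketch of the derivative-based classification idea: fit a smoothing spline to one treatment's blood-pressure series and read off the signs of the first and second derivatives at a time point. The four class labels and all data below are illustrative, not the paper's definitions:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def classify_trajectory(minutes, sbp, t_eval):
    """Classify an intradialytic blood-pressure curve at time t_eval
    by the signs of the first and second derivatives of a smoothing
    spline fit (hypothetical labels, sketching the general strategy)."""
    spl = UnivariateSpline(minutes, sbp, k=3, s=len(sbp) * 20.0)
    d1, d2 = spl.derivative(1)(t_eval), spl.derivative(2)(t_eval)
    if d1 >= 0:
        return "rising-accelerating" if d2 >= 0 else "rising-decelerating"
    return "falling-decelerating" if d2 < 0 else "falling-stabilizing"

# Hypothetical treatment: BP readings every 15 min over 4 hours.
minutes = np.arange(0, 241, 15)
sbp = 150 - 0.15 * minutes + 2.0 * np.sin(minutes / 60.0)
print(classify_trajectory(minutes, sbp, t_eval=120.0))
```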
14. Kim TH, Han E. Impact of body mass on job quality. Economics and Human Biology 2015; 17:75-85. PMID: 25765221; DOI: 10.1016/j.ehb.2015.02.003. Citations in RCA: 8.
Abstract
The current study explores the association between body mass and job quality, a composite measurement of job characteristics, for adults. We use nationally representative data from the Korean Labor and Income Panel Study for the years 2005, 2007, and 2008, with 7282 person-year observations for men and 4611 for women. A Quality of Work Index (QWI) is calculated based on work content, job security, the possibilities for improvement, compensation, work conditions, and interpersonal relationships at work. The key independent variable is the body mass index (kg/m²), splined at 18.5, 25, and 30. For men, BMI is positively associated with the QWI only in the normal weight segment (+0.19 percentage points at the 10th, +0.28 at the 50th, +0.32 at the 75th, +0.34 at the 90th, and +0.48 at the 95th quantiles). A unit increase in the BMI for women is associated with a lower QWI at the lower quantiles in the normal weight segment (-0.28 at the 5th, -0.19 at the 10th, and -0.25 percentage points at the 25th quantiles) and at the upper quantiles in the overweight segment (-1.15 at the 90th and -1.66 percentage points at the 95th quantiles). The results imply a spill-over cost of overweight or obesity beyond its impact on health in terms of success in the labor market.
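The "splined at 18.5, 25, and 30" specification corresponds to a piecewise-linear basis in BMI. A sketch of how such design columns can be built (the column layout is one common parameterization, not necessarily the paper's exact coding):

```python
import numpy as np

def bmi_spline_terms(bmi, knots=(18.5, 25.0, 30.0)):
    """Piecewise-linear spline terms for BMI with knots at the
    conventional underweight/overweight/obese cut-offs.
    Returns one slope variable per BMI segment."""
    bmi = np.asarray(bmi, dtype=float)
    terms = [np.minimum(bmi, knots[0])]                # underweight slope
    for lo, hi in zip(knots[:-1], knots[1:]):
        terms.append(np.clip(bmi - lo, 0.0, hi - lo))  # middle segments
    terms.append(np.maximum(bmi - knots[-1], 0.0))     # obese slope
    return np.column_stack(terms)

X = bmi_spline_terms([17.0, 22.0, 27.5, 33.0])
print(X.round(1))
# Each column's coefficient is the marginal effect of one BMI unit
# within that segment (e.g., the normal-weight 18.5-25 column).
```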
15. Belotti JT, Castanho DS, Araujo LN, da Silva LV, Alves TA, Tadano YS, Stevan SL, Corrêa FC, Siqueira HV. Air pollution epidemiology: A simplified Generalized Linear Model approach optimized by bio-inspired metaheuristics. Environmental Research 2020; 191:110106. PMID: 32882238; DOI: 10.1016/j.envres.2020.110106. Citations in RCA: 5.
Abstract
Studies in air pollution epidemiology are of paramount importance for diagnosing and improving quality of life. Exploring new methods, or modifying existing ones, is critical to obtaining better results. Most air pollution epidemiology studies use the Generalized Linear Model (GLM), especially the default implementations in the R, S-Plus, SAS, and Stata software packages, which use maximum likelihood estimators for parameter optimization. Also, a smooth function of time (usually a spline) is generally used as a pre-processing step to account for seasonal and long-term tendencies. This investigation introduces a new approach to the GLM, proposing the estimation of the free coefficients through bio-inspired metaheuristics - Particle Swarm Optimization (PSO), Genetic Algorithms, and Differential Evolution - as well as the replacement of the spline function by a simple normalization procedure. The considered case studies comprise three important cities of São Paulo state, Brazil, with distinct characteristics: São Paulo, Campinas, and Cubatão. We considered the impact of particles with an aerodynamic diameter of less than 10 μm (PM10), ambient temperature, and relative humidity on the number of hospital admissions for respiratory diseases (ICD-10, J00 to J99). The results showed that the new approach (especially PSO) brings performance gains compared to the default version of statistical software like R.
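The core idea, estimating GLM coefficients with a metaheuristic instead of maximum likelihood, can be sketched with a bare-bones particle swarm minimizing a Poisson negative log-likelihood. Everything below (data, swarm constants) is illustrative:

```python
import numpy as np

def poisson_nll(beta, X, y):
    """Negative Poisson log-likelihood (up to a constant) for a
    log-link GLM: admissions ~ PM10 + temperature + humidity."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Minimal global-best PSO; inertia/cognitive/social constants
    are common textbook defaults, not the paper's tuned values."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.uniform(size=(2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest

# Hypothetical daily data: intercept plus normalized PM10, temp, humidity.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(365), rng.normal(size=(365, 3))])
y = rng.poisson(np.exp(X @ np.array([2.0, 0.3, -0.1, 0.05])))
print(pso(lambda b: poisson_nll(b, X, y), dim=4).round(2))
```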
16. Brizzi F, Birrell PJ, Plummer MT, Kirwan P, Brown AE, Delpech VC, Gill ON, De Angelis D. Extending Bayesian back-calculation to estimate age and time specific HIV incidence. Lifetime Data Analysis 2019; 25:757-780. PMID: 30811019; PMCID: PMC6776486; DOI: 10.1007/s10985-019-09465-1. Citations in RCA: 4.
Abstract
CD4-based multi-state back-calculation methods are key for monitoring the HIV epidemic, providing estimates of HIV incidence and diagnosis rates by disentangling their inter-related contributions to the observed surveillance data. This paper extends existing approaches to age-specific settings, permitting the joint estimation of age- and time-specific incidence and diagnosis rates and the derivation of other epidemiological quantities of interest. This allows the identification of specific age groups at higher risk of infection, which is crucial in directing public health interventions. We investigate, through simulation studies, the suitability of various bivariate splines for the non-parametric modelling of the latent age- and time-specific incidence, and illustrate our method on routinely collected data from the HIV epidemic among gay and bisexual men in England and Wales.
17. Forsare C, Bak M, Falck AK, Grabau D, Killander F, Malmström P, Rydén L, Stål O, Sundqvist M, Bendahl PO, Fernö M. Non-linear transformations of age at diagnosis, tumor size, and number of positive lymph nodes in prediction of clinical outcome in breast cancer. BMC Cancer 2018; 18:1226. PMID: 30526533; PMCID: PMC6286551; DOI: 10.1186/s12885-018-5123-x. Citations in RCA: 2.
Abstract
BACKGROUND Prognostic factors in breast cancer are often measured on a continuous scale, but categorized for clinical decision-making. The primary aim of this study was to evaluate whether accounting for continuous non-linear effects of three factors - age at diagnosis, tumor size, and number of positive lymph nodes - improves prognostication. These factors will most likely be included in the management of breast cancer patients also in the future, after an expected implementation of gene expression profiling for adjuvant treatment decision-making. METHODS A total of 4447 and 1132 women with primary breast cancer constituted the derivation and validation sets, respectively. Potential non-linear effects of the three factors on the log hazard of distant recurrence were evaluated during 10 years of follow-up. Cox models of successively increasing complexity were used: dichotomized predictors, predictors categorized into three or four groups, and predictors transformed using fractional polynomials (FPs) or restricted cubic splines (RCS). Predictive performance was evaluated by Harrell's C-index. RESULTS Using FP-transformations, non-linear effects were detected for tumor size and number of positive lymph nodes in univariable analyses. For age, non-linear transformations did not, however, improve the model fit significantly compared to the linear identity transformation. As expected, the C-index increased with increasing model complexity for multivariable models including the three factors. By allowing more than one cut-point per factor, the C-index increased from 0.628 to 0.674. The additional gain, as measured by the C-index, when using FP- or RCS-transformations was modest (0.695 and 0.696, respectively). The corresponding C-indices for these four models in the validation set, based on the same transformations and parameter estimates from the derivation set, were 0.675, 0.700, 0.706, and 0.701. CONCLUSIONS Categorization of each factor into three to four groups was found to improve prognostication compared to dichotomization. The additional gain from allowing continuous non-linear effects modeled by FPs or RCS was modest. However, the continuous nature of these transformations has the advantage of making it possible to form risk groups of any size.
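A restricted cubic spline basis of the kind compared here can be written down directly; a sketch using the standard Harrell parameterization (knot locations below are hypothetical):

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (linear beyond the boundary
    knots): K knots yield K - 1 columns, x itself plus K - 2
    cubic terms."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    K = len(k)
    pos = lambda u: np.maximum(u, 0.0) ** 3   # truncated cube
    cols = [x]
    for j in range(K - 2):
        cols.append(
            pos(x - k[j])
            - pos(x - k[K - 2]) * (k[K - 1] - k[j]) / (k[K - 1] - k[K - 2])
            + pos(x - k[K - 1]) * (k[K - 2] - k[j]) / (k[K - 1] - k[K - 2])
        )
    return np.column_stack(cols)

# Hypothetical: tumor size in mm with four knots at sample percentiles.
size_mm = np.linspace(1, 80, 100)
X = rcs_basis(size_mm, knots=[5, 18, 30, 60])
print(X.shape)  # (100, 3): K - 1 columns for K = 4 knots
```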
18. Hung CM, Huang YM, Chang MS. Alignment using genetic programming with causal trees for identification of protein functions. Nonlinear Analysis, Theory, Methods & Applications 2006; 65:1070-1093. PMID: 32288048; PMCID: PMC7117053; DOI: 10.1016/j.na.2005.09.048. Citations in RCA: 2.
Abstract
A hybrid evolutionary model is used to propose a hierarchical homology of protein sequences to identify protein functions systematically. The proposed model offers considerable potential, given the inconsistency of existing methods for predicting novel proteins. Because some novel proteins might align without meaningful conserved domains, maximizing the score of sequence alignment is not the best criterion for predicting protein functions. This work presents a decision model that can minimize the cost of making a decision when predicting protein functions using the hierarchical homologies. In particular, the model has three characteristics: (i) it is a hybrid evolutionary model with multiple fitness functions that uses genetic programming to predict protein functions on a distantly related protein family; (ii) it incorporates modified robust point matching to accurately compare all feature points using the moment invariant and thin-plate spline theorems; and (iii) the hierarchical homologies, which organize a novel protein sequence in the form of a causal tree, can effectively demonstrate the relationships between proteins. This work describes comparisons of nucleocapsid proteins from the putative polyprotein of the SARS virus and coronaviruses in other hosts using the model.
19. Jensen RK, Clements M, Gjærde LK, Jakobsen LH. Fitting parametric cure models in R using the packages cuRe and rstpm2. Computer Methods and Programs in Biomedicine 2022; 226:107125. PMID: 36126436; DOI: 10.1016/j.cmpb.2022.107125. Citations in RCA: 1.
Abstract
BACKGROUND AND OBJECTIVE Within medical research, cure models are useful for analyzing time-to-event data in the scenario where a proportion of the analyzed individuals are expected to never experience the event of interest. Cure models are also useful for modelling the relative survival in scenarios where a proportion of the individuals are expected to eventually experience a mortality rate similar to that of the general population. Here we present two R packages, cuRe and rstpm2, that provide researchers with several tools for performing statistical inference using parametric cure models. METHODS Cure models are commonly used to estimate 1) the proportion of individuals that are cured and 2) the event-time distribution of individuals who are not cured. This can be done using simple parametric distributions for the event-time distribution of the uncured, but our implementations also enable the fitting of more flexible spline-based cure models. The parametric framework of both packages ensures that cure models for the relative survival can easily be used. RESULTS The cuRe package contains two main functions for estimating parametric mixture cure models: one based on simple parametric distributions (e.g., Weibull or exponential) and one utilizing a spline-based formulation of the cure model. The rstpm2 package enables estimation of spline-based latent cure models, i.e., cure models with no explicit parameter modelling the proportion of cured individuals. CONCLUSIONS Through the R packages cuRe and rstpm2, a wide range of different parametric cure models can be fitted. The cuRe package also contains a number of useful post-estimation procedures for computing the time to statistical cure and the conditional probability of cure, which may spread the use of cure models in medical research.
20.
Abstract
One application of positron emission tomography (PET), a nuclear imaging technique, in neuroscience involves in vivo estimation of the density of various proteins (often, neuroreceptors) in the brain. PET scanning begins with the injection of a radiolabeled tracer that binds preferentially to the target protein; tracer molecules are then continuously delivered to the brain via the bloodstream. By detecting the radioactive decay of the tracer over time, dynamic PET data are constructed to reflect the concentration of the target protein in the brain at each time. The fundamental problem in the analysis of dynamic PET data involves estimating the impulse response function (IRF), which is necessary for describing the binding behavior of the injected radiotracer. Virtually all existing methods have three common aspects: summarizing the entire IRF with a single scalar measure; modeling each subject separately; and the imposition of parametric restrictions on the IRF. In contrast, we propose a functional data analytic approach that regards each subject's IRF as the basic analysis unit, models multiple subjects simultaneously, and estimates the IRF nonparametrically. We pose our model as a linear mixed effects model in which population-level fixed effects and subject-specific random effects are expanded using a B-spline basis. Shrinkage and roughness penalties are incorporated in the model to enforce identifiability and smoothness of the estimated curves, respectively, while monotonicity and non-negativity constraints impose biological information on the estimates. We illustrate this approach by applying it to clinical PET data with subjects belonging to three diagnostic groups. We explore differences among the groups by means of pointwise confidence intervals of the estimated mean curves based on bootstrap samples.
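One of the constraints mentioned, non-negativity of the estimated IRF, can be sketched with non-negative least squares on a B-spline design matrix: because B-spline basis functions are non-negative, non-negative coefficients guarantee a non-negative fitted curve. The data, knots, and omission of penalties and random effects make this a toy version of the model described here:

```python
import numpy as np
from scipy.interpolate import splev
from scipy.optimize import nnls

def bspline_design(x, knots, degree=3):
    """B-spline design matrix built column-by-column with splev:
    column i is the i-th basis function evaluated at x."""
    n_basis = len(knots) - degree - 1
    cols = []
    for i in range(n_basis):
        coef = np.zeros(n_basis)
        coef[i] = 1.0
        cols.append(splev(x, (knots, coef, degree)))
    return np.column_stack(cols)

# Hypothetical decaying IRF sampled with noise over 90 minutes.
rng = np.random.default_rng(3)
t = np.linspace(0, 90, 60)
y = np.exp(-t / 20.0) + rng.normal(scale=0.02, size=t.size)
interior = np.linspace(0, 90, 8)
knots = np.r_[[0] * 3, interior, [90] * 3]    # clamped cubic knot vector
B = bspline_design(t, knots)
coef, _ = nnls(B, np.maximum(y, 0.0))         # non-negative coefficients
fit = B @ coef                                # hence non-negative curve
print(fit.min() >= 0)
```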
21. Lee U, Carroll RJ, Marder K, Wang Y, Garcia TP. Estimating disease onset from change points of markers measured with error. Biostatistics 2021; 22:819-835. PMID: 31999331; PMCID: PMC8596391; DOI: 10.1093/biostatistics/kxz068. Citations in RCA: 1.
Abstract
Huntington disease is an autosomal dominant neurodegenerative disease without clearly identified biomarkers for when motor-onset occurs. Current standards to determine motor-onset rely on a clinician's subjective judgment that a patient's extrapyramidal signs are unequivocally associated with Huntington disease. This subjectivity can lead to error, which could be overcome using an objective, data-driven metric that determines motor-onset. Recent studies of motor-sign decline (the longitudinal degeneration of motor ability in patients) have revealed that motor-onset is closely related to an inflection point in its longitudinal trajectory. We propose a nonlinear location-shift marker model that captures this motor-sign decline and assesses how its inflection point is linked to other markers of Huntington disease progression. We propose two procedures to estimate this model and its inflection point: one is a parametric method using a nonlinear mixed effects model, and the other is a multi-stage nonparametric approach that we developed. In an empirical study, the parametric approach was sensitive to correct specification of the mean structure of the longitudinal data. In contrast, our multi-stage nonparametric procedure consistently produced unbiased estimates regardless of the true mean structure. Applying our multi-stage nonparametric estimator to Neurobiological Predictors of Huntington Disease, a large observational study of Huntington disease, leads to earlier prediction of motor-onset compared to the clinician's subjective judgment.
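Locating an inflection point as a root of the second derivative of a smoothing spline is one simple nonparametric stand-in for the multi-stage estimator described here. A sketch on synthetic data (a degree-5 fit is used so the second derivative is still a cubic, which SciPy's spline root finder supports; boundary roots may be spurious):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def inflection_point(age, motor_score, smooth=None):
    """Candidate inflection points of a longitudinal marker
    trajectory: roots of the second derivative of a degree-5
    smoothing spline (the second derivative is then cubic,
    the only degree UnivariateSpline.roots() supports)."""
    spl = UnivariateSpline(age, motor_score, k=5, s=smooth)
    return spl.derivative(2).roots()

# Hypothetical motor-sign decline with an inflection near age 45.
rng = np.random.default_rng(4)
age = np.linspace(30, 60, 80)
truth = 10.0 / (1.0 + np.exp(-(age - 45.0) / 3.0))  # logistic trajectory
score = truth + rng.normal(scale=0.15, size=age.size)
print(inflection_point(age, score, smooth=80 * 0.15**2))
```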
22. Egan C, Harris RJ, Mitchell HD, Desai M, Mandal S, De Angelis D. Analysing HCV incidence trends in people who inject drugs using serial behavioural and seroprevalence data: A modelling study. The International Journal of Drug Policy 2024:104469. PMID: 38880700; DOI: 10.1016/j.drugpo.2024.104469. Citations in RCA: 0.
Abstract
INTRODUCTION The introduction of new direct-acting antivirals for hepatitis C virus (HCV) infection has enabled the formulation of an HCV elimination strategy led by the World Health Organisation (WHO). Guidelines for elimination of HCV target a reduction in incidence, but incidence is difficult to measure and needs to be estimated. METHODS Serial cross-sectional bio-behavioural sero-surveys provide information on an individual's infection status and duration of exposure, and on how these change over time. These data can be used to estimate the rate of first infection through appropriate statistical models. This study utilised updated HCV seroprevalence information from the Unlinked Anonymous Monitoring survey, an annual survey of England, Wales and Northern Ireland monitoring the prevalence of blood-borne viruses in people who inject drugs. Flexible parametric and semiparametric approaches to estimating incidence rates by exposure time and survey year, including fractional polynomials and splines, were implemented and compared. RESULTS Incidence rates were shown to peak in those recently initiating injecting drug use at approximately 0.20 infections per person-year, followed by a rapid reduction over the subsequent few years of injecting to approximately 0.05 infections per person-year. There was evidence of a rise in incidence rates for recent initiates between 2011 and 2020, from 0.17 infections per person-year (95% CI, 0.16-0.19) to 0.26 infections per person-year (0.23-0.30). In those injecting for longer durations, incidence rates were stable over time. CONCLUSIONS Fractional polynomials provided an adequate fit with relatively few parameters, but splines may be preferable to ensure flexibility, in particular to detect short-term changes in the rate of first infection that may result from treatment effects. Although chronic HCV prevalence declined as treatment was scaled up over 2016-2020, there is no evidence yet of a corresponding fall in the rate of first infection. Seroprevalence and risk-behaviour data can be used to estimate and monitor HCV incidence, providing insight into progress towards WHO-defined elimination of HCV.
23. Gelir F, Chatla SB, Bhuiyan MS, Disbrow EA, Conrad SA, Vanchiere JA, Kevil CG, Gecili E, Bhuiyan MAN. Characterizing heterogeneity in Alzheimer's disease progression: a semiparametric model. Sci Rep 2025; 15:7660. PMID: 40038506; DOI: 10.1038/s41598-025-92540-5. Citations in RCA: 0.
Abstract
The progression of Alzheimer's disease (AD), a leading cause of dementia worldwide, is known for its variability and complexity, challenging the conventional methods of monitoring and predicting disease trajectories. This study introduces a semiparametric modeling approach to analyze longitudinal cognitive and imaging data. We studied two different outcome variables from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database: the Alzheimer's Disease Assessment Scale-Cognitive Subscale 13 (ADAS13) scores and ventricular volumes. Unlike traditional linear mixed effects models, semiparametric models do not assume a linear AD progression over time. Semiparametric models offer the advantage of capturing the non-linear features of AD progression, such as cognitive decline and neurodegeneration, represented by changes in ADAS13 scores and ventricular enlargement, respectively. By integrating regression splines and mixed modeling techniques, we provide a nuanced understanding of AD progression that captures the heterogeneity of disease trajectories. Our analysis reveals variations in the timing and degree of cognitive decline and neurodegeneration among AD patients, underlining the need for personalized approaches for monitoring and managing AD. This study's findings contribute to the modeling of AD progression and offer potential implications for interventions and prognostic assessments in clinical and research settings.
24. Prot V, Aguilera HM, Skallerud B, Persson R, Urheim S. A method for non-invasive estimation of mitral valve annular regional strains. Comput Biol Med 2025; 187:109773. PMID: 39929002; DOI: 10.1016/j.compbiomed.2025.109773. Citations in RCA: 0.
Abstract
INTRODUCTION In this study, a method to assess local deformations along the mitral annulus curve is proposed. METHODS The method employs the known global geometry (from echocardiography) of the annulus during the cardiac cycle, which is approximated with a closed cubic spline to generate a smooth mathematical representation of the annulus at each available time point. A point-wise mapping between the annular geometries at two consecutive time points is established by minimizing the global displacements of the annulus. The displacements of the mitral annulus are thereby determined and used to calculate the regional strains along the annulus curve. RESULTS Data obtained from sonomicrometric markers are used to test the method. The results show that our method can predict annular displacements throughout the cardiac cycle. Strain values computed with this approach are in line with experimentally measured strains previously reported in the literature. Finally, the method is applied and illustrated on an echocardiographic recording of a healthy individual. CONCLUSION The numerical method provided can be used to capture regional annular strains by echocardiography and may help to predict regional dysfunctions in the mitral annulus, providing information on the pathological mechanisms.
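The closed-cubic-spline representation of the annulus can be sketched with SciPy's periodic parametric spline; the marker count and saddle shape below are illustrative, not the study's data:

```python
import numpy as np
from scipy.interpolate import splprep, splev

def closed_annulus_spline(points, n_samples=200):
    """Fit a closed (periodic) cubic spline through 3-D annulus
    points and resample it at n_samples parameter values."""
    p = np.vstack([points, points[:1]])  # close the loop explicitly
    # per=1 makes the spline periodic, i.e. a smooth closed curve.
    tck, _ = splprep([p[:, 0], p[:, 1], p[:, 2]], per=1, s=0.0, k=3)
    u = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    x, y, z = splev(u, tck)
    return np.column_stack([x, y, z])

# Hypothetical saddle-shaped annulus: a circle with out-of-plane bend.
theta = np.linspace(0, 2 * np.pi, 16, endpoint=False)
pts = np.column_stack([30 * np.cos(theta), 30 * np.sin(theta),
                       3 * np.cos(2 * theta)])
curve = closed_annulus_spline(pts)
print(curve.shape)  # (200, 3) smooth closed annulus curve
```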
25. Bolt MA, MaWhinney S, Pattee JW, Erlandson KM, Badesch DB, Peterson RA. Inference following multiple imputation for generalized additive models: an investigation of the median p-value rule with applications to the Pulmonary Hypertension Association Registry and Colorado COVID-19 hospitalization data. BMC Med Res Methodol 2022; 22:148. PMID: 35597908; PMCID: PMC9123297; DOI: 10.1186/s12874-022-01613-w. Citations in RCA: 0.
Abstract
Background Missing data prove troublesome in data analysis; at best they reduce a study’s statistical power and at worst they induce bias in parameter estimates. Multiple imputation via chained equations is a popular technique for dealing with missing data. However, techniques for combining and pooling results from fitted generalized additive models (GAMs) after multiple imputation have not been well explored. Methods We simulated missing data under MCAR, MAR, and MNAR frameworks and utilized random forest and predictive mean matching imputation to investigate a variety of rules for combining GAMs after multiple imputation with binary and normally distributed outcomes. We compared multiple pooling procedures including the “D2” method, the Cauchy combination test, and the median p-value (MPV) rule. The MPV rule involves simply computing and reporting the median p-value across all imputations. Other ad hoc methods such as a mean p-value rule and a single imputation method are investigated. The viability of these methods in pooling results from B-splines is also examined for normal outcomes. An application of these various pooling techniques is then performed on two case studies, one which examines the effect of elevation on a six-minute walk distance (a normal outcome) for patients with pulmonary arterial hypertension, and the other which examines risk factors for intubation in hospitalized COVID-19 patients (a dichotomous outcome). Results In comparison to the results from generalized additive models fit on full datasets, the median p-value rule performs as well as if not better than the other methods examined. In situations where the alternative hypothesis is true, the Cauchy combination test appears overpowered and alternative methods appear underpowered, while the median p-value rule yields results similar to those from analyses of complete data. Conclusions For pooling results after fitting GAMs to multiply imputed datasets, the median p-value is a simple yet useful approach which balances both power to detect important associations and control of Type I errors.
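The MPV rule itself is a one-liner once per-imputation fits are available. A sketch with a plain linear regression standing in for the GAM and a deliberately crude imputation scheme (both are simplifying assumptions, for brevity):

```python
import numpy as np
from scipy.stats import linregress

def median_p_value(imputed_datasets):
    """Median p-value (MPV) rule: fit the analysis model once per
    imputed dataset and report the median p-value across fits."""
    pvals = [linregress(x, y).pvalue for x, y in imputed_datasets]
    return np.median(pvals)

# Hypothetical: 20 imputations of a dataset with 30% missing x.
rng = np.random.default_rng(5)
x_full = rng.normal(size=200)
y = 0.3 * x_full + rng.normal(size=200)
miss = rng.uniform(size=200) < 0.3
imputations = []
for _ in range(20):
    x_imp = x_full.copy()
    # crude stochastic imputation: draw missing x from observed values
    x_imp[miss] = rng.choice(x_full[~miss], size=miss.sum())
    imputations.append((x_imp, y))
print(median_p_value(imputations))
```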