1
Doubly robust evaluation of high-dimensional surrogate markers. Biostatistics 2023; 24:985-999. [PMID: 35791753 PMCID: PMC10801117 DOI: 10.1093/biostatistics/kxac020] [Received: 12/30/2021] [Revised: 05/16/2022] [Accepted: 06/03/2022] [Indexed: 10/19/2023]
Abstract
When evaluating the effectiveness of a treatment, policy, or intervention, the desired measure of efficacy may be expensive to collect, not routinely available, or may take a long time to occur. In these cases, it is sometimes possible to identify a surrogate outcome that can more easily, quickly, or cheaply capture the effect of interest. Theory and methods for evaluating the strength of surrogate markers have been well studied in the context of a single surrogate marker measured in the course of a randomized clinical study. However, methods are lacking for quantifying the utility of surrogate markers when the dimension of the surrogate grows. We propose a robust and efficient method for evaluating a set of surrogate markers that may be high-dimensional. Our method does not require treatment to be randomized and may be used in observational studies. Our approach draws on a connection between quantifying the utility of a surrogate marker and the most fundamental tools of causal inference, namely methods for robust estimation of the average treatment effect. This connection facilitates the use of modern methods for estimating treatment effects, using machine learning to estimate nuisance functions and relaxing the dependence on model specification. We show that our proposed approach performs well, describe connections between our approach and certain mediation effects, and illustrate it by evaluating whether gene expression can be used as a surrogate for immune activation in an Ebola study.
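The abstract above builds on doubly robust estimation of the average treatment effect. As a minimal illustration of that ingredient only (not the paper's surrogate-evaluation procedure), the sketch below computes the standard augmented inverse-probability-weighted (AIPW) estimator. It assumes the nuisance functions (propensity score and the two outcome regressions) have already been fitted, for example with machine learning, and are passed in as per-subject predictions. The function name `aipw_ate` and its argument layout are hypothetical.

```python
from typing import Sequence


def aipw_ate(
    y: Sequence[float],
    a: Sequence[int],
    e: Sequence[float],
    mu1: Sequence[float],
    mu0: Sequence[float],
) -> float:
    """Augmented inverse-probability-weighted (AIPW) estimate of the ATE.

    y   : observed outcomes
    a   : treatment indicators (0 or 1)
    e   : estimated propensity scores P(A = 1 | X), strictly in (0, 1)
    mu1 : predicted outcomes under treatment, E[Y | A = 1, X]
    mu0 : predicted outcomes under control,   E[Y | A = 0, X]
    """
    n = len(y)
    if n == 0 or not (n == len(a) == len(e) == len(mu1) == len(mu0)):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, e, mu1, mu0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model contrast plus inverse-weighted residual corrections:
        # the estimate stays consistent if either the propensity model or
        # the outcome model is correctly specified.
        total += (m1 - m0
                  + ai * (yi - m1) / ei
                  - (1 - ai) * (yi - m0) / (1.0 - ei))
    return total / n
```

Setting both outcome predictions to zero reduces the expression to the plain inverse-probability-weighted estimator, which is a quick sanity check for an implementation like this one.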
2
Using a population-based Kalman estimator to model the COVID-19 epidemic in France: estimating associations between disease transmission and non-pharmaceutical interventions. Int J Biostat 2023; 0:ijb-2022-0087. [PMID: 36607837 DOI: 10.1515/ijb-2022-0087] [Received: 07/13/2022] [Accepted: 11/08/2022] [Indexed: 01/07/2023]
Abstract
In response to the COVID-19 pandemic caused by SARS-CoV-2, governments have adopted a wide range of non-pharmaceutical interventions (NPIs). These include stringent measures such as strict lockdowns, closing schools, bars and restaurants, and curfews, as well as barrier gestures such as mask-wearing and social distancing. Deciphering the effectiveness of each NPI is critical to responding to future waves and outbreaks. To this end, we first develop a dynamic model of the French COVID-19 epidemic over a one-year period. We rely on a global extended Susceptible-Infectious-Recovered (SIR) mechanistic model of infection that includes a transmission rate varying over time. Multilevel data across French regions are integrated using random effects on the parameters of the mechanistic model, which increases statistical power by pooling multiple observation series. We estimate the parameters using a new population-based statistical approach based on a Kalman filter, applied here for the first time to real-world data. We then regress the estimated time-varying transmission rate on the NPIs while accounting for vaccination coverage, the occurrence of variants of concern (VoC), and seasonal weather conditions. We show that each NPI considered has an independent, significant association with the transmission rate. In addition, we show a strong association between weather conditions and transmission, with reduced transmission in summer, and we estimate increased transmissibility of VoC.
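The parameters above are estimated with a population-based approach built on a Kalman filter. The published estimator handles a nonlinear mechanistic model with random effects; the sketch below shows only the basic predict/update recursion of a textbook scalar Kalman filter for a random-walk (local-level) state. Treating the state as, say, a log transmission rate observed with Gaussian noise is an assumption made purely for illustration, and `kalman_random_walk`, `q` and `r` are hypothetical names.

```python
from typing import List, Tuple


def kalman_random_walk(
    observations: List[float],
    q: float,
    r: float,
    x0: float = 0.0,
    p0: float = 1.0,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for the local-level (random-walk) model.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered (mean, variance) pair after each observation.
    """
    x, p = x0, p0
    out: List[Tuple[float, float]] = []
    for y in observations:
        # Predict: the random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: blend prediction and observation via the Kalman gain.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        out.append((x, p))
    return out
```

With `q = 0` the filter reduces to sequential Bayesian averaging of the observations, so after two observations of 1.0 from a N(0, 1) prior with unit noise the filtered mean is 2/3; a positive `q` lets the state drift, which is what allows a time-varying rate to be tracked.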
3
Erratum: CD177, a specific marker of neutrophil activation, is associated with coronavirus disease 2019 severity and death. iScience 2022; 26:105715. [PMID: 36590178 PMCID: PMC9788918 DOI: 10.1016/j.isci.2022.105715] [Indexed: 12/25/2022]
Abstract
[This corrects the article DOI: 10.1016/j.isci.2021.102711.].
4
High-temporal resolution profiling reveals distinct immune trajectories following the first and second doses of COVID-19 mRNA vaccines. Sci Adv 2022; 8:eabp9961. [PMID: 36367935 PMCID: PMC9651857 DOI: 10.1126/sciadv.abp9961] [Received: 03/10/2022] [Accepted: 09/26/2022] [Indexed: 05/31/2023]
Abstract
Knowledge of the mechanisms underpinning the development of protective immunity conferred by mRNA vaccines is fragmentary. Here, we investigated responses to coronavirus disease 2019 (COVID-19) mRNA vaccination via high-temporal resolution blood transcriptome profiling. The first vaccine dose elicited modest interferon and adaptive immune responses, which peaked on days 2 and 5, respectively. The second vaccine dose, in contrast, elicited sharp day 1 interferon, inflammation, and erythroid cell responses, followed by a day 5 plasmablast response. Both post-first and post-second dose interferon signatures were associated with the subsequent development of antibody responses. Yet, we observed distinct interferon response patterns after each of the doses that may reflect quantitative or qualitative differences in interferon induction. Distinct interferon response phenotypes were also observed in patients with COVID-19 and were associated with severity and differences in duration of intensive care. Together, this study also highlights the benefits of adopting high-frequency sampling protocols in profiling vaccine-elicited immune responses.
5
The benefit of augmenting open data with clinical data-warehouse EHR for forecasting SARS-CoV-2 hospitalizations in Bordeaux area, France. JAMIA Open 2022; 5:ooac086. [DOI: 10.1093/jamiaopen/ooac086] [Received: 05/27/2022] [Revised: 07/12/2022] [Accepted: 10/19/2022] [Indexed: 11/14/2022]
Abstract
Objective
To develop an accurate regional forecast algorithm to predict the number of hospitalized patients and to assess the benefit of Electronic Health Records (EHR) information for these predictions.
Materials and Methods
Aggregated data from public SARS-CoV-2 and weather databases and from the data-warehouse of the Bordeaux Hospital were extracted from 2020-05-16 to 2022-01-17. The outcomes were the number of hospitalized patients in the Bordeaux Hospital at 7 and 14 days. We compared the performance of different data sources, feature engineering approaches, and machine learning models.
Results
During the period of 88 weeks, 2561 hospitalizations due to COVID-19 were recorded at the Bordeaux Hospital. The model achieving the best performance was an elastic-net penalized linear regression using all available data, with a median relative error at 7 and 14 days of 0.136 [0.063; 0.223] and 0.198 [0.105; 0.302], respectively. Electronic health records (EHRs) from the hospital data-warehouse improved the median relative error at 7 and 14 days by 10.9% and 19.8%, respectively. Graphical evaluation showed that the remaining forecast error was mainly due to delay in detecting changes in slope.
Discussion
The forecast model showed good overall performance at both 7 and 14 days, which was improved by the addition of data from the Bordeaux Hospital data-warehouse.
Conclusion
The development of hospital data-warehouses might help to obtain more specific and faster information than traditional surveillance systems, which in turn may help to improve epidemic forecasting at larger and finer scales.
LAY SUMMARY
The objective of this work was to develop a forecast algorithm to predict the number of hospitalized patients at the Bordeaux Hospital. In addition, we assessed the benefit of Electronic Health Records (EHRs) information for these predictions.
To perform this task, we used data between 2020-05-16 and 2022-01-17 from a national database on the SARS-CoV-2 epidemic, a public weather database, and the data-warehouse of the Bordeaux Hospital. The outcomes were the number of hospitalized patients in the Bordeaux Hospital at 7 and 14 days.
During the period of 88 weeks, 2561 hospitalizations due to COVID-19 were recorded at the Bordeaux Hospital. The best model had an error of 13.6% at 7 days and 19.8% at 14 days. EHRs from the hospital data-warehouse improved the performance by 10% at 7 days and 20% at 14 days. Graphical evaluation showed that the remaining forecast error was mainly due to delay in detecting changes in slope.
The forecast model showed good overall performance, which was improved by the addition of EHR data. The development of hospital data-warehouses might help to obtain more specific and faster information than traditional surveillance systems, which in turn may help to improve epidemic forecasting at larger and finer scales.
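The forecasting results above are summarized as median relative errors. The abstract does not spell out the formula, so the sketch below assumes the common definition |predicted - observed| / observed, computed per forecast and then summarized by the median. The function name is hypothetical, and the bracketed intervals reported above are not reproduced because their definition is not stated.

```python
from statistics import median
from typing import Sequence


def median_relative_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Median over forecasts of |predicted - observed| / observed.

    Assumes every observed count is strictly positive (relative error is
    undefined when the observed value is zero).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = []
    for obs, pred in zip(observed, predicted):
        if obs <= 0:
            raise ValueError("observed values must be positive")
        errors.append(abs(pred - obs) / obs)
    return median(errors)
```

Because the error is scaled by the observed count, a value of 0.136 reads as "the typical forecast is off by about 13.6% of the actual number of hospitalized patients", which matches how the lay summary reports it.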
6
ATLAS: an automated association test using probabilistically linked health records with application to genetic studies. J Am Med Inform Assoc 2021; 28:2582-2592. [PMID: 34608931 DOI: 10.1093/jamia/ocab187] [Received: 05/03/2021] [Revised: 08/14/2021] [Accepted: 08/22/2021] [Indexed: 11/12/2022]
Abstract
OBJECTIVE Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture more comprehensive pictures of patient health and enable novel research studies. When no gold-standard mappings between patient records are available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous linked data inference methods are constrained to certain linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data. MATERIALS AND METHODS Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from probabilistic linkage. Next, estimated effect sizes are obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining P values obtained from data imputed at varying thresholds using Fisher's method and perturbation resampling. RESULTS In simulations, ATLAS controls the type I error rate and exhibits high power compared to previous methods. In a real-world genetic association study, meta-analysis of ATLAS-enabled analyses on a linked cohort with analyses using an existing cohort yielded additional significant associations between rheumatoid arthritis genetic risk score and laboratory biomarkers. DISCUSSION Weighted average imputation withstands false matches and increases the contribution of true matches, mitigating bias induced by linkage error. The threshold combination test avoids arbitrarily choosing a threshold to declare a match, thus automating linked data-enabled analyses and preserving power. CONCLUSION ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.
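ATLAS combines P values across imputation thresholds using Fisher's method together with perturbation resampling. The sketch below shows only the Fisher statistic X = -2 Σ log p_i and its textbook chi-squared reference distribution with 2k degrees of freedom, which assumes independent p-values. P values computed from the same data at different thresholds are generally dependent, so this chi-squared reference would not be valid in that setting; the resampling-based calibration that the abstract attributes to ATLAS is not reproduced here. The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine k p-values with Fisher's method.

    The statistic X = -2 * sum(log p_i) follows a chi-squared distribution
    with 2k degrees of freedom when the p-values are independent and
    uniform under the null. For even degrees of freedom the survival
    function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total
```

With a single p-value the combination returns that p-value unchanged, and for two p-values with product q it reduces to the familiar q(1 - ln q).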
7
Predicting the retinal content in omega-3 fatty acids for age-related macular-degeneration. Clin Transl Med 2021; 11:e404. [PMID: 34323423 PMCID: PMC8243522 DOI: 10.1002/ctm2.404] [Received: 02/02/2021] [Revised: 04/11/2021] [Accepted: 04/18/2021] [Indexed: 01/28/2023]
8
CD177, a specific marker of neutrophil activation, is associated with coronavirus disease 2019 severity and death. iScience 2021; 24:102711. [PMID: 34127958 PMCID: PMC8189740 DOI: 10.1016/j.isci.2021.102711] [Received: 02/08/2021] [Revised: 03/26/2021] [Accepted: 06/08/2021] [Indexed: 01/03/2023]
Abstract
Identifying patients with coronavirus disease 2019 who are at high risk of severe disease is a challenge in routine care. We performed cell phenotypic, serum, and RNA sequencing gene expression analyses in hospitalized patients with severe disease (n = 61). Relative to healthy donors, results showed abnormalities of 27 cell populations and an elevation of 42 cytokines, neutrophil chemo-attractants, and inflammatory components in patients. Supervised and unsupervised analyses revealed a high abundance of CD177, a specific neutrophil activation marker, contributing to the clustering of severe patients. Gene abundance correlated with high serum levels of CD177 in severe patients. Elevated levels were confirmed in a second cohort and were higher in intensive care unit (ICU) than in non-ICU patients (P < 0.001). Longitudinal measurements discriminated between patients with the worst prognosis, leading to death, and those who recovered (P = 0.01). These results highlight neutrophil activation as a hallmark of severe disease and CD177 assessment as a reliable prognostic marker for routine care.
9
Automatic phenotyping of electronical health record: PheVis algorithm. J Biomed Inform 2021; 117:103746. [PMID: 33746080 DOI: 10.1016/j.jbi.2021.103746] [Received: 09/28/2020] [Revised: 03/02/2021] [Accepted: 03/05/2021] [Indexed: 11/18/2022]
Abstract
Electronic Health Records (EHRs) often lack reliable annotation of patient medical conditions. PheNorm, an automated unsupervised algorithm to identify patient medical conditions from EHR data, has been developed. PheVis extends PheNorm to visit-level resolution. PheVis combines diagnosis codes with medical concepts extracted from medical notes, incorporating past history in a machine learning approach to provide an interpretable parametric predictor of the occurrence probability for a given medical condition at each visit. PheVis is applied to two real-world use-cases using the data-warehouse of the University Hospital of Bordeaux: i) rheumatoid arthritis, a chronic condition; ii) tuberculosis, an acute condition. Cross-validated AUROCs were 0.943 [0.940; 0.945] and 0.987 [0.983; 0.990], respectively. Cross-validated AUPRCs were 0.754 [0.744; 0.763] and 0.299 [0.198; 0.403], respectively. PheVis performs well for chronic conditions, though for acute conditions its performance is limited because the natural language processing tools used for French do not exclude past medical history. It achieves significantly better performance than state-of-the-art unsupervised methods, especially for chronic diseases.
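PheVis is evaluated with cross-validated AUROC and AUPRC. As a reminder of what the first metric measures, the sketch below computes AUROC through its Mann-Whitney interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function name is hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Counts, over all (positive, negative) pairs, how often the positive
    case scores higher; ties count as one half. O(n_pos * n_neg) time,
    which is fine for illustration but slow for large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

AUPRC, by contrast, depends on how rare the positive class is (a random classifier's AUPRC equals the prevalence), so AUROC and AUPRC values are not directly comparable across conditions with different prevalence.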
10
Early signature in the blood lipidome associated with subsequent cognitive decline in the elderly: A case-control analysis nested within the Three-City cohort study. EBioMedicine 2021; 64:103216. [PMID: 33508744 PMCID: PMC7841305 DOI: 10.1016/j.ebiom.2021.103216] [Received: 10/08/2020] [Revised: 01/04/2021] [Accepted: 01/05/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND Brain lipid metabolism appears critical for cognitive aging, but whether alterations in the lipidome relate to cognitive decline remains unclear at the system level. METHODS We studied participants from the Three-City study, a multicentric cohort of older persons who were free of dementia at the time of blood sampling and who provided repeated measures of cognition over 12 subsequent years. We measured 189 serum lipids from 13 lipid classes using shotgun lipidomics in a case-control sample on cognitive decline (matched on age, sex and level of education) nested within the Bordeaux study center (discovery, n = 418). Associations with cognitive decline were investigated using bootstrapped penalized regression, and tested for validation in the Dijon study center (validation, n = 314). FINDINGS Among 17 lipids identified in the discovery stage, lower levels of the triglyceride TAG50:5, and of four membrane lipids (sphingomyelin SM40:2,2, phosphatidylethanolamine PE38:5(18:1/20:4), ether-phosphatidylethanolamine PEO34:3(16:1/18:2), and ether-phosphatidylcholine PCO34:1(16:1/18:0)), and higher levels of PCO32:0(16:0/16:0), were associated with greater odds of cognitive decline, and these associations were replicated in our validation sample. INTERPRETATION These findings indicate that in the blood lipidome of non-demented older persons, a specific profile of lipids involved in membrane fluidity, myelination, and lipid rafts, is associated with subsequent cognitive decline. FUNDING The complete list of funders is available at the end of the manuscript, in the Acknowledgement section.
11
dearseq: a variance component score test for RNA-seq differential analysis that effectively controls the false discovery rate. NAR Genom Bioinform 2020; 2:lqaa093. [PMID: 33575637 PMCID: PMC7676475 DOI: 10.1093/nargab/lqaa093] [Received: 12/27/2019] [Revised: 10/14/2020] [Accepted: 10/23/2020] [Indexed: 12/20/2022]
Abstract
RNA-seq studies are growing in size and popularity. We provide evidence that the most commonly used methods for differential expression analysis (DEA) may yield too many false positive results in some situations. We present dearseq, a new method for DEA that controls the false discovery rate (FDR) without making any assumption about the true distribution of RNA-seq data. We show that dearseq controls the FDR while maintaining strong statistical power compared to the most popular methods. We demonstrate this behavior with mathematical proofs, simulations and a real data set from a study of tuberculosis, where our method produces fewer apparent false positives.
12
Immune Alterations in a Patient with SARS-CoV-2-Related Acute Respiratory Distress Syndrome. J Clin Immunol 2020; 40:1082-1092. [PMID: 32829467 PMCID: PMC7443154 DOI: 10.1007/s10875-020-00839-x] [Received: 04/29/2020] [Accepted: 07/27/2020] [Indexed: 02/03/2023]
Abstract
We report a longitudinal analysis of the immune response associated with a fatal case of COVID-19 in Europe. This patient exhibited a rapid evolution towards multiorgan failure. SARS-CoV-2 was detected in multiple nasopharyngeal, blood, and pleural samples, despite antiviral and immunomodulator treatment. Clinical evolution in the blood was marked by an increase (2-3-fold) in differentiated effector T cells expressing exhaustion (PD-1) and senescence (CD57) markers, an expansion of antibody-secreting cells, a 15-fold increase in γδ T cell and proliferating NK-cell populations, and the total disappearance of monocytes, suggesting lung trafficking. In the serum, waves of a pro-inflammatory cytokine storm, Th1 and Th2 activation, and markers of T cell exhaustion, apoptosis, cell cytotoxicity, and endothelial activation were observed until the fatal outcome. This case underscores the need for well-designed studies to investigate complementary approaches to control viral replication, the source of the hyperinflammatory status, and immunomodulation to target the pathophysiological response. The investigation was conducted as part of an overall French clinical cohort assessing patients with COVID-19 and registered in clinicaltrials.gov under the following number: NCT04262921.
13
PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. J Am Med Inform Assoc 2019; 25:1359-1365. [PMID: 29788308 DOI: 10.1093/jamia/ocy056] [Received: 12/09/2017] [Accepted: 04/23/2018] [Indexed: 12/24/2022]
Abstract
Objective Standard approaches for large-scale phenotypic screens using electronic health record (EHR) data apply thresholds, such as ≥2 diagnosis codes, to define subjects as having a phenotype. However, the variation in the accuracy of diagnosis codes can impair the power of such screens. Our objective was to develop and evaluate an approach that converts diagnosis codes into a probability of a phenotype (PheProb). We hypothesized that this alternate approach for defining phenotypes would improve power for genetic association studies. Methods The PheProb approach employs unsupervised clustering to separate patients into 2 groups based on diagnosis codes. Subjects are assigned a probability of having the phenotype based on the number of diagnosis codes. This approach was developed using simulated EHR data and tested in a real-world EHR cohort. In the latter, we tested the association between low-density lipoprotein cholesterol (LDL-C) genetic risk alleles, known to be associated with hyperlipidemia, and hyperlipidemia codes (ICD-9 272.x). PheProb and thresholding approaches were compared. Results Among n = 1462 subjects in the real-world EHR cohort, the threshold-based p-values for association between the genetic risk score (GRS) and hyperlipidemia were 0.126 (≥1 code), 0.123 (≥2 codes), and 0.142 (≥3 codes). The PheProb approach produced the expected significant association between the GRS and hyperlipidemia: p = .001. Conclusions PheProb improves statistical power for association studies relative to standard thresholding approaches by leveraging information about the phenotype in the billing code counts. The PheProb approach has direct applications where efficient approaches are required, such as in Phenome-Wide Association Studies.
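The PheProb abstract describes unsupervised clustering of patients into two groups from their diagnosis-code counts, yielding a probability of having the phenotype rather than a hard threshold such as ≥2 codes. The sketch below illustrates that general idea with a two-component Poisson mixture fitted by expectation-maximization (EM). The Poisson form, the initialization and the function names are assumptions of this illustration; the published PheProb model may differ, for example in how it accounts for a patient's overall healthcare utilization.

```python
import math
from typing import List, Sequence, Tuple


def _poisson_logpmf(k: int, lam: float) -> float:
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_mixture_probs(
    counts: Sequence[int], n_iter: int = 200
) -> Tuple[List[float], float, float, float]:
    """Fit a two-component Poisson mixture to code counts by EM.

    Returns (posterior probability of the high-rate component per subject,
    mixing weight of the high-rate component, low rate, high rate).
    """
    if not counts:
        raise ValueError("need at least one count")
    n = len(counts)
    mean = sum(counts) / n
    # Crude initialization: one component below and one above the mean.
    lam_lo, lam_hi, pi = max(mean / 2, 1e-3), max(mean * 2, 2e-3), 0.5
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is in the high-rate group.
        for j, k in enumerate(counts):
            log_hi = math.log(pi) + _poisson_logpmf(k, lam_hi)
            log_lo = math.log(1 - pi) + _poisson_logpmf(k, lam_lo)
            m = max(log_hi, log_lo)  # subtract the max for numerical stability
            w_h, w_l = math.exp(log_hi - m), math.exp(log_lo - m)
            post[j] = w_h / (w_h + w_l)
        # M-step: update the mixing weight and the two component rates.
        w_hi = sum(post)
        w_lo = n - w_hi
        pi = min(max(w_hi / n, 1e-6), 1 - 1e-6)
        lam_hi = max(sum(p * k for p, k in zip(post, counts)) / max(w_hi, 1e-12), 1e-6)
        lam_lo = max(sum((1 - p) * k for p, k in zip(post, counts)) / max(w_lo, 1e-12), 1e-6)
    return post, pi, lam_lo, lam_hi
```

Subjects with many codes receive probabilities near 1 and subjects with few codes probabilities near 0, while intermediate counts get intermediate probabilities, which is the information a hard threshold discards.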
14
Diet-Related Metabolites Associated with Cognitive Decline Revealed by Untargeted Metabolomics in a Prospective Cohort. Mol Nutr Food Res 2019; 63:e1900177. [PMID: 31218777 PMCID: PMC6790579 DOI: 10.1002/mnfr.201900177] [Received: 02/14/2019] [Revised: 05/24/2019] [Indexed: 12/21/2022]
Abstract
Scope Untargeted metabolomics may reveal preventive targets in cognitive aging, including within the food metabolome. Methods and results A case‐control study nested in the prospective Three‐City study includes participants aged ≥65 years and initially free of dementia. A total of 209 cases of cognitive decline and 209 controls (matched for age, gender, education) with slower cognitive decline over up to 12 years are contrasted. Using untargeted metabolomics and bootstrap‐enhanced penalized regression, a baseline serum signature of 22 metabolites associated with subsequent cognitive decline is identified. The signature includes three coffee metabolites, a biomarker of citrus intake, a cocoa metabolite, two metabolites putatively derived from fish and wine, three medium‐chain acylcarnitines, glycodeoxycholic acid, lysoPC(18:3), trimethyllysine, glucose, cortisol, creatinine, and arginine. Adding the 22 metabolites to a reference predictive model for cognitive decline (conditioned on age, gender, education and including ApoE‐ε4, diabetes, BMI, and number of medications) substantially increases the predictive performance: cross‐validated Area Under the Receiver Operating Characteristic Curve = 75% [95% CI 70–80%] compared to 62% [95% CI 56–67%]. Conclusions The untargeted metabolomics study supports a protective role of specific foods (e.g., coffee, cocoa, fish) and various alterations in the endogenous metabolism responsive to diet in cognitive aging.
15
P1‐011: UNTARGETED METABOLOMICS IN A PROSPECTIVE COHORT TO IDENTIFY DIET‐RELATED METABOLITES ASSOCIATED WITH AGE‐RELATED COGNITIVE DECLINE. Alzheimers Dement 2019. [DOI: 10.1016/j.jalz.2019.06.036] [Indexed: 10/25/2022]
16
Gene Expression Signatures Associated With Immune and Virological Responses to Therapeutic Vaccination With Dendritic Cells in HIV-Infected Individuals. Front Immunol 2019; 10:874. [PMID: 31105698 PMCID: PMC6492565 DOI: 10.3389/fimmu.2019.00874] [Received: 05/22/2018] [Accepted: 04/05/2019] [Indexed: 12/31/2022]
Abstract
The goal of HIV therapeutic vaccination is to induce HIV-specific immune responses able to control HIV replication. We previously reported that vaccination with ex vivo generated Dendritic Cells (DC) loaded with HIV-lipopeptides in HIV-infected patients (n = 19) on antiretroviral therapy (ART) was well-tolerated and immunogenic. Vaccine-elicited HIV-specific T cell responses were associated with improved control of viral replication following antiretroviral interruption (ATI from w24 to w48). We show an inverse relationship between HIV-specific responses (production of IL-2, IL-13, IL-21, IFN-γ, CD4 polyfunctionality, i.e., production of at least two cytokines) and the peak of viral load during ATI. Here we have performed an integrative systems vaccinology analysis including: (i) post vaccination (w16) immune responses assessed by cytometry, cytokine secretion, and Interferon-γ ELISPOT assays; (ii) whole blood and cellular gene expression measured during vaccination; and (iii) viral parameters following ATI, with the objective of disentangling the relationships between these markers and identifying vaccine signatures. During vaccination, 69 gene expression modules out of 260 varied significantly, including (by order of significance) modules related to inflammation (Chaussabel Modules M3.2, M4.13, M4.6, M5.7, M7.1, M4.2), plasma cells (M4.11) and T cells (M4.1, M4.15). Cellular immune responses were positively correlated with genes belonging to T cell functional modules (M4.1, M4.15) at w16 and negatively correlated with genes belonging to inflammation modules (M7.1, M5.7, M3.2, M4.13, M4.2). More specifically, we show that a prolonged increase in the abundance of inflammatory gene pathways related to toll-like receptor signaling (especially TLR4) is associated with both lower vaccine immune responses and control of viral replication after ATI.
Further comparison of DC vaccine gene signatures with previously reported non-HIV vaccine signatures, such as flu and pneumococcal vaccines, revealed common pathways across vaccines. Overall, these results show that an excessively long duration and high intensity of vaccine-induced inflammatory responses hamper the magnitude of effector responses.
17
Semi-supervised estimation of covariance with application to phenome-wide association studies with electronic medical records data. Stat Methods Med Res 2019; 29:455-465. [PMID: 30943854 DOI: 10.1177/0962280219837676] [Indexed: 11/16/2022]
Abstract
Electronic medical record data are valuable resources for discovery research. They contain detailed phenotypic information on individual patients, opening opportunities for simultaneously studying multiple phenotypes. A useful tool for such simultaneous assessment is the phenome-wide association study, which relates a genomic or biological marker of interest to a wide spectrum of disease phenotypes, typically defined by diagnostic billing codes. One challenge arises when the biomarker of interest is expensive to measure on the entire electronic medical record cohort. Performing a phenome-wide association study based on supervised estimation using only subjects who have marker measurements may yield limited power. In this paper, we focus on the setting where the marker is measured on a small fraction of the patients while a few surrogate markers, such as historical measurements of the biomarker, are available on a large number of patients. We propose an efficient semi-supervised estimation procedure to estimate the covariance between the biomarker and the billing code, leveraging the surrogate marker information. We employ surrogate marker values to impute the missing outcome via a two-step semi-non-parametric approach and demonstrate that our proposed estimator is always more efficient than the supervised counterpart without requiring the imputation model to be correct. We illustrate the proposed procedure by assessing the association between C-reactive protein and some inflammatory diseases in an electronic medical record study of inflammatory bowel disease performed with the Partners HealthCare electronic medical record database, where C-reactive protein was only measured for a small fraction of the patients due to budget constraints.
18
Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data: a case study in low sample size. Bioinformatics 2019; 35:3628-3634. [DOI: 10.1093/bioinformatics/btz135] [Received: 03/14/2018] [Revised: 02/08/2019] [Accepted: 02/23/2019] [Indexed: 01/10/2023]
Abstract
Motivation
In some prediction analyses, predictors have a natural grouping structure and selecting predictors accounting for this additional information could be more effective for predicting the outcome accurately. Moreover, in a high dimension low sample size framework, obtaining a good predictive model becomes very challenging. The objective of this work was to investigate the benefits of dimension reduction in penalized regression methods, in terms of prediction performance and variable selection consistency, in high dimension low sample size data. Using two real datasets, we compared the performances of lasso, elastic net, group lasso, sparse group lasso, sparse partial least squares (PLS), group PLS and sparse group PLS.
Results
Considering dimension reduction in penalized regression methods improved the prediction accuracy. The sparse group PLS reached the lowest prediction error while consistently selecting a few predictors from a single group.
Availability and implementation
R code for the prediction methods is freely available at https://github.com/SoufianeAjana/Blisar.
Supplementary information
Supplementary data are available at Bioinformatics online.
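The penalized methods compared above differ mainly in how they shrink coefficients. The Python sketch below is illustrative and not taken from the Blisar repository: it shows the proximal (shrinkage) operators behind the lasso, group lasso and sparse group lasso penalties, with the square-root-of-group-size weighting assumed as a common convention and hypothetical function names. The PLS-based methods are not covered.

```python
import numpy as np


def soft_threshold(z, lam):
    """Lasso proximal operator: sign(z) * max(|z| - lam, 0), elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def group_soft_threshold(z, groups, lam):
    """Group lasso proximal operator: shrink each group's sub-vector toward
    zero by lam * sqrt(group size) in Euclidean norm; groups whose norm falls
    below that threshold are set entirely to zero."""
    out = np.zeros_like(z, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(z[idx])
        thresh = lam * np.sqrt(idx.sum())
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * z[idx]
    return out


def sparse_group_prox(z, groups, lam, alpha):
    """Sparse group lasso proximal operator: elementwise lasso shrinkage with
    weight alpha * lam, followed by group shrinkage with weight (1 - alpha) * lam."""
    return group_soft_threshold(soft_threshold(z, alpha * lam), groups, (1.0 - alpha) * lam)
```

The group step zeroes a whole block of coefficients at once, which is what lets grouped methods select predictors group by group, while the extra lasso step in the sparse variant can also zero individual coefficients inside a retained group.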
|
19
|
Sequential Dirichlet process mixtures of multivariate skew $t$-distributions for model-based clustering of flow cytometry data. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1209] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/19/2022]
|
20
|
P1‐009: EARLY BLOOD LIPID SIGNATURE PREDICTING ACCELERATED COGNITIVE DECLINE IN OLDER PERSONS. Alzheimers Dement 2018. [DOI: 10.1016/j.jalz.2018.06.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
21
|
Variance component score test for time-course gene set analysis of longitudinal RNA-seq data. Biostatistics 2018; 18:589-604. [PMID: 28334305 DOI: 10.1093/biostatistics/kxx005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/18/2016] [Accepted: 01/04/2017] [Indexed: 01/28/2023]
Abstract
As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.
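As a rough illustration of a variance-component-style score statistic paired with a permutation p-value, the Python sketch below tests whether a gene set's expression changes with time. It is a simplified stand-in, not the tcgsaseq implementation: it uses no covariates, no heteroscedasticity weights and no asymptotic null distribution, and permuting time labels assumes exchangeable samples, which ignores the within-subject correlation of real longitudinal data. The function names are hypothetical.

```python
import numpy as np


def vc_time_statistic(expr, time):
    """Quadratic score-type statistic for a gene set's association with time.

    expr: (n_samples, n_genes) transformed expression; time: (n_samples,).
    Computes Q = sum_g (sum_i (y_ig - mean_g) * (t_i - mean_t))^2, the squared
    norm of per-gene score contributions for a random time slope, without any
    variance weighting."""
    y = expr - expr.mean(axis=0)
    t = time - time.mean()
    scores = t @ y  # one score contribution per gene
    return float(scores @ scores)


def permutation_pvalue(expr, time, n_perm=999, seed=0):
    """Permutation p-value: compare Q to its values under shuffled time labels."""
    rng = np.random.default_rng(seed)
    q_obs = vc_time_statistic(expr, time)
    exceed = sum(
        vc_time_statistic(expr, rng.permutation(time)) >= q_obs for _ in range(n_perm)
    )
    # Add-one correction so the p-value is never exactly zero.
    return (1 + exceed) / (1 + n_perm)
```

Summing squared per-gene scores lets signals spread across many genes in the set accumulate into one statistic, which is the general motivation for variance component tests over gene-by-gene testing.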
|
22
|
Association Between Anti-Citrullinated Fibrinogen Antibodies and Coronary Artery Disease in Rheumatoid Arthritis. Arthritis Care Res (Hoboken) 2018; 70:1113-1117. [PMID: 28992379 DOI: 10.1002/acr.23444] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/24/2017] [Accepted: 10/03/2017] [Indexed: 12/31/2022]
Abstract
OBJECTIVE Antibodies against citrullinated fibrinogen (anti-Cit-fibrinogen) have been implicated in rheumatoid arthritis (RA) and associated with cardiovascular risk in RA. The objective of this study was to examine the association between anti-Cit-fibrinogens and coronary artery disease (CAD) outcomes. METHODS We performed the study in an RA cohort based in a large academic institution, linked with electronic medical record data containing information on CAD outcomes from medical record review. Using a published bead-based assay method, we measured 10 types of anti-Cit-fibrinogens. We applied a score test to determine the association between the anti-Cit-fibrinogens as a group and CAD outcomes. Principal components analysis (PCA) was performed to assess whether the anti-Cit-fibrinogens clustered into groups. Each group was then also tested for association with CAD. Sensitivity analyses were performed using a published International Classification of Diseases, Ninth Revision code group for ischemic heart disease (IHD) as the outcome. RESULTS We studied 1,006 RA subjects (mean ± SD age 61.0 ± 13.0 years; 72.2% anti-cyclic citrullinated peptide positive). As a group, anti-Cit-fibrinogens were associated with CAD (P = 1.1 × 10⁻⁴). From the PCA, we observed 3 main groups, of which only 1 group, containing 7 of the 10 anti-Cit-fibrinogens, was significantly associated with CAD outcomes (P = 0.015). In the sensitivity analysis, all anti-Cit-fibrinogens as a group remained significantly associated with IHD (P = 2.9 × 10⁻⁴). CONCLUSION Anti-Cit-fibrinogen antibodies as a group were associated with CAD outcomes in our RA cohort, with the strongest signal for association arising from a subset of the autoantibodies.
|
23
|
Pattern of polyphenol intake and the long-term risk of dementia in older persons. Neurology 2018; 90:e1979-e1988. [PMID: 29703769 DOI: 10.1212/wnl.0000000000005607] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 10/16/2017] [Accepted: 03/13/2018] [Indexed: 11/15/2022]
Abstract
OBJECTIVE To investigate the optimal combination of dietary polyphenols associated with the long-term risk of dementia in a large prospective French cohort of older persons, the Three-City (3C) Study. METHODS We included 1,329 older adults without dementia from the 3C Study whose intake of 26 polyphenol subclasses was assessed and who were followed up for dementia for 12 years. Using partial least squares for Cox models, we identified a pattern of polyphenol intake associated with dementia risk. RESULTS The pattern combined several flavonoids (dihydroflavonols, anthocyanins, isoflavonoids, flavanones), stilbenes (including resveratrol), lignans, and other subclasses (hydroxybenzaldehydes, naphthoquinones, furanocoumarins). Compared with participants in the lowest quintile of pattern score, those in the highest quintile had a 50% lower risk of dementia (95% confidence interval 20%-68%, p for trend <0.01) in multivariate models. CONCLUSIONS In this French cohort, a polyphenol pattern provided by a diet containing specific plant products (nuts, citrus, berries, leafy vegetables, soy, cereals, olive oil) accompanied by red wine and tea was associated with lower dementia risk.
|
24
|
Phenome-Wide Association Study of Autoantibodies to Citrullinated and Noncitrullinated Epitopes in Rheumatoid Arthritis. Arthritis Rheumatol 2017; 69:742-749. [PMID: 27792870 PMCID: PMC5378622 DOI: 10.1002/art.39974] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 06/28/2016] [Accepted: 10/27/2016] [Indexed: 12/22/2022]
Abstract
OBJECTIVE Patients with rheumatoid arthritis (RA) develop autoantibodies against a spectrum of antigens, but the clinical significance of these autoantibodies is unclear. Using a phenome-wide association study (PheWAS) approach, we examined the association between autoantibodies and clinical subphenotypes of RA. METHODS This study was conducted in a cohort of RA patients identified from the electronic medical records (EMRs) of 2 tertiary care centers. Using a published multiplex bead assay, we measured 36 autoantibodies targeting epitopes implicated in RA. We extracted all International Classification of Diseases, Ninth Revision (ICD-9) codes for each subject and grouped them into disease categories (PheWAS codes), using a published method. We tested for the association of each autoantibody (grouped by the targeted protein) with PheWAS codes. To determine significant associations (at a false discovery rate [FDR] of ≤0.1), we reviewed the medical records of 50 patients with each PheWAS code to determine positive predictive values (PPVs). RESULTS We studied 1,006 RA patients; the mean ± SD age of the patients was 61.0 ± 12.9 years, and 79.0% were female. A total of 3,568 unique ICD-9 codes were grouped into 625 PheWAS codes; the 206 PheWAS codes with a prevalence of ≥3% were studied. Using the PheWAS method, we identified 24 significant associations of autoantibodies to epitopes at an FDR of ≤0.1. The associations that were strongest and had the highest PPV for the PheWAS code were between autoantibodies against fibronectin and obesity (P = 6.1 × 10⁻⁴, PPV 100%) and between autoantibodies against fibrinogen and pneumonopathy (P = 2.7 × 10⁻⁴, PPV 96%). Pneumonopathy codes included diagnoses of cryptogenic organizing pneumonia and obliterative bronchiolitis. CONCLUSION We demonstrated the application of a bioinformatics method, the PheWAS, to screen for the clinical significance of RA-related autoantibodies. Using the PheWAS approach, we identified potentially significant links between variations in the levels of autoantibodies and comorbidities of interest in RA.
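The abstract reports significance at an FDR of ≤0.1 without naming the procedure used. The Benjamini-Hochberg step-up procedure is a common way to control the FDR across many tests such as PheWAS codes; the Python sketch below shows it as an assumption, not as the method the study necessarily applied.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.1):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask marking the hypotheses rejected at FDR level q:
    find the largest rank k with p_(k) <= q * k / m and reject the k
    smallest p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest (0-based) rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject
```

Because the procedure steps up from the largest qualifying rank, a p-value that misses its own threshold can still be rejected: `benjamini_hochberg([0.03, 0.04], q=0.05)` rejects both hypotheses even though 0.03 exceeds 0.05 × 1/2.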
|
25
|
Kernel machine score test for pathway analysis in the presence of semi-competing risks. Stat Methods Med Res 2016; 27:1099-1114. [PMID: 27255336 DOI: 10.1177/0962280216653427] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
In cancer studies, patients often experience two different types of events: a non-terminal event such as recurrence or metastasis, and a terminal event such as cancer-specific death. Identifying pathways and networks of genes associated with one or both of these events is an important step in understanding disease development and targeting new biological processes for potential intervention. These correlated outcomes are commonly handled by modeling progression-free survival, where the event time is the minimum of the times to recurrence and death. However, identifying only the pathways associated with progression-free survival may miss pathways that affect time to recurrence but not death, or vice versa. We propose a combined testing procedure for a pathway's association with both the cause-specific hazard of recurrence and the marginal hazard of death. The dependency between the two outcomes is accounted for through perturbation resampling, which approximates the test's null distribution without any further assumption on the nature of the dependency. Even complex non-linear relationships between pathways and disease progression or death can be uncovered thanks to a flexible kernel machine framework. The superior statistical power of our approach is demonstrated in numerical studies and in a gene expression study of breast cancer.
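The key idea of perturbation resampling, reusing one random multiplier per subject for both outcomes so that their dependence carries over into the resampled statistics, can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure: the per-subject null score contributions `u_recur` and `u_death` are taken as given inputs, the two kernel quadratic forms are simply summed, and the Gaussian kernel with its `bandwidth` is a hypothetical choice.

```python
import numpy as np


def gaussian_kernel(z, bandwidth):
    """Gaussian kernel matrix K_ij = exp(-||z_i - z_j||^2 / bandwidth) for
    gene expression rows z (n_subjects, n_genes) of one pathway."""
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / bandwidth)


def combined_perturbation_test(u_recur, u_death, kernel, n_pert=1000, seed=0):
    """Combined kernel score statistic with a perturbation-resampling p-value.

    u_recur, u_death: per-subject score contributions under the null for the
    two outcomes (hypothetical inputs). Each perturbation draws one standard
    normal multiplier xi_i per subject and applies it to BOTH contributions,
    so the within-subject dependence between outcomes is preserved without
    modeling it."""
    rng = np.random.default_rng(seed)

    def stat(a, b):
        # Sum of the two kernel quadratic forms (the combination rule is an assumption).
        return float(a @ kernel @ a + b @ kernel @ b)

    q_obs = stat(u_recur, u_death)
    xi = rng.standard_normal((n_pert, u_recur.size))
    q_null = np.array([stat(x * u_recur, x * u_death) for x in xi])
    p_value = (1 + np.sum(q_null >= q_obs)) / (1 + n_pert)
    return q_obs, p_value
```

Drawing independent multipliers for the two outcomes would instead treat them as independent, which is exactly the assumption the shared multiplier avoids.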
|
26
|
Group and sparse group partial least square approaches applied in genomics context. Bioinformatics 2015; 32:35-42. [DOI: 10.1093/bioinformatics/btv535] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/14/2015] [Accepted: 09/03/2015] [Indexed: 01/07/2023]
|
27
|
Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLoS Comput Biol 2015; 11:e1004310. [PMID: 26111374 PMCID: PMC4482329 DOI: 10.1371/journal.pcbi.1004310] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 11/18/2014] [Accepted: 04/30/2015] [Indexed: 01/13/2023]
Abstract
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly according to these conditions. The method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while both a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series failed to detect any significant pattern change over time. When applied to the second illustrative dataset, TcGSA identified 4 gene sets that were also linked with the influenza vaccine, although previous analyses had associated them only with the pneumococcal vaccine. In our simulation study, TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package.
Gene set analysis methods use prior biological knowledge to analyze gene expression data. This prior knowledge takes the form of predefined groups of genes, linked through their biological function. Gene set analysis methods have been successfully applied in cross-sectional studies, their results being more sensitive and interpretable than those of methods investigating genomic data one gene at a time. The time-course gene set analysis (TcGSA) introduced here is an extension of such gene set analysis to longitudinal data. This method identifies a priori defined groups of genes whose expression is not stable over time, taking into account the potential heterogeneity between patients and between genes. When biological conditions are compared, it identifies the gene sets that have different expression dynamics according to these conditions. Data from 2 studies are analyzed: an HIV therapeutic vaccine trial and a recent study on influenza and pneumococcal vaccines. In both cases, TcGSA provided new insights compared to standard approaches, thanks to its increased sensitivity. Those results highlight the benefits of the TcGSA method for analyzing gene expression dynamics.
|
28
|
Evidence synthesis through a degradation model applied to myocardial infarction. Lifetime Data Anal 2013; 19:1-18. [PMID: 22918702 PMCID: PMC3983527 DOI: 10.1007/s10985-012-9227-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/18/2011] [Accepted: 08/07/2012] [Indexed: 06/01/2023]
Abstract
We propose an evidence synthesis approach through a degradation model to estimate causal influences of physiological factors on myocardial infarction (MI) and coronary heart disease (CHD). For instance, several studies give incidences of MI and CHD for different age strata, while other studies give relative or absolute risks for strata of the main risk factors of MI or CHD. Evidence synthesis allows these disparate pieces of information from several studies to be incorporated into a single model. To do this, we need to develop a sufficiently general dynamical model; we also need to estimate the distribution of explanatory factors in the population. We develop a degradation model for both MI and CHD using a Brownian motion with drift, where the drift is modeled as a function of indicators of obesity, lipid profile, inflammation and blood pressure. Conditionally on these factors, the times to MI or CHD have inverse Gaussian distributions. The results we want to fit are generally not conditional on all the factors, so we need the marginal distributions of the times of occurrence of MI and CHD; this leads us to work with the inverse Gaussian normal distribution (an inverse Gaussian whose drift parameter has a normal distribution). Another possible model arises if a factor modifies the threshold. This led us to define an extension of the inverse Gaussian normal distribution obtained when both the drift and threshold parameters have normal distributions. We applied the model to results published in five important studies of MI and CHD and their risk factors. The fit of the model using the evidence synthesis approach was satisfactory, and the effects of the four risk factors were highly significant.
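For a Brownian motion with drift μ and diffusion coefficient σ started at 0, the first-passage time through a threshold a > 0 has the inverse Gaussian density f(t | μ) = a / (σ √(2π t³)) · exp(−(a − μt)² / (2σ²t)). If the drift is itself drawn from N(m, s²), completing the square in μ and integrating it out gives the closed-form marginal f(t) = a / √(2π t³ (σ² + s²t)) · exp(−(a − mt)² / (2t(σ² + s²t))), one way to write an inverse Gaussian normal density. This parametrization and the function names are assumptions for illustration, not taken from the paper; the Python sketch lets the closed form be checked against direct numerical integration over the drift.

```python
import numpy as np


def ig_density(t, a, mu, sigma):
    """First-passage time density of mu*t + sigma*W(t) through threshold a > 0,
    starting from 0 (inverse Gaussian when mu > 0; defective when mu < 0)."""
    return a / (sigma * np.sqrt(2 * np.pi * t**3)) * np.exp(-((a - mu * t) ** 2) / (2 * sigma**2 * t))


def ign_density(t, a, m, s, sigma):
    """Marginal first-passage density when the drift mu ~ Normal(m, s^2)
    (inverse Gaussian normal), obtained by integrating mu out analytically."""
    var = sigma**2 + s**2 * t
    return a / np.sqrt(2 * np.pi * t**3 * var) * np.exp(-((a - m * t) ** 2) / (2 * t * var))
```

For μ < 0 the process reaches a only with probability exp(2μa/σ²), so the marginal density integrates to less than 1: a normal drift distribution with s > 0 always puts some mass on negative drifts.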
|