1
Patient contrastive learning: A performant, expressive, and practical approach to electrocardiogram modeling. PLoS Comput Biol 2022; 18:e1009862. [PMID: 35157695 PMCID: PMC8880931 DOI: 10.1371/journal.pcbi.1009862] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/20/2021] [Revised: 02/25/2022] [Accepted: 01/25/2022] [Indexed: 11/19/2022]
Abstract
Supervised machine learning applications in health care are often limited by a scarcity of labeled training data. To mitigate the effect of small sample sizes, we introduce a pre-training approach, Patient Contrastive Learning of Representations (PCLR), which uses contrastive learning to create latent representations of electrocardiograms (ECGs) from a large number of unlabeled examples. The resulting representations are expressive, performant, and practical across a wide spectrum of clinical tasks. We develop PCLR using a large health care system with over 3.2 million 12-lead ECGs and demonstrate that training linear models on PCLR representations achieves a 51% performance increase, on average, over six training set sizes and four tasks (sex classification, age regression, and the detection of left ventricular hypertrophy and atrial fibrillation), relative to training neural network models from scratch. We also compare PCLR to three other ECG pre-training approaches (supervised pre-training, unsupervised pre-training with an autoencoder, and pre-training using a contrastive multi-ECG-segment approach) and show significant performance benefits in three out of four tasks. We found an average performance benefit of 47% over the other models and an average benefit of 9% compared to the best model for each task. We release PCLR to enable others to extract ECG representations at https://github.com/broadinstitute/ml4h/tree/master/model_zoo/PCLR.
ECGs are a rich source of cardiac health information. Many recent works have shown that deep learning can extract new information from ECGs when sufficient labeled data are available. However, when labeled data are scarce or a clinician scientist lacks the resources to train a deep learning model from scratch, options are limited. We introduce Patient Contrastive Learning of Representations (PCLR), an approach to training a neural network that extracts representations of ECGs.
The only labels required to train PCLR indicate which ECG comes from which patient. The resulting ECG representations can be used directly in linear models for new tasks without fine-tuning the neural network. We show that PCLR outperforms a set of handpicked features on all four tasks and outperforms three other deep learning approaches on three of the four tasks evaluated. Furthermore, PCLR outperforms training a neural network from scratch when training data are limited. PCLR is one of the first attempts to release and evaluate a pre-trained ECG model with the purpose of accelerating deep learning ECG research.
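PCLR's pre-training objective can be sketched as a SimCLR-style normalized temperature-scaled cross-entropy (NT-Xent) loss in which the positive pair is two different ECGs from the same patient and every other ECG in the batch is a negative. The sketch below is a minimal NumPy illustration under that assumption; the function name `nt_xent_patient_loss`, the temperature of 0.1, and the use of NumPy in place of a deep learning framework are illustrative choices, not details taken from the abstract or the released code.

```python
import numpy as np


def nt_xent_patient_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """NT-Xent contrastive loss with patient-defined positive pairs (illustrative sketch).

    Row i of z_a and row i of z_b are assumed to be encoder outputs for two
    different ECGs from the same patient; all other rows in the batch act as negatives.
    """
    if z_a.shape != z_b.shape:
        raise ValueError("z_a and z_b must have the same shape (N, d)")
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors, so dot product = cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # an ECG is never compared with itself
    # Row i's positive partner is row i + N, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())


# Usage: embeddings that agree within each patient give a much lower loss
# than embeddings whose pairs are mismatched.
aligned = nt_xent_patient_loss(np.eye(4), np.eye(4))
mismatched = nt_xent_patient_loss(np.eye(4), np.eye(4)[[1, 2, 3, 0]])
print(aligned < mismatched)  # True
```

The patient grouping is the only supervision this objective needs; downstream, the encoder's representations (not the loss) are what the linear models consume.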
2
Taylor ZL, Vang J, Lopez-Lopez E, Oosterom N, Mikkelsen T, Ramsey LB. Systematic Review of Pharmacogenetic Factors That Influence High-Dose Methotrexate Pharmacokinetics in Pediatric Malignancies. Cancers (Basel) 2021; 13:cancers13112837. [PMID: 34200242 PMCID: PMC8201112 DOI: 10.3390/cancers13112837] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/30/2021] [Revised: 05/27/2021] [Accepted: 06/02/2021] [Indexed: 02/07/2023]
Abstract
Methotrexate (MTX) is a mainstay therapeutic agent administered at high doses for the treatment of pediatric and adult malignancies, such as acute lymphoblastic leukemia, osteosarcoma, and lymphoma. Despite vast evidence of clinical efficacy, high-dose MTX displays significant inter-individual pharmacokinetic variability. Delayed MTX clearance can lead to prolonged, elevated exposure, increasing the risks of nephrotoxicity, mucositis, seizures, and neutropenia. Numerous pharmacogenetic studies have investigated the effects of several genes and polymorphisms on MTX clearance in an attempt to better understand this pharmacokinetic variability and improve patient outcomes. To date, several genes and polymorphisms that affect MTX clearance have been identified. However, the evidence for some genes is conflicting or lacks the replication and validation needed to confirm their effects on MTX clearance. Therefore, we performed a systematic review to identify and summarize the pharmacogenetic factors that influence high-dose MTX pharmacokinetics in pediatric malignancies. Following the PRISMA guidelines, we analyzed 58 articles and 24 different genes associated with transporter pharmacology or the folate transport pathway. We conclude that only one gene, SLCO1B1, reliably demonstrates an effect on MTX pharmacokinetics.
Affiliation(s)
- Zachary L. Taylor
- Department of Pharmacology and Systems Physiology, University of Cincinnati, Cincinnati, OH 45267, USA;
- Division of Research in Patient Services, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Division of Clinical Pharmacology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Jesper Vang
- Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark;
- Paediatric Oncology Research Laboratory, University Hospital of Copenhagen, Rigshospitalet Blegdamsvej 9, 2100 Copenhagen, Denmark
- Elixabet Lopez-Lopez
- Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country, UPV/EHU, 48940 Leioa, Spain;
- Pediatric Oncology Group, BioCruces Bizkaia Health Research Institute, 48903 Barakaldo, Spain
- Natanja Oosterom
- Princess Máxima Center for Pediatric Oncology, 3720 Utrecht, The Netherlands;
- Torben Mikkelsen
- Department of Pediatric Oncology, Aarhus University Hospital, 8200 Aarhus, Denmark;
- Laura B. Ramsey
- Division of Research in Patient Services, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Division of Clinical Pharmacology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Correspondence: ; Tel.: +1-513-803-8963
3
Genetic variants associated with methotrexate-induced mucositis in cancer treatment: A systematic review and meta-analysis. Crit Rev Oncol Hematol 2021; 161:103312. [PMID: 33794308 DOI: 10.1016/j.critrevonc.2021.103312] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/14/2020] [Revised: 03/08/2021] [Accepted: 03/23/2021] [Indexed: 12/11/2022]
Abstract
Methotrexate (MTX), an important chemotherapeutic agent, is often accompanied by mucositis. Its occurrence and severity are unpredictable and show large interindividual variability. In this study, we review and meta-analyze previously studied genetic variants in relation to MTX-induced mucositis. We conducted a systematic search in Medline and Embase and included genetic association studies of MTX-induced mucositis in cancer patients. A meta-analysis was conducted for single nucleotide polymorphisms (SNPs) for which at least two studies found a statistically significant association. Across the 57 included studies, a total of 34 SNPs were associated with mucositis in at least one study. Two of the seven SNPs included in our meta-analysis were statistically significantly associated with mucositis: MTHFR c.677C>T (recessive model, grade ≥3 vs grade 0-2, OR 2.53, 95% CI 1.48-4.32, false discovery rate [FDR]-corrected p-value 0.011) and MTRR c.66A>G (overdominant model, grade ≥1 vs grade 0, OR 2.08, 95% CI 1.16-3.73, FDR-corrected p-value 0.042).
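The reported statistics can be related through standard formulas. The minimal sketch below, using only Python's standard library, recovers the standard error of ln(OR) from a 95% confidence interval, pools log odds ratios with a fixed-effect inverse-variance model, and applies the Benjamini-Hochberg procedure for FDR correction. These are generic textbook methods chosen for illustration; the abstract does not state the review's pooling model (fixed or random effects) or its exact FDR procedure, so the sketch is not expected to reproduce its p-values, and the function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def log_or_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of ln(OR), assuming the 95% CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(log_ors: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling; returns (OR, CI lower, CI upper)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - Z95 * pooled_se),
            math.exp(pooled + Z95 * pooled_se))


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Usage with the MTHFR c.677C>T interval from the abstract:
se = log_or_se_from_ci(1.48, 4.32)   # roughly 0.27 on the ln(OR) scale
# Hypothetical per-SNP p-values, only to show the FDR adjustment:
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```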
4
Xin J, Chu H, Ben S, Ge Y, Shao W, Zhao Y, Wei Y, Ma G, Li S, Gu D, Zhang Z, Du M, Wang M. Evaluating the effect of multiple genetic risk score models on colorectal cancer risk prediction. Gene 2018; 673:174-180. [PMID: 29908285 DOI: 10.1016/j.gene.2018.06.035] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/06/2018] [Revised: 05/25/2018] [Accepted: 06/12/2018] [Indexed: 12/29/2022]
Abstract
Genetic risk score (GRS) models are widely used to evaluate genetic effects in cancer risk prediction, but few studies have investigated their discriminatory power, especially for colorectal cancer (CRC) risk prediction. In this study, we used both simulated and real data to comprehensively compare the discriminability of different GRS models. The GRS models were fitted by logistic regression under three scenarios: a simple count GRS (SC-GRS), logistic-regression-weighted GRSs (LR-GRS, including DL-GRS and OR-GRS), and explained-variance-weighted GRSs (EV-GRS, including EV_DL-GRS and EV_OR-GRS). Model performance was evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC), net reclassification improvement (NRI), and integrated discrimination improvement (IDI). In the real data analysis, because the DL-GRS and EV_DL-GRS models showed serious over-fitting, the other three models were retained for further comparison. Compared with the unweighted SC-GRS model, reclassification was significantly worse in the OR-GRS model (NRI = -0.082, IDI = -0.002, P < 0.05), and the EV_OR-GRS model showed negative NRI and IDI (NRI = -0.077, IDI = -5.54E-04, P < 0.05) compared with the OR-GRS model. In addition, the traditional model with smoking status (AUC = 0.523) showed lower discriminability than the combined model (AUC = 0.607) including genetic (i.e., SC-GRS) and smoking factors. The findings from the simulation were all consistent with the real data results. The SC-GRS model may therefore be optimal for predicting the genetic risk of CRC. Moreover, adding more significant genetic variants to the traditional model could further improve predictive power for CRC risk prediction.
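The difference between the unweighted and log-OR-weighted scores compared above can be illustrated with a short sketch. It assumes each genotype is coded as the number of risk alleles (0, 1, or 2) and that a published per-allele odds ratio is available for each SNP; the function names and example values are hypothetical, and the explained-variance-weighted variants (EV-GRS) are omitted because the abstract does not give their weighting formula.

```python
import math


def simple_count_grs(genotypes: list[int]) -> int:
    """SC-GRS: unweighted count of risk alleles across SNPs."""
    return sum(genotypes)


def or_weighted_grs(genotypes: list[int], odds_ratios: list[float]) -> float:
    """Weighted GRS: each risk-allele count multiplied by ln(per-allele OR)."""
    if len(genotypes) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


# Usage: three SNPs for one individual (hypothetical genotypes and ORs).
genotypes = [0, 1, 2]
odds_ratios = [1.20, 1.50, 1.10]
sc = simple_count_grs(genotypes)               # 3 risk alleles
lr = or_weighted_grs(genotypes, odds_ratios)   # ln(1.5) + 2*ln(1.1), about 0.596
```

Either score would then enter a logistic regression, alone or alongside covariates such as smoking status, and be evaluated by AUC, NRI, and IDI.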
Affiliation(s)
- Junyi Xin
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Haiyan Chu
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Shuai Ben
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Yuqiu Ge
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Wei Shao
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China; China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, China
- Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China; China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, China
- Gaoxiang Ma
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Shuwei Li
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Dongying Gu
- Department of Oncology, The Affiliated Nanjing Hospital of Nanjing Medical University, Nanjing, China
- Zhengdong Zhang
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Mulong Du
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China; China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, China
- Meilin Wang
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China; China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, China
5
Genetic Biomarkers to Identify the Risk of Osteonecrosis in Children with Acute Lymphoblastic Leukemia. Mol Diagn Ther 2016; 20:519-522. [PMID: 27365083 DOI: 10.1007/s40291-016-0226-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/21/2022]
Abstract
Osteonecrosis is a disabling complication of treatment for pediatric acute lymphoblastic leukemia, and much effort has been made to predict which patients are prone to developing it. Multiple clinical and genetic factors have already been identified as being associated with osteonecrosis; however, a prediction model that combines pretreatment genetic biomarkers and clinical factors has not yet been designed. Such a prediction model can only be developed through continuing international collaborations and research efforts, including large genome-wide association studies.
6
Yashin AI, Arbeev KG, Wu D, Arbeeva L, Kulminski A, Kulminskaya I, Akushevich I, Ukraintseva SV. How Genes Modulate Patterns of Aging-Related Changes on the Way to 100: Biodemographic Models and Methods in Genetic Analyses of Longitudinal Data. North American Actuarial Journal: NAAJ 2016; 20:201-232. [PMID: 27773987 PMCID: PMC5070546 DOI: 10.1080/10920277.2016.1178588] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Abstract
BACKGROUND AND OBJECTIVE To clarify mechanisms of genetic regulation of human aging and longevity traits, a number of genome-wide association studies (GWAS) of these traits have been performed. However, the results of these analyses did not meet researchers' expectations. Most detected genetic associations did not reach genome-wide statistical significance and lacked replication in independent populations. Reasons for slow progress in this research area include the low efficiency of the statistical methods used in data analyses; the genetic heterogeneity of aging- and longevity-related traits; the possibility of pleiotropic (e.g., age-dependent) effects of genetic variants on such traits; underestimation of the effects of (i) mortality selection in genetically heterogeneous cohorts and (ii) external factors and differences in the genetic backgrounds of individuals in the populations under study; and the weakness of a conceptual biological framework that does not fully account for the above-mentioned factors. A further limitation of these studies is that they did not fully realize the potential of longitudinal data, which allow evaluation of how genetic influences on life span are mediated by physiological variables and other biomarkers over the life course. The objective of this paper is to address these issues. DATA AND METHODS We performed GWAS of human life span using different subsets of data from the original Framingham Heart Study (FHS) cohort, corresponding to different quality control (QC) procedures, and used one subset of selected genetic variants for further analyses. We used a simulation study to show that this approach to combining data improves the quality of GWAS. We used FHS longitudinal data to compare average age trajectories of physiological variables in carriers and non-carriers of selected genetic variants.
We used a stochastic process model of human mortality and aging to investigate genetic influences on hidden biomarkers of aging and on the dynamic interaction between aging and longevity. We investigated properties of genes related to the selected variants and their roles in signaling and metabolic pathways. RESULTS We showed that the use of different QC procedures results in different sets of genetic variants associated with life span. We selected 24 genetic variants negatively associated with life span. We showed that joint analysis of genetic data at the time of bio-specimen collection and follow-up data substantially improved the significance of the associations of the 24 selected SNPs with life span. We also showed that aging-related changes in physiological variables and in hidden biomarkers of aging differ between carriers and non-carriers of the selected variants. CONCLUSIONS The results of these analyses demonstrated the benefits of using biodemographic models and methods in genetic association studies of these traits. Our findings showed that the absence of a large number of genetic variants with deleterious effects may make a substantial contribution to exceptional longevity. These effects are dynamically mediated by a number of physiological variables and hidden biomarkers of aging. The results of this research demonstrated the benefits of using integrative statistical models of mortality risks in genetic studies of human aging and longevity.
Affiliation(s)
- Anatoliy I. Yashin
- Professor, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A102E, Durham, NC 27705, USA. Tel.: (+1) 919-668-2713; Fax: (+1) 919-684-3861
- Konstantin G. Arbeev
- Sr. Research Scientist, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A102F, Durham, NC 27705, USA. Tel.: (+1) 919-668-2707; Fax: (+1) 919-684-3861
- Deqing Wu
- Sr. Research Scientist, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A104, Durham, NC 27705, USA. Tel.: (+1) 919-684-6126; Fax: (+1) 919-684-3861
- Liubov Arbeeva
- Statistician, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A102G, Durham, NC 27705, USA. Tel.: (+1) 919-613-0715; Fax: (+1) 919-684-3861
- Alexander Kulminski
- Sr. Research Scientist, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A106, Durham, NC 27705, USA. Tel.: (+1) 919-684-4962; Fax: (+1) 919-684-3861
- Irina Kulminskaya
- Research Scientist, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A102D, Durham, NC 27705, USA. Tel.: (+1) 919-681-8232; Fax: (+1) 919-684-3861
- Igor Akushevich
- Sr. Research Scientist, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A107, Durham, NC 27705, USA. Tel.: (+1) 919-668-2715; Fax: (+1) 919-684-3861
- Svetlana V. Ukraintseva
- Sr. Research Scientist, Center for Population Health and Aging, Duke University, 2024 W. Main Street, Room A105, Durham, NC 27705, USA. Tel.: (+1) 919-668-2712; Fax: (+1) 919-684-3861
7
Potenciano V, Abad-Grau MM, Alcina A, Matesanz F. A comparison of genomic profiles of complex diseases under different models. BMC Med Genomics 2016; 9:3. [PMID: 26782991 PMCID: PMC4717655 DOI: 10.1186/s12920-015-0157-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/13/2014] [Accepted: 11/27/2015] [Indexed: 12/15/2022]
Abstract
Background Various approaches are being used to predict individual risk of polygenic diseases from data provided by genome-wide association studies. Because there are substantial differences between the diseases investigated, the data sets used, and the way they are tested, it is difficult to assess which models are more suitable for this task. Results We compared different approaches for seven complex diseases provided by the Wellcome Trust Case Control Consortium (WTCCC) under a within-study validation approach. Risk models were inferred using a variety of learning machines and assumptions about the underlying genetic model, including a haplotype-based approach with different haplotype lengths and different association-level thresholds for choosing loci as part of the predictive model. In accordance with previous work, our results generally showed low accuracy considering disease heritability and population prevalence. However, the boosting algorithm returned a predictive area under the ROC curve (AUC) of 0.8805 for Type 1 diabetes (T1D) and 0.8087 for rheumatoid arthritis, both clearly above the AUCs obtained by other approaches and above 0.75, the minimum required for a disease to be successfully tested on a sample at risk; boosting is therefore a promising approach. Its good performance seems to be related to its robustness to redundant data, which genome-wide data sets contain because of linkage disequilibrium. Conclusions In view of our results, the boosting approach may be suitable for modeling individual predisposition to Type 1 diabetes and rheumatoid arthritis based on genome-wide data and should be considered for more in-depth research. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0157-2) contains supplementary material, which is available to authorized users.
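The AUC values quoted above, and the 0.75 threshold, can be made concrete with the rank interpretation of the AUC: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below is a generic pairwise implementation for illustration only; it is not the boosting pipeline used in the study, and the function name and scores in the usage example are hypothetical.

```python
def auc_mann_whitney(case_scores: list[float], control_scores: list[float]) -> float:
    """AUC as P(case score > control score) + 0.5 * P(tie), by exhaustive pairing."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Usage with hypothetical predicted risks:
auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])  # 8 of 9 pairs ordered correctly
meets_threshold = auc >= 0.75
```

The pairwise loop is O(n·m); for genome-wide cohorts, a rank-based computation gives the same value in O((n + m) log(n + m)).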
Affiliation(s)
- Víctor Potenciano
- Departamento de Lenguajes y Sistemas Informáticos, ETSIIT, c/ Periodista Daniel Saucedo Aranda s/n Universidad de Granada, Granada, 18071, Spain.
- María Mar Abad-Grau
- Departamento de Lenguajes y Sistemas Informáticos, ETSIIT, c/ Periodista Daniel Saucedo Aranda s/n Universidad de Granada, Granada, 18071, Spain.
- Antonio Alcina
- Instituto de Parasitología y Biología Molecular, CSIC, Granada, Spain.
8
Maranville JC, Di Rienzo A. Combining genetic and nongenetic biomarkers to realize the promise of pharmacogenomics for inflammatory diseases. Pharmacogenomics 2015; 15:1931-40. [PMID: 25495413 DOI: 10.2217/pgs.14.129] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
Many drugs used to treat inflammatory diseases are ineffective in a substantial proportion of patients. Identifying patients who are likely to respond to specific therapies would facilitate personalized treatment strategies that could improve outcomes while reducing costs and risks of adverse events. Despite these clear benefits, there are few examples of predictive biomarkers of drug efficacy currently implemented in clinical practice for inflammatory diseases. We review efforts to identify genetic and nongenetic biomarkers of drug response in these diseases and consider the potential benefits of combining multiple sources of biological data into multifeature predictive models.
Affiliation(s)
- Joseph C Maranville
- Committee on Clinical Pharmacology & Pharmacogenomics, The University of Chicago, Chicago, IL, USA
9
Chen T, Ma Y, Wang Y. Predicting cumulative risk of disease onset by redistributing weights. Stat Med 2015; 34:2427-43. [PMID: 25847392 PMCID: PMC4457675 DOI: 10.1002/sim.6499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/17/2013] [Revised: 02/23/2015] [Accepted: 03/12/2015] [Indexed: 11/09/2022]
Abstract
We propose a simple approach for predicting the cumulative risk of disease that accommodates predictors with time-varying effects and outcomes subject to censoring. We use a nonparametric function for the coefficient of the time-varying effect and handle censoring through self-consistency equations that redistribute the probability mass of censored outcomes to the right. The computational procedure is extremely convenient and can be implemented with standard software. We prove large-sample properties of the proposed estimator and evaluate its finite-sample performance through simulation studies. We apply the method to estimate the cumulative risk of developing Huntington's disease (HD) in subjects with a huntingtin gene mutation, using data from a large collaborative HD study, and illustrate an inverse relationship between the cumulative risk of HD and the length of cytosine-adenine-guanine repeats in the huntingtin gene.
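Redistributing the probability mass of censored outcomes to the right can be illustrated, in its simplest form without predictors, by Efron's redistribute-to-the-right algorithm, which reproduces the Kaplan-Meier estimate of cumulative risk. The sketch below covers only this censoring step; the proposed estimator additionally models time-varying predictor effects through a nonparametric coefficient function, which is not shown. The function name and the convention of ordering events before censorings at tied times are assumptions for illustration.

```python
def redistribute_to_the_right(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Cumulative risk F(t) by redistribute-to-the-right (no covariates).

    events[i] is 1 if subject i had the outcome at times[i], 0 if censored.
    Returns (time, F(time)) at each event time, in time order.
    """
    n = len(times)
    # Sort by time; at tied times, events come before censorings.
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    mass = [1.0 / n] * n  # probability mass per sorted position
    for pos, i in enumerate(order):
        later = n - pos - 1
        if events[i] == 0 and later > 0:
            # Share this censored subject's mass equally among everyone to its right.
            share = mass[pos] / later
            for j in range(pos + 1, n):
                mass[j] += share
            mass[pos] = 0.0
    curve, cumulative = [], 0.0
    for pos, i in enumerate(order):
        if events[i] == 1:
            cumulative += mass[pos]
            curve.append((times[i], cumulative))
    return curve


# Usage: five subjects, two censored (at t=2 and t=5).
curve = redistribute_to_the_right([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
# F(t) at t = 1, 3, 4 is approximately 0.200, 0.467, 0.733, matching Kaplan-Meier.
```

When the largest observed time is censored, its mass has nowhere to go, so F stays below 1, just as the Kaplan-Meier curve does not reach zero survival in that case.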
Affiliation(s)
- Yanyuan Ma
- Department of Statistics, Texas A&M University
- Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University
10
Carayol J, Schellenberg GD, Dombroski B, Amiet C, Génin B, Fontaine K, Rousseau F, Vazart C, Cohen D, Frazier TW, Hardan AY, Dawson G, Rio Frio T. A scoring strategy combining statistics and functional genomics supports a possible role for common polygenic variation in autism. Front Genet 2014; 5:33. [PMID: 24600472 PMCID: PMC3927086 DOI: 10.3389/fgene.2014.00033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/16/2013] [Accepted: 01/29/2014] [Indexed: 12/23/2022]
Abstract
Autism spectrum disorders (ASD) are highly heritable, complex neurodevelopmental disorders with a 4:1 male:female ratio. Common genetic variation could explain 40-60% of the variance in liability to autism. Because individual common variants have small effects, genome-wide association studies (GWASs) have identified only a small number of individual single-nucleotide polymorphisms (SNPs). To increase the power of GWASs in complex disorders, methods such as convergent functional genomics (CFG) have emerged to extract true association signals from noise and to identify and prioritize genes from SNPs using a scoring strategy that combines statistics and functional genomics. We adapted and applied this approach to data from a GWAS performed on families with multiple children affected with autism from the Autism Speaks Autism Genetic Resource Exchange (AGRE). From a discovery population (545 multiplex families), we identified a set of 133 candidate markers located in or close to genes with functional relevance in ASD; a gender-specific genetic score (GS) based on these common variants explained 1% (P = 0.01 in males) and 5% (P = 8.7 × 10(-7) in females) of the genetic variance in an independent sample of multiplex families. Overall, our work demonstrates that prioritizing GWAS data based on functional genomics identified common variants associated with autism and provided additional support for a common polygenic background in autism.
Affiliation(s)
- Gerard D. Schellenberg
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Beth Dombroski
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- David Cohen
- Groupe Hospitalier Pitié-Salpêtrière, Department of Child and Adolescent Psychiatry, AP-HP, Université Pierre et Marie Curie, Paris, France
- Thomas W. Frazier
- Center for Pediatric Behavioral Health and Center for Autism, Cleveland Clinic, Cleveland, OH, USA
- Antonio Y. Hardan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Geraldine Dawson
- Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC, USA
11
Ren S, Xu J, Zhou T, Jiang H, Chen H, Liu F, Na R, Zhang L, Wu Y, Sun J, Yang B, Gao X, Zheng SL, Xu C, Ding Q, Sun Y. Plateau effect of prostate cancer risk-associated SNPs in discriminating prostate biopsy outcomes. Prostate 2013; 73:1824-35. [PMID: 24037738 PMCID: PMC3910089 DOI: 10.1002/pros.22721] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 06/18/2013] [Accepted: 07/19/2013] [Indexed: 12/20/2022]
Abstract
BACKGROUND Additional prostate cancer (PCa) risk-associated single nucleotide polymorphisms (SNPs) continue to be identified. It is unclear whether adding newly identified SNPs improves the discrimination of biopsy outcomes over previously established SNPs. METHODS A total of 667 consecutive patients who underwent prostate biopsy for detection of PCa at Huashan Hospital and Changhai Hospital, Shanghai, China, were recruited. Genetic scores were calculated for each patient using various combinations of 29 PCa risk-associated SNPs. The performance of these genetic scores in discriminating prostate biopsy outcomes was compared using the area under the receiver operating characteristic curve (AUC). RESULTS The discriminative performance of the genetic score derived from a panel of all 29 SNPs (24 previous and 5 new) was similar to that derived from the 24 previously established SNPs, with AUCs of 0.60 and 0.61, respectively (P = 0.72). When SNPs with the strongest effect on PCa risk (ranked by contribution to the total genetic variance in an external study) were sequentially added to the models for calculating the genetic score, the AUC gradually increased and peaked at 0.62 with the 13 strongest SNPs. Under the 13-SNP model, the PCa detection rates were 21.52%, 36.74%, and 51.98% for men with low (<0.5), intermediate (0.5-1.5), and high (>1.5) genetic scores, respectively (P-trend = 9.91 × 10(-6)). CONCLUSION A genetic score based on the PCa risk-associated SNPs implicated to date is a significant predictor of biopsy outcome. Additional small-effect PCa risk-associated SNPs discovered in the future are unlikely to further improve predictive performance.
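The abstract does not say how the genetic score was computed. One approach common in this literature, assumed here, treats each SNP's per-allele odds ratio as a relative risk, divides each genotype's risk by the population-average risk under Hardy-Weinberg equilibrium, and multiplies the resulting per-SNP values, so that a score of 1.0 means average risk. The sketch below follows that assumption and then applies the low/intermediate/high cut-offs from the abstract; the function names and the example allele frequency and odds ratio are hypothetical.

```python
def snp_relative_risk(genotype: int, risk_allele_freq: float, per_allele_or: float) -> float:
    """Genotype risk relative to the population average for one SNP.

    Assumes a multiplicative per-allele model (OR used as a relative risk)
    and Hardy-Weinberg genotype frequencies.
    """
    p = risk_allele_freq
    geno_freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # 0, 1, 2 risk alleles
    geno_risks = [per_allele_or ** k for k in range(3)]    # 1, OR, OR^2
    population_mean = sum(f * r for f, r in zip(geno_freqs, geno_risks))
    return geno_risks[genotype] / population_mean


def genetic_score(genotypes: list[int], freqs: list[float], ors: list[float]) -> float:
    """Multiply per-SNP relative risks across SNPs (independent SNPs assumed)."""
    score = 1.0
    for g, p, o in zip(genotypes, freqs, ors):
        score *= snp_relative_risk(g, p, o)
    return score


def risk_group(score: float) -> str:
    """Cut-offs from the abstract: low < 0.5, intermediate 0.5-1.5, high > 1.5."""
    if score < 0.5:
        return "low"
    if score <= 1.5:
        return "intermediate"
    return "high"


# Usage: one hypothetical SNP with risk-allele frequency 0.5 and per-allele OR 2.0.
groups = [risk_group(genetic_score([g], [0.5], [2.0])) for g in (0, 1, 2)]
# groups == ["low", "intermediate", "high"]
```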
Affiliation(s)
- Shancheng Ren
- Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China
- Jianfeng Xu
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- State Key Laboratory of Genetic Engineering, Center for Genetic Epidemiology, School of Life Sciences, Fudan University, Shanghai, China
- Center for Cancer Genomics, Wake Forest University School of Medicine, Winston-Salem, North Carolina
- Tie Zhou
- Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China
- Haowen Jiang
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- Haitao Chen
- State Key Laboratory of Genetic Engineering, Center for Genetic Epidemiology, School of Life Sciences, Fudan University, Shanghai, China
- Fang Liu
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- State Key Laboratory of Genetic Engineering, Center for Genetic Epidemiology, School of Life Sciences, Fudan University, Shanghai, China
- Rong Na
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- Limin Zhang
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- Yishuo Wu
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- Jielin Sun
- Center for Cancer Genomics, Wake Forest University School of Medicine, Winston-Salem, North Carolina
- Bo Yang
- Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China
- Xu Gao
- Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China
- S. Lilly Zheng
- Center for Cancer Genomics, Wake Forest University School of Medicine, Winston-Salem, North Carolina
- Chuanliang Xu
- Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China
- Qiang Ding
- Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China
- Correspondence to: Qiang Ding, Fudan Institute of Urology, Huashan Hospital, Fudan University, Shanghai, China.
- Yinghao Sun
- Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China
- Correspondence to: Yinghao Sun, Department of Urology, Shanghai Changhai Hospital, Second Military Medical University, Shanghai, China.
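The Ren et al. abstract groups men by genetic score (low <0.5, intermediate 0.5-1.5, high >1.5) but does not spell out how the score is computed. The sketch below assumes a common construction: each SNP's genotype risk is expressed relative to the population-average risk under a multiplicative per-allele model with Hardy-Weinberg genotype frequencies, the per-SNP odds ratio is treated as a relative risk, and the per-SNP values are multiplied, so that a score of 1.0 means average population risk. The function names, odds ratios, and allele frequencies are hypothetical.

```python
def snp_relative_risk(risk_alleles: int, odds_ratio: float, risk_allele_freq: float) -> float:
    """Risk of a genotype with 0, 1 or 2 risk alleles, relative to the population average.

    Assumption: multiplicative per-allele model (odds ratio used as a relative risk)
    and Hardy-Weinberg genotype frequencies.
    """
    p = risk_allele_freq
    q = 1.0 - p
    # Population-average risk, in units of the risk of a non-carrier (0 risk alleles).
    population_mean = q * q + 2 * p * q * odds_ratio + p * p * odds_ratio ** 2
    return odds_ratio ** risk_alleles / population_mean


def genetic_score(genotypes, odds_ratios, allele_freqs) -> float:
    """Multiply per-SNP relative risks into one score (1.0 = population average)."""
    score = 1.0
    for g, odds_ratio, freq in zip(genotypes, odds_ratios, allele_freqs):
        score *= snp_relative_risk(g, odds_ratio, freq)
    return score


def risk_group(score: float) -> str:
    """Cut-offs quoted in the abstract: low (<0.5), intermediate (0.5-1.5), high (>1.5)."""
    if score < 0.5:
        return "low"
    if score <= 1.5:
        return "intermediate"
    return "high"


# Hypothetical three-SNP panel: per-allele odds ratios and risk-allele frequencies.
odds_ratios = [1.3, 1.2, 1.5]
allele_freqs = [0.4, 0.3, 0.2]
score = genetic_score([2, 1, 2], odds_ratios, allele_freqs)
print(f"{score:.2f}", risk_group(score))
```

Normalizing each SNP by its population mean keeps the average score at 1.0 however many SNPs are added, which is what makes fixed cut-offs such as 0.5 and 1.5 meaningful across panels of different sizes in this construction.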
12
Wu J, Pfeiffer RM, Gail MH. Strategies for developing prediction models from genome-wide association studies. Genet Epidemiol 2013; 37:768-77. [PMID: 24166696 DOI: 10.1002/gepi.21762] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 05/24/2013] [Revised: 07/31/2013] [Accepted: 09/10/2013] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) have identified hundreds of single nucleotide polymorphisms (SNPs) associated with complex human diseases. However, risk prediction models based on them have limited discriminatory accuracy. It has been suggested that including many such SNPs can improve predictive performance. Here, we studied various aspects of model building to improve discriminatory accuracy, as measured by the area under the receiver operating characteristic curve (AUC), including: (1) How well does a one-phase procedure that selects SNPs and estimates odds ratios on the same data perform? (2) How should training data be allocated between SNP selection (Phase 1) and estimation (Phase 2) in a two-phase procedure? (3) Should SNP selection be based on P-value thresholding or ranking P-values? (4) How many SNPs should be selected? and (5) Is multivariate estimation preferred to univariate estimation in the presence of linkage disequilibrium (LD)? We used realistic estimates of the distributions of genetic effect sizes, allele frequencies, and LD patterns based on GWAS data for Crohn's disease and prostate cancer. Theory and simulations were used to estimate AUC. Empirical risk models based on 10,000 cases and controls had considerably lower AUC than theoretically achievable. The most critical aspect of prediction model building was initial SNP selection. The single-phase procedure achieved higher AUC than the two-phase procedure. Multivariate estimation did not perform as well as univariate (marginal) estimation. For complex diseases and samples of 10,000 or fewer cases and controls, one should limit the number of SNPs to tens or hundreds.
Affiliation(s)
- Jincao Wu
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America
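The studies in this group all summarize discrimination with the AUC. As a minimal sketch (not code from Wu et al.), the empirical AUC can be computed directly as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half; this equals the Mann-Whitney U statistic divided by the number of case-control pairs. The function name and example scores are hypothetical.

```python
from itertools import product


def auc(case_scores, control_scores) -> float:
    """Empirical AUC: P(case score > control score), with ties counted as 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_cases * n_controls.
    This O(n * m) double loop favours clarity over speed.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for four cases and four controls.
cases = [1.8, 1.2, 0.9, 0.6]
controls = [1.1, 0.8, 0.6, 0.4]
print(auc(cases, controls))  # → 0.78125
```

An AUC of 0.5 means the score ranks cases and controls no better than chance, which puts values such as 0.60 to 0.65 for SNP-based scores in context. For large cohorts, a rank-based computation of the same statistic is faster than the pairwise loop.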
13
Effector CD4+ T cell expression signatures and immune-mediated disease associated genes. PLoS One 2012; 7:e38510. [PMID: 22715389 PMCID: PMC3371029 DOI: 10.1371/journal.pone.0038510] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 10/10/2011] [Accepted: 05/07/2012] [Indexed: 01/22/2023]
Abstract
Genome-wide association studies (GWAS) in immune-mediated diseases have identified over 150 associated genomic loci. Many of these loci play a role in T cell responses, and regulation of T cell differentiation plays a critical role in immune-mediated diseases; however, the relationship between implicated disease loci and T cell differentiation is incompletely understood. To further address this relationship, we examined differential gene expression in naïve human CD4+ T cells, as well as in in vitro differentiated Th1, memory Th17-negative, and Th17-enriched CD4+ T cell subsets, using microarray and RNASeq. Genes contained within immune-mediated disease loci were markedly enriched for increased expression in memory CD4+ compared with naïve CD4+ T cells. Within memory T cells, expression of disease-associated genes was typically higher in Th17-enriched than in Th17-negative cells. Using RNASeq and promoter methylation studies, we identified a differential regulation pattern for genes expressed solely in Th17 cells (IL17A and CCL20) compared with genes expressed in both Th17 and Th1 cells (IL23R and IL12RB2): for IL17A and CCL20, high levels of promoter methylation correlated with near-zero RNASeq levels. These findings have implications for human Th17 cell plasticity and for the regulation of Th17-Th1 expression signatures. Importantly, using RNASeq we found an abundant isoform of IL23R, terminating before the transmembrane domain, that was enriched in Th17 cells. Beyond its molecular resolution, RNASeq provides significantly greater power than microarray analysis to define differential gene expression and to identify alternative gene variants. The comprehensive integration of differential gene expression between cell subsets with disease-association signals and functional pathways provides insight into disease pathogenesis.
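The abstract reports that disease-locus genes are markedly enriched among genes with increased expression in memory CD4+ T cells, but it does not name the statistic used. A common way to quantify such an overlap is a one-sided hypergeometric test together with a fold enrichment; the sketch below assumes that approach, and the function names and all counts are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(total_genes: int, disease_genes: int,
                                upregulated_genes: int, overlap: int) -> float:
    """One-sided P(X >= overlap) for X ~ Hypergeometric.

    total_genes: genes measured on the platform (the universe)
    disease_genes: genes located in disease-associated loci
    upregulated_genes: genes with increased expression (e.g. memory vs naive)
    overlap: genes that are both disease-locus genes and upregulated
    """
    denominator = comb(total_genes, upregulated_genes)
    upper = min(disease_genes, upregulated_genes)
    # Exact integer arithmetic for the tail; Python divides big integers accurately.
    tail = sum(
        comb(disease_genes, k) * comb(total_genes - disease_genes, upregulated_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator


def fold_enrichment(total_genes: int, disease_genes: int,
                    upregulated_genes: int, overlap: int) -> float:
    """Observed overlap divided by the overlap expected if the two gene sets were unrelated."""
    expected = disease_genes * upregulated_genes / total_genes
    return overlap / expected


# Hypothetical counts: 20,000 genes, 300 in disease loci, 1,000 upregulated, 45 in both.
print(fold_enrichment(20000, 300, 1000, 45))
print(hypergeometric_enrichment_p(20000, 300, 1000, 45))
```

The hypergeometric model treats the upregulated genes as a random draw from all measured genes, so the choice of gene universe (all genes on the array versus all expressed genes) can change the P-value noticeably.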
14
Chen SH, Ip EH, Xu J, Sun J, Hsu FC. Using graded response model for the prediction of prostate cancer risk. Hum Genet 2012; 131:1327-36. [PMID: 22461065 DOI: 10.1007/s00439-012-1160-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/09/2011] [Accepted: 03/21/2012] [Indexed: 12/16/2022]
Abstract
Disease risk-associated single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) have the potential to be used for disease risk prediction. An important feature of these risk-associated SNPs is that each has a weak individual effect on disease risk but their cumulative effect is stronger. To date, no stable summary estimate of the joint effect of genetic variants for disease risk prediction has been available. In this study, we propose to use the graded response model (GRM), which is based on item response theory, to estimate the individual risk associated with a set of SNPs. We compare the GRM with a recently proposed risk prediction model called cumulative relative risk (CRR). Thirty-three prostate cancer risk-associated SNPs had been discovered in GWAS by December 2009. These SNPs were used to evaluate the performance of GRM and CRR for predicting prostate cancer risk in three GWAS populations: from Sweden, from Johns Hopkins Hospital, and from the National Cancer Institute Cancer Genetic Markers of Susceptibility study. Computational results show that, compared with CRR, the GRM risk prediction estimates are less biased and more stable.
Affiliation(s)
- Shyh-Huei Chen
- Division of Public Health Sciences, Department of Biostatistical Sciences, Wake Forest School of Medicine, Wells Fargo Center 23rd floor, Medical Center Blvd, Winston-Salem, NC 27157, USA.
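To illustrate the graded response model named in the Chen et al. abstract, the sketch below assumes that each SNP is an "item" whose ordered response is the number of risk alleles (0, 1, or 2) and that a latent trait θ represents genetic liability. In Samejima's GRM, the probability of carrying at least k risk alleles is logistic in a(θ − b_k), and an individual's θ can be estimated by maximizing the likelihood over a grid. The item parameters are hypothetical (in practice they would be estimated from data), and how the authors map θ to a risk estimate is not described in the abstract.

```python
import math


def grm_category_probs(theta: float, discrimination: float, thresholds) -> list:
    """Samejima graded response model for one item with ordered categories 0..K.

    P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))) for k = 1..K, with
    increasing thresholds b_1 < ... < b_K; category probabilities are the
    differences between adjacent cumulative probabilities.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-discrimination * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def estimate_theta(genotypes, items, grid=None) -> float:
    """Maximum-likelihood theta on a grid, given (discrimination, thresholds) per SNP."""
    if grid is None:
        grid = [i / 100.0 for i in range(-400, 401)]  # theta from -4.00 to 4.00
    best_theta, best_loglik = grid[0], -math.inf
    for theta in grid:
        loglik = 0.0
        for g, (a, b) in zip(genotypes, items):
            loglik += math.log(grm_category_probs(theta, a, b)[g])
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    return best_theta


# Hypothetical item parameters for three SNPs: (discrimination a, thresholds [b_1, b_2]).
items = [(1.2, [-0.5, 1.0]), (0.8, [0.0, 1.5]), (1.5, [-1.0, 0.5])]
print(estimate_theta([2, 1, 2], items))
print(estimate_theta([0, 1, 0], items))
```

Unlike a product of per-SNP relative risks, the GRM pools all genotypes through a single latent scale, with each SNP's influence set by its discrimination parameter a.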
15
Predicting Disease Onset from Mutation Status Using Proband and Relative Data with Applications to Huntington's Disease. Journal of Probability and Statistics 2012; 2012. [PMID: 23476655 DOI: 10.1155/2012/375935] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Huntington's disease (HD) is a progressive neurodegenerative disorder caused by an expansion of CAG repeats in the IT15 gene. The age at onset (AAO) of HD is inversely related to CAG repeat length, and the minimum length thought to cause HD is 36 repeats. Accurate estimation of the AAO distribution as a function of CAG repeat length is important for genetic counseling and the design of clinical trials. In the Cooperative Huntington's Observational Research Trial (COHORT) study, the CAG repeat length is known for the proband participants. However, it is unknown whether a family member shares the proband's huntingtin gene status (CAG expanded or not). In this work, we use the expectation-maximization (EM) algorithm to handle the missing huntingtin gene information in first-degree family members in COHORT, assuming that a family member who carries a huntingtin gene mutation has the same CAG length as the proband. We perform simulation studies to examine the performance of the proposed method and apply it to the combined COHORT proband and family data. Our analyses reveal that the estimated cumulative risk of HD symptom onset obtained from the combined data is slightly lower than that estimated from the proband data alone.
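The EM idea in this abstract can be illustrated with a deliberately simplified sketch that is not the authors' model. Within one CAG-length stratum, it estimates F, the probability that a mutation carrier has had symptom onset by a fixed age. Probands are known carriers. As assumptions, each first-degree relative is a carrier with prior probability 1/2 (autosomal dominant inheritance) and non-carriers never develop HD, so an affected relative must be a carrier while an unaffected relative's carrier status is missing. The E-step computes each unaffected relative's posterior probability of being a carrier; the M-step re-estimates F from the expected counts. The function name and counts are hypothetical, and the published method works with full age-at-onset distributions rather than a single age.

```python
def em_onset_probability(proband_n: int, proband_onset: int,
                         relative_n: int, relative_onset: int,
                         prior_carrier: float = 0.5, f_init: float = 0.5,
                         tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """EM estimate of F = P(onset by a fixed age | mutation carrier).

    Probands are known carriers; relatives' carrier status is missing.
    Simplifying assumption: non-carriers never develop HD, so every
    affected relative is a carrier.
    """
    if proband_n + relative_n == 0:
        raise ValueError("no data")
    relative_unaffected = relative_n - relative_onset
    f = f_init
    for _ in range(max_iter):
        # E-step: posterior probability that an unaffected relative is a carrier.
        carrier_and_unaffected = prior_carrier * (1.0 - f)
        w = carrier_and_unaffected / (carrier_and_unaffected + (1.0 - prior_carrier))
        # M-step: observed onsets divided by the expected number of carriers.
        expected_carriers = proband_n + relative_onset + relative_unaffected * w
        f_new = (proband_onset + relative_onset) / expected_carriers
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


# Hypothetical counts: 50 probands (20 with onset by the chosen age) and
# 100 first-degree relatives (20 with onset by the same age).
print(round(em_onset_probability(50, 20, 100, 20), 6))
```

In this toy setting, relatives alone observe onset with probability F/2, so their data by themselves give F = 2 × 20/100 = 0.4; EM reaches the same maximum-likelihood answer without writing that closed form down, which is what makes it convenient once the model includes age and CAG length.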
16
Kang J, Kugathasan S, Georges M, Zhao H, Cho JH. Improved risk prediction for Crohn's disease with a multi-locus approach. Hum Mol Genet 2011; 20:2435-42. [PMID: 21427131 PMCID: PMC3298027 DOI: 10.1093/hmg/ddr116] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 11/01/2010] [Revised: 01/28/2011] [Accepted: 03/16/2011] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies have identified numerous loci showing genome-wide significant association with Crohn's disease. However, when many single nucleotide polymorphisms (SNPs) have weak-to-moderate effects on disease risk, genetic risk prediction models based only on markers that pass the most stringent statistical significance threshold may be suboptimal. Haplotype-based predictive models may have advantages over single-SNP approaches because they facilitate detection of associations driven by cis-interactions among nearby SNPs. These approaches may also help to assay non-genotyped, rare causal variants. In this study, we investigated the use of two-marker haplotypes for risk prediction in Crohn's disease and show that this leads to improved prediction accuracy compared with single-point analyses. With large numbers of predictors, traditional classification methods such as logistic regression and support vector machines may be suboptimal. An alternative is the risk-score method, in which the score is the number of risk haplotypes an individual carries, both within and across loci. We used the area under the receiver operating characteristic curve (AUC) to assess the performance of prediction models in large-scale genetic data and observed that prediction performance in the validation cohort continued to improve as thousands of haplotypes were included in the model, with the AUC reaching a plateau of 0.72 at ∼7000 haplotypes and gradually declining thereafter. In contrast, using SNPs as predictors, we obtained a maximum AUC of only 0.65. Validation studies in independent cohorts further support improved prediction with multi-marker, as opposed to single-marker, analyses.
Affiliation(s)
- Jia Kang
- Department of Epidemiology and Public Health and
- Subra Kugathasan
- Pediatrics and Human Genetics, Emory University, Atlanta, GA, USA and
- Hongyu Zhao
- Department of Epidemiology and Public Health and
- Judy H. Cho
- Department of Medicine and Genetics, Yale University, New Haven, CT, USA
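The risk-score method in the Kang et al. abstract counts the risk haplotypes an individual carries. A minimal sketch for two-marker haplotypes, assuming phased genotypes (each chromosome given as a string of alleles over an ordered list of markers), haplotypes formed from adjacent marker pairs, and a set of risk haplotypes already identified in training data; the markers, alleles, risk set, and function names are hypothetical. Each chromosome contributes one haplotype per marker pair, so an individual carries 0, 1, or 2 copies of each risk haplotype.

```python
from typing import List, Set, Tuple

# A two-marker haplotype: (index of first marker, allele at that marker, allele at the next marker).
Haplotype = Tuple[int, str, str]


def two_marker_haplotypes(chromosome: str) -> List[Haplotype]:
    """Haplotypes formed by each pair of adjacent markers on one phased chromosome."""
    return [(i, chromosome[i], chromosome[i + 1]) for i in range(len(chromosome) - 1)]


def haplotype_risk_score(chromosome_a: str, chromosome_b: str, risk_set: Set[Haplotype]) -> int:
    """Number of risk-haplotype copies carried across both chromosomes and all marker pairs."""
    score = 0
    for chromosome in (chromosome_a, chromosome_b):
        score += sum(1 for h in two_marker_haplotypes(chromosome) if h in risk_set)
    return score


# Hypothetical risk haplotypes over four markers (indices 0-3).
risk_set = {(0, "A", "G"), (2, "T", "C")}
print(haplotype_risk_score("AGTC", "AGCC", risk_set))  # → 3
```

Because every risk haplotype gets equal weight in this sketch, the score stays cheap to compute and does not require fitting one coefficient per predictor, which matters when thousands of haplotypes are included.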