1
Raben TG, Lello L, Widen E, Hsu SDH. Biobank-scale methods and projections for sparse polygenic prediction from machine learning. Sci Rep 2023; 13:11662. [PMID: 37468507 PMCID: PMC10356957 DOI: 10.1038/s41598-023-37580-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 06/23/2023] [Indexed: 07/21/2023] Open
Abstract
In this paper we characterize the performance of linear models trained via widely-used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance is most strongly dependent on size of training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative for asthma can achieve an AUC of [Formula: see text] and for height a correlation of [Formula: see text] for a Taiwanese population. This is above the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank trained predictors applied to a European population.
Affiliation(s)
- Timothy G Raben
  - Department of Physics and Astronomy, Michigan State University, Michigan, USA.
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, Michigan, USA
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
- Erik Widen
  - Department of Physics and Astronomy, Michigan State University, Michigan, USA
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
- Stephen D H Hsu
  - Department of Physics and Astronomy, Michigan State University, Michigan, USA
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
2
Lello L, Hsu M, Widen E, Raben TG. Sibling variation in polygenic traits and DNA recombination mapping with UK Biobank and IVF family data. Sci Rep 2023; 13:376. [PMID: 36611071 PMCID: PMC9825593 DOI: 10.1038/s41598-023-27561-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023] Open
Abstract
We use UK Biobank and a unique IVF family dataset (including genotyped embryos) to investigate sibling variation in both phenotype and genotype. We compare phenotype (disease status, height, blood biomarkers) and genotype (polygenic scores, polygenic health index) distributions among siblings to those in the general population. As expected, the between-siblings standard deviation in polygenic scores is [Formula: see text] times smaller than in the general population, but variation is still significant. As previously demonstrated, this allows for substantial benefit from polygenic screening in IVF. Differences in sibling genotypes result from distinct recombination patterns in sexual reproduction. We develop a novel sibling-pair method for detection of recombination breaks via statistical discontinuities. The new method is used to construct a dataset of 1.44 million recombination events which may be useful in further study of meiosis.
Affiliation(s)
- Louis Lello
  - Genomic Prediction, Inc., North Brunswick, NJ, USA.
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA.
- Maximus Hsu
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
- Erik Widen
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
- Timothy G Raben
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
3
Widen E, Lello L, Raben TG, Tellier LCAM, Hsu SDH. Polygenic Health Index, General Health, and Pleiotropy: Sibling Analysis and Disease Risk Reduction. Sci Rep 2022; 12:18173. [PMID: 36307513 PMCID: PMC9616929 DOI: 10.1038/s41598-022-22637-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/05/2022] [Accepted: 10/18/2022] [Indexed: 12/31/2022] Open
Abstract
We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including, e.g., coronary artery disease, type 1 and 2 diabetes, schizophrenia, etc. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index in odds ratios and selection experiments using unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among ten individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and genetic architecture of human disease conditions.
Affiliation(s)
- Erik Widen
  - Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
  - Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
  - Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
- Timothy G Raben
  - Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
- Laurent C A M Tellier
  - Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
  - Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
- Stephen D H Hsu
  - Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
  - Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
4
Raben TG, Lello L, Widen E, Hsu SDH. From Genotype to Phenotype: Polygenic Prediction of Complex Human Traits. Methods Mol Biol 2022; 2467:421-446. [PMID: 35451785 DOI: 10.1007/978-1-0716-2205-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Decoding the genome confers the capability to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.
Affiliation(s)
- Louis Lello
  - Michigan State University, East Lansing, MI, USA
  - Genomic Prediction, North Brunswick, NJ, USA
- Erik Widen
  - Michigan State University, East Lansing, MI, USA
- Stephen D H Hsu
  - Michigan State University, East Lansing, MI, USA.
  - Genomic Prediction, North Brunswick, NJ, USA.
5
Tellier LCAM, Eccles J, Treff NR, Lello L, Fishel S, Hsu S. Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations. Genes (Basel) 2021; 12:1105. [PMID: 34440279 PMCID: PMC8393569 DOI: 10.3390/genes12081105] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/08/2021] [Revised: 06/25/2021] [Accepted: 07/06/2021] [Indexed: 11/16/2022] Open
Abstract
Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used to identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at a lower disease risk, even when these two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (with respect to one chosen at random) can be quantified by using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking with better results than the current method of selection that is based on morphology. We review the advances described above and discuss related ethical considerations.
Affiliation(s)
- Laurent C. A. M. Tellier
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA; (L.C.A.M.T.); (S.H.)
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA; (J.E.); (N.R.T.)
- Jennifer Eccles
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA; (J.E.); (N.R.T.)
- Nathan R. Treff
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA; (J.E.); (N.R.T.)
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA; (L.C.A.M.T.); (S.H.)
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA; (J.E.); (N.R.T.)
- Simon Fishel
  - CARE Fertility Group, Nottingham NG8 6PZ, UK
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L2 2QP, UK
- Stephen Hsu
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA; (L.C.A.M.T.); (S.H.)
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA; (J.E.); (N.R.T.)
6
Yong SY, Raben TG, Lello L, Hsu SDH. Genetic architecture of complex traits and disease risk predictors. Sci Rep 2020; 10:12055. [PMID: 32694572 PMCID: PMC7374622 DOI: 10.1038/s41598-020-68881-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/25/2020] [Accepted: 06/30/2020] [Indexed: 01/30/2023] Open
Abstract
Genomic prediction of complex human traits (e.g., height, cognitive ability, bone density) and disease risks (e.g., breast cancer, diabetes, heart disease, atrial fibrillation) has advanced considerably in recent years. Using data from the UK Biobank, predictors have been constructed using penalized algorithms that favor sparsity, i.e., which use as few genetic variants as possible. We analyze the specific genetic variants (SNPs) utilized in these predictors, which can vary in number from dozens to as many as thirty thousand. We find that the fraction of SNPs in or near genic regions varies widely by phenotype. For the majority of disease conditions studied, a large amount of the variance is accounted for by SNPs outside of coding regions. The state of these SNPs cannot be determined from exome-sequencing data. This suggests that exome data alone will miss much of the heritability for these traits, i.e., existing PRS cannot be computed from exome data alone. We also study the fraction of SNPs and of variance that is in common between pairs of predictors. The DNA regions used in disease risk predictors so far constructed seem to be largely disjoint (with a few interesting exceptions), suggesting that individual genetic disease risks are largely uncorrelated. It seems possible in theory for an individual to be a low-risk outlier in all conditions simultaneously.
Affiliation(s)
- Soke Yuen Yong
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA.
- Timothy G Raben
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
  - Genomic Prediction, North Brunswick, NJ, USA
- Stephen D H Hsu
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
  - Genomic Prediction, North Brunswick, NJ, USA
7
Lello L, Raben TG, Yong SY, Tellier LCAM, Hsu SDH. Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer. Sci Rep 2019; 9:15286. [PMID: 31653892 PMCID: PMC6814833 DOI: 10.1038/s41598-019-51258-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 04/23/2019] [Accepted: 09/26/2019] [Indexed: 01/09/2023] Open
Abstract
We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58-0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3-8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
Affiliation(s)
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, USA.
- Timothy G Raben
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, USA.
- Soke Yuen Yong
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, USA.
- Laurent C A M Tellier
  - Genomic Prediction, North Brunswick, NJ, USA.
  - Cognitive Genomics Laboratory, Shenzhen Key Laboratory of Neurogenomics, China National GeneBank, BGI-Shenzhen, Shenzhen, China.
- Stephen D H Hsu
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, USA.
  - Genomic Prediction, North Brunswick, NJ, USA.
  - Cognitive Genomics Laboratory, Shenzhen Key Laboratory of Neurogenomics, China National GeneBank, BGI-Shenzhen, Shenzhen, China.
8
Char DS. How should whole-genome sequencing be implemented in children? A consideration of the current limitations. Per Med 2016; 13:33-42. [DOI: 10.2217/pme.15.44] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/11/2023]
Abstract
In children, whole-genome sequencing (WGS) is envisioned as a tool to improve diagnosis of undiagnosed diseases and to improve population-based screening. Pilot applications have shown benefits: genomic information has been used as a diagnostic aid; pharmacogenomics can reduce medicine-related adverse events; advanced knowledge of the potential for later-onset disease can target tests and appropriate therapies. However, emerging technical, conceptual and ethical challenges may limit WGS from fulfilling the current vision for future applications. WGS platforms still struggle with reliability and accuracy. The role of the genome in long-term organismal function and disease is still being established. Ethical implications of WGS in both undiagnosed disease and population screening, particularly potential impacts of testing on children and their families, are still unresolved.
Affiliation(s)
- Danton S Char
  - Department of Anesthesiology, Stanford University School of Medicine, Division of Pediatric Cardiac Anesthesia, H3580, Stanford University Medical Center, 300 Pasteur Drive, Stanford, CA 94305, USA
9
Rafiq I, Freeman L. Clinical genomics and the adult with congenital heart disease: new opportunities. BRITISH HEART JOURNAL 2015; 101:242. [DOI: 10.1136/heartjnl-2014-306802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]