1
Ren Z, Povysil G, Hostyk JA, Cui H, Bhardwaj N, Goldstein DB. ATAV: a comprehensive platform for population-scale genomic analyses. BMC Bioinformatics 2021; 22:149. [PMID: 33757430 PMCID: PMC7988908 DOI: 10.1186/s12859-021-04071-1] [Received: 09/17/2020] [Accepted: 03/14/2021] [Indexed: 11/21/2022]
Abstract
Background: A common approach for sequencing studies is to do joint-calling and store variants of all samples in a single file. If new samples are continually added or controls are re-used for several studies, the cost and time required to perform joint-calling for each analysis can become prohibitive.
Results: We present ATAV, an analysis platform for large-scale whole-exome and whole-genome sequencing projects. ATAV stores variant and per site coverage data for all samples in a centralized database, which is efficiently queried by ATAV to support diagnostic analyses for trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases. Runtime logs ensure full reproducibility and the modularized ATAV framework makes it extensible to continuous development. Besides helping with the identification of disease-causing variants for a range of diseases, ATAV has also enabled the discovery of disease-genes by rare-variant collapsing on datasets containing more than 20,000 samples. Analyses to date have been performed on data of more than 110,000 individuals demonstrating the scalability of the framework. To allow users to easily access variant-level data directly from the database, we provide a web-based interface, the ATAV data browser (http://atavdb.org/). Through this browser, summary-level data for more than 40,000 samples can be queried by the general public representing a mix of cases and controls of diverse ancestries. Users have access to phenotype categories of variant carriers, as well as predicted ancestry, gender, and quality metrics. In contrast to many other platforms, the data browser is able to show data of newly-added samples in real-time and therefore evolves rapidly as more and more samples are sequenced.
Conclusions: Through ATAV, users have public access to one of the largest variant databases for patients sequenced at a tertiary care center and can look up any genes or variants of interest. Additionally, since the entire code is freely available on GitHub, ATAV can easily be deployed by other groups that wish to build their own platform, database, and user interface.
Affiliation(s)
- Zhong Ren
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Gundula Povysil
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Joseph A Hostyk
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Hongzhu Cui
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Nitin Bhardwaj
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- David B Goldstein
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
2
Eade K, Gantner ML, Hostyk JA, Nagasaki T, Giles S, Fallon R, Harkins-Perry S, Baldini M, Lim EW, Scheppke L, Dorrell MI, Cai C, Baugh EH, Wolock CJ, Wallace M, Berlow RB, Goldstein DB, Metallo CM, Friedlander M, Allikmets R. Serine biosynthesis defect due to haploinsufficiency of PHGDH causes retinal disease. Nat Metab 2021; 3:366-377. [PMID: 33758422 PMCID: PMC8084205 DOI: 10.1038/s42255-021-00361-3] [Received: 06/07/2020] [Accepted: 02/10/2021] [Indexed: 02/08/2023]
Abstract
Macular telangiectasia type 2 (MacTel) is a progressive, late-onset retinal degenerative disease linked to decreased serum levels of serine that elevate circulating levels of a toxic ceramide species, deoxysphingolipids (deoxySLs); however, causal genetic variants that reduce serine levels in patients have not been identified. Here we identify rare, functional variants in the gene encoding the rate-limiting serine biosynthetic enzyme, phosphoglycerate dehydrogenase (PHGDH), as the single locus accounting for a significant fraction of MacTel. Under a dominant collapsing analysis model of a genome-wide enrichment analysis of rare variants predicted to impact protein function in 793 MacTel cases and 17,610 matched controls, the PHGDH gene achieves genome-wide significance (P = 1.2 × 10⁻¹³) with variants explaining ~3.2% of affected individuals. We further show that the resulting functional defects in PHGDH cause decreased serine biosynthesis and accumulation of deoxySLs in retinal pigmented epithelial cells. PHGDH is a significant locus for MacTel that explains the typical disease phenotype and suggests a number of potential treatment options.
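As an illustration of the dominant collapsing model described in this abstract, the following minimal Python sketch collapses per-sample counts of qualifying variants in one gene into carrier or non-carrier status and compares cases with controls using a one-sided Fisher's exact test. The choice of Fisher's exact test, the function names, and the toy counts are assumptions made for illustration; the abstract does not state which test the authors used, and none of the numbers below come from the study.

```python
from math import comb


def count_carriers(qualifying_variant_counts):
    """Dominant model: a sample is a carrier if it has >= 1 qualifying variant in the gene."""
    return sum(1 for n in qualifying_variant_counts if n >= 1)


def fisher_exact_greater(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Sums the hypergeometric probabilities of observing at least
    `case_carriers` carriers among the cases, with all margins fixed.
    """
    n_cases = case_carriers + case_noncarriers
    n_carriers = case_carriers + ctrl_carriers
    n_noncarriers = case_noncarriers + ctrl_noncarriers
    total = n_carriers + n_noncarriers
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, x) * comb(n_noncarriers, n_cases - x)
        for x in range(case_carriers, upper + 1)
    )
    return tail / comb(total, n_cases)


# Hypothetical toy data (NOT from the paper): qualifying-variant counts per sample.
case_counts = [1, 0, 2, 1]
control_counts = [0, 0, 1, 0]

case_carriers = count_carriers(case_counts)
ctrl_carriers = count_carriers(control_counts)

p_value = fisher_exact_greater(
    case_carriers,
    len(case_counts) - case_carriers,
    ctrl_carriers,
    len(control_counts) - ctrl_carriers,
)
print(p_value)  # 17/70, roughly 0.243
```

In a genome-wide setting the same test would be repeated for every gene, so the resulting p-values need a multiple-testing threshold before any gene is called significant.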
Affiliation(s)
- Kevin Eade
  - Lowy Medical Research Institute, La Jolla, CA, USA
- Joseph A Hostyk
  - Institute for Genomic Medicine, Columbia University, New York, NY, USA
- Sarah Giles
  - Lowy Medical Research Institute, La Jolla, CA, USA
- Regis Fallon
  - Lowy Medical Research Institute, La Jolla, CA, USA
- Sarah Harkins-Perry
  - Lowy Medical Research Institute, La Jolla, CA, USA
  - The Scripps Research Institute, La Jolla, CA, USA
- Michelle Baldini
  - Department of Bioengineering, University of California, San Diego, CA, USA
- Esther W Lim
  - Department of Bioengineering, University of California, San Diego, CA, USA
- Lea Scheppke
  - Lowy Medical Research Institute, La Jolla, CA, USA
- Carolyn Cai
  - Department of Ophthalmology, Columbia University, New York, NY, USA
- Evan H Baugh
  - Institute for Genomic Medicine, Columbia University, New York, NY, USA
- Charles J Wolock
  - Institute for Genomic Medicine, Columbia University, New York, NY, USA
- Martina Wallace
  - Department of Bioengineering, University of California, San Diego, CA, USA
- David B Goldstein
  - Institute for Genomic Medicine, Columbia University, New York, NY, USA
- Martin Friedlander
  - Lowy Medical Research Institute, La Jolla, CA, USA
  - The Scripps Research Institute, La Jolla, CA, USA
  - Scripps Clinic Medical Group, La Jolla, CA, USA
- Rando Allikmets
  - Department of Ophthalmology, Columbia University, New York, NY, USA
  - Department of Pathology and Cell Biology, Columbia University, New York, NY, USA