1
Lu Y, Oliva M, Pierce BL, Liu J, Chen LS. Integrative cross-omics and cross-context analysis elucidates molecular links underlying genetic effects on complex traits. Nat Commun 2024; 15:2383. [PMID: 38493154 PMCID: PMC10944527 DOI: 10.1038/s41467-024-46675-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 03/06/2024] [Indexed: 03/18/2024] [Open Access]
Abstract
Genetic effects on functionally related 'omic' traits often co-occur in relevant cellular contexts, such as tissues. Motivated by the multi-tissue methylation quantitative trait loci (mQTLs) and expression QTLs (eQTLs) analysis, we propose X-ING (Cross-INtegrative Genomics) for cross-omics and cross-context integrative analysis. X-ING takes as input multiple matrices of association statistics, each obtained from different omics data types across multiple cellular contexts. It models the latent binary association status of each statistic, captures the major association patterns among omics data types and contexts, and outputs the posterior mean and probability for each input statistic. X-ING enables the integration of effects from different omics data with varying effect distributions. In the multi-tissue cis-association analysis, X-ING shows improved detection and replication of mQTLs by integrating eQTL maps. In the trans-association analysis, X-ING reveals an enrichment of trans-associations in many disease/trait-relevant tissues.
Affiliation(s)
- Yihao Lu
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Meritxell Oliva
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
  - Genomics Research Center, AbbVie, North Chicago, IL, USA
- Brandon L Pierce
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Jin Liu
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
- Lin S Chen
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA

2
Zhao T, Wang F, Mott R, Dekkers J, Cheng H. Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality. Genetics 2024; 226:iyad210. [PMID: 38085098 PMCID: PMC11090459 DOI: 10.1093/genetics/iyad210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/13/2023] [Indexed: 03/08/2024] [Open Access]
Abstract
To adhere to and capitalize on the benefits of the FAIR (findable, accessible, interoperable, and reusable) principles in agricultural genome-to-phenome studies, it is crucial to address privacy and intellectual property issues that prevent sharing and reuse of data in research and industry. Direct sharing of genotype and phenotype data is often prohibited due to intellectual property and privacy concerns. Thus, there is a pressing need for encryption methods that obscure confidential aspects of the data, without affecting the outcomes of certain statistical analyses. A homomorphic encryption method for genotypes and phenotypes (HEGP) has been proposed for single-marker regression in genome-wide association studies (GWAS) using linear mixed models with Gaussian errors. This methodology permits frequentist likelihood-based parameter estimation and inference. In this paper, we extend HEGP to broader applications in genome-to-phenome analyses. We show that HEGP is suited to commonly used linear mixed models for genetic analyses of quantitative traits including genomic best linear unbiased prediction (GBLUP) and ridge-regression best linear unbiased prediction (RR-BLUP), as well as Bayesian variable selection methods (e.g. those in Bayesian Alphabet), for genetic parameter estimation, genomic prediction, and GWAS. By advancing the capabilities of HEGP, we offer researchers and industry professionals a secure and efficient approach for collaborative genomic analyses while preserving data confidentiality.
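The claim that encryption can obscure the data "without affecting the outcomes of certain statistical analyses" can be illustrated with a small sketch. It assumes, as an illustration that the abstract itself does not state, that the encryption is an orthogonal transformation applied to both phenotypes and the design matrix. A single Householder reflection stands in for a full random orthogonal key, and all names and numbers below are hypothetical.

```python
def householder_apply(key, column):
    """Apply the orthogonal reflection P = I - 2 k k^T / (k^T k) to one column."""
    scale = 2.0 * sum(k * c for k, c in zip(key, column)) / sum(k * k for k in key)
    return [c - scale * k for k, c in zip(key, column)]


def ols_two_column(x0, x1, y):
    """Solve the 2x2 normal equations for y ~ b0*x0 + b1*x1 (no implicit intercept)."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    a, b, d = dot(x0, x0), dot(x0, x1), dot(x1, x1)
    r0, r1 = dot(x0, y), dot(x1, y)
    det = a * d - b * b
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)


# Hypothetical toy data: intercept column, one marker, one phenotype.
ones = [1.0] * 8
genotype = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0]
phenotype = [0.52, 0.79, 1.14, 0.83, 0.47, 1.08, 0.81, 0.76]

# Hypothetical secret key vector that defines the reflection P.
key = [0.7, -1.2, 0.4, 2.0, -0.3, 1.1, -0.9, 0.5]

enc_ones = householder_apply(key, ones)
enc_genotype = householder_apply(key, genotype)
enc_phenotype = householder_apply(key, phenotype)

plain = ols_two_column(ones, genotype, phenotype)
encrypted = ols_two_column(enc_ones, enc_genotype, enc_phenotype)

print(plain)
print(encrypted)
```

Because P is orthogonal, the cross-products X'X and X'y are unchanged, so the least-squares estimates agree up to floating-point error even though the transformed genotype column no longer looks like genotype codes. The intercept column has to be transformed with the same key as the other columns.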
Affiliation(s)
- Tianjing Zhao
  - Department of Animal Science, University of California, Davis, CA 95616, USA
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Fangyi Wang
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Richard Mott
  - Genetics Institute, University College London, London, WC1E 6BT, UK
- Jack Dekkers
  - Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- Hao Cheng
  - Department of Animal Science, University of California, Davis, CA 95616, USA

3
Miao DNR, Ladha F, Lyle SM, Olivier DW, Ahmed S, Drögemöller BI. Current Perspectives on Data Sharing and Open Science in Pharmacogenomics. Clin Pharmacol Ther 2024; 115:408-411. [PMID: 38087986 DOI: 10.1002/cpt.3115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 11/21/2023] [Indexed: 02/17/2024]
Affiliation(s)
- Deanne Nixie R Miao
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Feryal Ladha
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Sarah M Lyle
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Daniel W Olivier
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
  - Department of Physiological Sciences, Stellenbosch University, Stellenbosch, Western Cape, South Africa
- Samah Ahmed
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Britt I Drögemöller
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
  - Paul Albrechtsen Research Institute CancerCare Manitoba Research, Winnipeg, Manitoba, Canada
  - Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada
  - Centre on Aging, Winnipeg, Manitoba, Canada

4
Venkatesaramani R, Wan Z, Malin BA, Vorobeychik Y. Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics. Genome Res 2023; 33:1113-1123. [PMID: 37217251 PMCID: PMC10538482 DOI: 10.1101/gr.277674.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 04/20/2023] [Indexed: 05/24/2023]
Abstract
The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio-based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy-utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets.
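A likelihood ratio-based membership-inference attack on released allele frequencies can be sketched as follows. This is a generic textbook-style log-likelihood ratio under simplifying assumptions (independent variants, one binary allele per variant), not the attack models or the defense proposed in the paper. The function name, frequencies, and threshold are all hypothetical.

```python
import math


def membership_llr(alleles, pool_freqs, ref_freqs):
    """Log-likelihood ratio that the alleles were drawn from the released pool
    rather than from a reference population.

    alleles    -- 0/1 allele indicators for the target individual
    pool_freqs -- released allele frequencies (each strictly between 0 and 1)
    ref_freqs  -- reference population frequencies (each strictly between 0 and 1)
    """
    score = 0.0
    for allele, pool, ref in zip(alleles, pool_freqs, ref_freqs):
        if allele == 1:
            score += math.log(pool / ref)
        else:
            score += math.log((1.0 - pool) / (1.0 - ref))
    return score


# Hypothetical released summary statistics and reference frequencies.
pool_freqs = [0.30, 0.55, 0.12]
ref_freqs = [0.20, 0.50, 0.10]

carrier = [1, 1, 1]      # carries every allele that is enriched in the pool
non_carrier = [0, 0, 0]  # carries none of them

carrier_score = membership_llr(carrier, pool_freqs, ref_freqs)
non_carrier_score = membership_llr(non_carrier, pool_freqs, ref_freqs)

threshold = 0.0  # hypothetical decision threshold chosen by the attacker
print(carrier_score > threshold, non_carrier_score > threshold)
```

Suppressing a variant removes its term from the sum, and adding noise to a released frequency shrinks or distorts its term. These are the two levers the abstract describes combining.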
Affiliation(s)
- Zhiyu Wan
  - Vanderbilt University Medical Center, Nashville, Tennessee 37212, USA
- Bradley A Malin
  - Vanderbilt University Medical Center, Nashville, Tennessee 37212, USA

5
Fu S, Purdue MP, Zhang H, Qin J, Song L, Berndt SI, Yu K. Improve the model of disease subtype heterogeneity by leveraging external summary data. PLoS Comput Biol 2023; 19:e1011236. [PMID: 37437002 DOI: 10.1371/journal.pcbi.1011236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 06/02/2023] [Indexed: 07/14/2023] [Open Access]
Abstract
Researchers are often interested in understanding the disease subtype heterogeneity by testing whether a risk exposure has the same level of effect on different disease subtypes. The polytomous logistic regression (PLR) model provides a flexible tool for such an evaluation. Disease subtype heterogeneity can also be investigated with a case-only study that uses a case-case comparison procedure to directly assess the difference between risk effects on two disease subtypes. Motivated by a large consortium project on the genetic basis of non-Hodgkin lymphoma (NHL) subtypes, we develop PolyGIM, a procedure to fit the PLR model by integrating individual-level data with summary data extracted from multiple studies under different designs. The summary data consist of coefficient estimates from working logistic regression models established by external studies. Examples of the working model include the case-case comparison model and the case-control comparison model, which compares the control group with a subtype group or a broad disease group formed by merging several subtypes. PolyGIM efficiently evaluates risk effects and provides a powerful test for disease subtype heterogeneity in situations when only summary data, instead of individual-level data, is available from external studies due to various informatics and privacy constraints. We investigate the theoretic properties of PolyGIM and use simulation studies to demonstrate its advantages. Using data from eight genome-wide association studies within the NHL consortium, we apply it to study the effect of the polygenic risk score defined by a lymphoid malignancy on the risks of four NHL subtypes. These results show that PolyGIM can be a valuable tool for pooling data from multiple sources for a more coherent evaluation of disease subtype heterogeneity.
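For orientation, the standard polytomous logistic regression model and its link to the case-case comparison can be written out as below. The notation is an assumption made for this sketch and is not taken from the paper: $Y = 0$ denotes controls, $Y = k$ denotes subtype $k$ of $K$, $x$ is the exposure, and $\alpha_k$, $\beta_k$ are subtype-specific intercepts and effects.

```latex
% Polytomous logistic regression with controls (Y = 0) as the reference category
\[
  P(Y = k \mid x) \;=\;
  \frac{\exp(\alpha_k + \beta_k x)}{1 + \sum_{j=1}^{K} \exp(\alpha_j + \beta_j x)},
  \qquad k = 1, \dots, K .
\]

% Case-control comparison for a single subtype k
\[
  \log \frac{P(Y = k \mid x)}{P(Y = 0 \mid x)} \;=\; \alpha_k + \beta_k x .
\]

% Case-case comparison between subtypes k and l
\[
  \log \frac{P(Y = k \mid x)}{P(Y = l \mid x)} \;=\;
  (\alpha_k - \alpha_l) + (\beta_k - \beta_l)\, x .
\]

% Subtype heterogeneity corresponds to rejecting
\[
  H_0 : \beta_1 = \beta_2 = \dots = \beta_K .
\]
```

Under this model the slope of a case-case logistic regression estimates the difference $\beta_k - \beta_l$, which is why coefficient estimates from such external working models carry information about heterogeneity.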
Affiliation(s)
- Sheng Fu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Mark P Purdue
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Han Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Jing Qin
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Lei Song
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Sonja I Berndt
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Kai Yu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America

6
Sun L, Wang Z, Lu T, Manolio TA, Paterson AD. eXclusionarY: 10 years later, where are the sex chromosomes in GWASs? Am J Hum Genet 2023; 110:903-912. [PMID: 37267899 DOI: 10.1016/j.ajhg.2023.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/04/2023] [Open Access]
Abstract
10 years ago, a detailed analysis showed that only 33% of genome-wide association study (GWAS) results included the X chromosome. Multiple recommendations were made to combat such exclusion. Here, we re-surveyed the research landscape to determine whether these earlier recommendations had been translated. Unfortunately, among the genome-wide summary statistics reported in 2021 in the NHGRI-EBI GWAS Catalog, only 25% provided results for the X chromosome and 3% for the Y chromosome, suggesting that the exclusion phenomenon not only persists but has also expanded into an exclusionary problem. Normalizing by physical length of the chromosome, the average number of studies published through November 2022 with genome-wide-significant findings on the X chromosome is ∼1 study/Mb. By contrast, it ranges from ∼6 to ∼16 studies/Mb for chromosomes 4 and 19, respectively. Compared with the autosomal growth rate of ∼0.086 studies/Mb/year over the last decade, studies of the X chromosome grew at less than one-seventh that rate, only ∼0.012 studies/Mb/year. Among the studies that reported significant associations on the X chromosome, we noted extreme heterogeneities in data analysis and reporting of results, suggesting the need for clear guidelines. Unsurprisingly, among the 430 scores sampled from the PolyGenic Score Catalog, 0% contained weights for sex chromosomal SNPs. To overcome the dearth of sex chromosome analyses, we provide five sets of recommendations and future directions. Finally, until the sex chromosomes are included in a whole-genome study, instead of GWASs, we propose such studies would more properly be referred to as "AWASs," meaning "autosome-wide scans."
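The "less than one-seventh" comparison can be checked directly from the two growth rates quoted in the abstract. The short sketch below only redoes that arithmetic, and the variable names are illustrative.

```python
# Growth rates quoted in the abstract, in studies/Mb/year.
autosomal_rate = 0.086
x_chromosome_rate = 0.012

ratio = x_chromosome_rate / autosomal_rate
print(f"X / autosomal growth ratio: {ratio:.3f}")
print("below one-seventh:", ratio < 1 / 7)
```

The ratio is roughly 0.14, just under 1/7 (about 0.143), which is consistent with the wording in the abstract.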
Affiliation(s)
- Lei Sun
  - Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Zhong Wang
  - Department of Statistics and Data Science, Faculty of Science, National University of Singapore, Singapore
- Tianyuan Lu
  - Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Teri A Manolio
  - Division of Genomic Medicine, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Andrew D Paterson
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada

7
Bernasconi A, Canakoglu A, Comolli F. Processing genome-wide association studies within a repository of heterogeneous genomic datasets. BMC Genom Data 2023; 24:13. [PMID: 36869294 PMCID: PMC9985298 DOI: 10.1186/s12863-023-01111-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 02/02/2023] [Indexed: 03/05/2023] [Open Access]
Abstract
BACKGROUND Genome Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants - typically single-nucleotide polymorphisms (SNPs) - in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than to making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions. RESULTS To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that includes several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multi-sample processing queries that respond to important biological questions. These are then made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, and epigenetic signals. CONCLUSIONS As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; 2) their big data processing by means of the GenoMetric Query Language and associated system. Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows.
Affiliation(s)
- Anna Bernasconi
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy
- Arif Canakoglu
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy
- Federico Comolli
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy

8
Reales G, Wallace C. Sharing GWAS summary statistics results in more citations. Commun Biol 2023; 6:116. [PMID: 36709395 PMCID: PMC9884206 DOI: 10.1038/s42003-023-04497-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/23/2022] [Accepted: 01/17/2023] [Indexed: 01/29/2023] [Open Access]
Abstract
A review of citation rates from genomic studies in the GWAS Catalog suggests that sharing summary statistics results, on average, in ~81.8% more citations, highlighting a benefit of publicly sharing GWAS summary statistics.
Affiliation(s)
- Guillermo Reales
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Chris Wallace
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK

9
Falola O, Adam Y, Ajayi O, Kumuthini J, Adewale S, Mosaku A, Samtal C, Adebayo G, Emmanuel J, Tchamga MSS, Erondu U, Nehemiah A, Rasaq S, Ajayi M, Akanle B, Oladipo O, Isewon I, Adebiyi M, Oyelade J, Adebiyi E. SysBiolPGWAS: simplifying post-GWAS analysis through the use of computational technologies and integration of diverse omics datasets. Bioinformatics 2023; 39:btac791. [PMID: 36477976 PMCID: PMC9825739 DOI: 10.1093/bioinformatics/btac791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
MOTIVATION Post-genome-wide association studies (pGWAS) analysis is designed to decipher the functional consequences of significant single-nucleotide polymorphisms (SNPs) in the era of GWAS. This can be translated into research insights and clinical benefits such as the effectiveness of strategies for disease screening, treatment and prevention. However, the setup of pGWAS tools can be quite complicated, and it mostly requires big data. The challenge, however, is that scientists need sufficient experience with several of these technically complex tools in order to complete the pGWAS analysis. RESULTS We present SysBiolPGWAS, a pGWAS web application that provides comprehensive functionality for biologists and non-bioinformaticians to conduct several pGWAS analyses and overcome the above challenges. It provides unique functionalities for analysis involving multi-omics datasets and visualization using various bioinformatics tools. SysBiolPGWAS provides access to individual pGWAS tools and a novel custom pGWAS pipeline that integrates several individual pGWAS tools and data. The SysBiolPGWAS app was developed to be a one-stop shop for pGWAS analysis. It targets researchers in the area of the human genome and performs its analysis mainly on the autosomal chromosomes. AVAILABILITY AND IMPLEMENTATION The SysBiolPGWAS web app was developed using JavaScript/TypeScript web frameworks and is available at: https://spgwas.waslitbre.org/. All code is available in this GitHub repository: https://github.com/covenant-university-bioinformatics.
Affiliation(s)
- Oluwadamilare Falola
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
- Yagoub Adam
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
- Olabode Ajayi
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town 7535, Republic of South Africa
- Judit Kumuthini
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town 7535, Republic of South Africa
- Suraju Adewale
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
- Abayomi Mosaku
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
- Chaimae Samtal
  - Laboratory of Biotechnology, Environment, Agri-food and Health, Faculty of Sciences Dhar El Mahraz, Sidi Mohammed Ben Abdellah University, Fez 30000, Morocco
- Glory Adebayo
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Department of Biological Sciences, Covenant University, Ota, Ogun State 112104, Nigeria
- Jerry Emmanuel
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State 112104, Nigeria
- Milaine S S Tchamga
  - African Institute for Mathematical Sciences (AIMS), Muizenberg, Cape Town 7945, South Africa
- Udochukwu Erondu
  - Department of Computer Science, Landmark University, Omu-Aran, Kwara State 251103, Nigeria
- Adebayo Nehemiah
  - Department of Computer Science, Landmark University, Omu-Aran, Kwara State 251103, Nigeria
- Suraj Rasaq
  - Department of Computer Science, Landmark University, Omu-Aran, Kwara State 251103, Nigeria
- Mary Ajayi
  - Department of Computer Science, Landmark University, Omu-Aran, Kwara State 251103, Nigeria
- Bola Akanle
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Center for System and Information Services, Covenant University, Ota, Ogun State 112104, Nigeria
  - Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State 112104, Nigeria
- Olaleye Oladipo
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Center for System and Information Services, Covenant University, Ota, Ogun State 112104, Nigeria
  - Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State 112104, Nigeria
- Itunuoluwa Isewon
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State 112104, Nigeria
  - Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State 112104, Nigeria
- Marion Adebiyi
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Department of Computer Science, Landmark University, Omu-Aran, Kwara State 251103, Nigeria
  - Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State 112104, Nigeria
- Jelili Oyelade
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State 112104, Nigeria
  - Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State 112104, Nigeria
- Ezekiel Adebiyi
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State 112104, Nigeria
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State 112104, Nigeria
  - Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State 112104, Nigeria
  - Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany

10
Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, Groza T, Güneş O, Hall P, Hayhurst J, Ibrahim A, Ji Y, John S, Lewis E, MacArthur JL, McMahon A, Osumi-Sutherland D, Panoutsopoulou K, Pendlington Z, Ramachandran S, Stefancsik R, Stewart J, Whetzel P, Wilson R, Hindorff L, Cunningham F, Lambert S, Inouye M, Parkinson H, Harris L. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 2022; 51:D977-D985. [PMID: 36350656 PMCID: PMC9825413 DOI: 10.1093/nar/gkac1010] [Citation(s) in RCA: 332] [Impact Index Per Article: 166.0] [Received: 09/15/2022] [Revised: 10/13/2022] [Accepted: 10/20/2022] [Indexed: 11/11/2022] [Open Access]
Abstract
The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.
Collapse
Affiliation(s)
- Elliot Sollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Abayomi Mosaku
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ala Abid
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Annalisa Buniello
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Maria Cerezo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laurent Gil
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK,Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
| | - Tudor Groza
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Osman Güneş
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peggy Hall
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - James Hayhurst
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Arwa Ibrahim
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yue Ji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sajo John
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Elizabeth Lewis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jacqueline A L MacArthur
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kalliope Panoutsopoulou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zoë Pendlington
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Santhi Ramachandran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ray Stefancsik
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan Stewart
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Patricia Whetzel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert Wilson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Lucia Hindorff
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Samuel A Lambert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - Helen Parkinson
- To whom correspondence should be addressed. Tel: +44 1223 49 4672;
| | - Laura W Harris
- Correspondence may also be addressed to Laura W. Harris. Tel: +44 1223 49 4354;
| |
|
11
|
Truong VQ, Woerner JA, Cherlin TA, Bradford Y, Lucas AM, Okeh CC, Shivakumar MK, Hui DH, Kumar R, Pividori M, Jones SC, Bossa AC, Turner SD, Ritchie MD, Verma SS. Quality Control Procedures for Genome-Wide Association Studies. Curr Protoc 2022; 2:e603. [PMID: 36441943 DOI: 10.1002/cpz1.603] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of many complex diseases. Regardless of the context, the practical utility of this information ultimately depends upon the quality of the data used for statistical analyses. Quality control (QC) procedures for GWAS are constantly evolving. Here, we enumerate some of the challenges in QC of genotyped GWAS data and describe the approaches involving genotype imputation of a sample dataset along with post-imputation quality assurance, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of the GWAS data (genotyped and imputed), including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We provide detailed guidelines along with a sample dataset to suggest current best practices and discuss areas of ongoing and future research. © 2022 Wiley Periodicals LLC.
Affiliation(s)
- Van Q Truong
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Jakob A Woerner
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Tess A Cherlin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Yuki Bradford
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Anastasia M Lucas
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Chelsea C Okeh
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Manu K Shivakumar
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Daniel H Hui
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Rachit Kumar
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Milton Pividori
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - S Chris Jones
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Abigail C Bossa
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | | | - Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Shefali S Verma
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| |
|
12
|
Matushyn M, Bose M, Mahmoud AA, Cuthbertson L, Tello C, Bircan KO, Terpolovsky A, Bamunusinghe V, Khan U, Novković B, Grabherr MG, Yazdi PG. SumStatsRehab: an efficient algorithm for GWAS summary statistics assessment and restoration. BMC Bioinformatics 2022; 23:443. [PMID: 36284273 PMCID: PMC9594936 DOI: 10.1186/s12859-022-04920-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/14/2022] [Accepted: 06/06/2022] [Indexed: 11/23/2022] Open
Abstract
Background: Generating polygenic risk scores for diseases and complex traits requires high quality GWAS summary statistic files. Often, these files can be difficult to acquire either as a result of unshared or incomplete data. To date, bioinformatics tools which focus on restoring missing columns containing identification and association data are limited, which has the potential to increase the number of usable GWAS summary statistics files. Results: SumStatsRehab was able to restore rsID, effect/other alleles, chromosome, base pair position, effect allele frequencies, beta, standard error, and p-values to a better extent than any other currently available tool, with minimal loss. Conclusions: SumStatsRehab offers a unique tool utilizing both functional programming and pipeline-like architecture, allowing users to generate accurate data restorations for incomplete summary statistics files. This in turn, increases the number of usable GWAS summary statistics files, which may be invaluable for less researched health traits.
Affiliation(s)
- Mykyta Matushyn
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Madhuchanda Bose
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | | | - Lewis Cuthbertson
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Carlos Tello
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Karatuğ Ozan Bircan
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Andrew Terpolovsky
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Varuna Bamunusinghe
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Umar Khan
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Biljana Novković
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Manfred G Grabherr
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| | - Puya G Yazdi
- SelfDecode.Com, 1031 Ives Dairy Road Suite 228 - 1047, Miami, FL, 33179, USA
| |
|
13
|
Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. HGG ADVANCES 2022; 3:100136. [PMID: 36105883 PMCID: PMC9465343 DOI: 10.1016/j.xhgg.2022.100136] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/21/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022] Open
Abstract
Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
Affiliation(s)
- Florian Privé
- National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
- Corresponding author
| | - Julyan Arbel
- Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| | - Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Bjarni J. Vilhjálmsson
- National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark
| |
|
14
|
Ruan E, Nemeth E, Moffitt R, Sandoval L, Machiela MJ, Freedman ND, Huang WY, Wong W, Chen KL, Park B, Jiang K, Hicks B, Liu J, Russ D, Minasian L, Pinsky P, Chanock SJ, Garcia-Closas M, Almeida JS. PLCOjs, a FAIR GWAS web SDK for the NCI Prostate, Lung, Colorectal and Ovarian Cancer Genetic Atlas project. Bioinformatics 2022; 38:4434-4436. [PMID: 35900159 PMCID: PMC9890300 DOI: 10.1093/bioinformatics/btac531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/24/2022] [Revised: 07/11/2022] [Accepted: 07/25/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: The Division of Cancer Epidemiology and Genetics (DCEG) and the Division of Cancer Prevention (DCP) at the National Cancer Institute (NCI) have recently generated genome-wide association study (GWAS) data for multiple traits in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Genomic Atlas project. The GWAS included 110 000 participants. The dissemination of the genetic association data through a data portal called GWAS Explorer, in a manner that addresses the modern expectations of FAIR reusability by data scientists and engineers, is the main motivation for the development of the open-source JavaScript software development kit (SDK) reported here. RESULTS: The PLCO GWAS Explorer resource relies on a public stateless HTTP application programming interface (API) deployed as the sole backend service for both the landing page's web application and third-party analytical workflows. The core PLCOjs SDK is mapped to each of the API methods, and also to each of the reference graphic visualizations in the GWAS Explorer. A few additional visualization methods extend it. As is the norm with web SDKs, no download or installation is needed and modularization supports targeted code injection for web applications, reactive notebooks (Observable) and node-based web services. AVAILABILITY AND IMPLEMENTATION: Code at https://github.com/episphere/plco; project page at https://episphere.github.io/plco.
Affiliation(s)
- Eric Ruan
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11794, USA
| | - Erika Nemeth
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11794, USA
| | - Richard Moffitt
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11794, USA
| | - Lorena Sandoval
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Mitchell J Machiela
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Neal D Freedman
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Wen-Yi Huang
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Wendy Wong
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Kai-Ling Chen
- Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD 20850, USA
| | - Brian Park
- Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD 20850, USA
| | - Kevin Jiang
- Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD 20850, USA
| | - Belynda Hicks
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Jia Liu
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Daniel Russ
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Lori Minasian
- Division of Cancer Prevention, National Cancer Institute, Rockville, MD 20850, USA
| | - Paul Pinsky
- Division of Cancer Prevention, National Cancer Institute, Rockville, MD 20850, USA
| | - Stephen J Chanock
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Rockville, MD 20850, USA
| | | |
|
15
|
Pettit RW, Amos CI. Linkage Disequilibrium Score Statistic Regression for Identifying Novel Trait Associations. CURR EPIDEMIOL REP 2022. [DOI: 10.1007/s40471-022-00297-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
16
|
Schatz MC, Philippakis AA, Afgan E, Banks E, Carey VJ, Carroll RJ, Culotti A, Ellrott K, Goecks J, Grossman RL, Hall IM, Hansen KD, Lawson J, Leek JT, Luria AO, Mosher S, Morgan M, Nekrutenko A, O’Connor BD, Osborn K, Paten B, Patterson C, Tan FJ, Taylor CO, Vessio J, Waldron L, Wang T, Wuichet K. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. CELL GENOMICS 2022; 2:100085. [PMID: 35199087 PMCID: PMC8863334 DOI: 10.1016/j.xgen.2021.100085] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Indexed: 12/14/2022]
Abstract
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.
Affiliation(s)
- Michael C. Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author
| | | | - Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Eric Banks
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Robert J. Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Alessandro Culotti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Translational Data Science, University of Chicago, Chicago, IL, USA
| | - Kyle Ellrott
- Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Jeremy Goecks
- Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Robert L. Grossman
- Center for Translational Data Science, University of Chicago, Chicago, IL, USA
| | - Ira M. Hall
- Yale School of Medicine, Yale University, New Haven, CT, USA
| | - Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Jeffrey T. Leek
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Stephen Mosher
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Martin Morgan
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA, USA
| | | | - Kevin Osborn
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | | | - Frederick J. Tan
- Department of Embryology, Carnegie Institution, Baltimore, MD, USA
| | - Casey Overby Taylor
- Departments of Medicine and Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer Vessio
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Levi Waldron
- Department of Epidemiology and Biostatistics, City University of New York Graduate School of Public Health and Health Policy, New York, NY, USA
| | - Ting Wang
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
| | - Kristin Wuichet
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | | |
|
17
|
Colona VL, Biancolella M, Novelli A, Novelli G. Will GWAS eventually allow the identification of genomic biomarkers for COVID-19 severity and mortality? J Clin Invest 2021; 131:e155011. [PMID: 34673571 PMCID: PMC8631589 DOI: 10.1172/jci155011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022] Open
Abstract
GWAS involve testing genetic variants across the genomes of many individuals to identify genotype-phenotype associations. GWAS have enabled the identification of numerous genomic biomarkers in various complex human diseases, including infectious ones. However, few of these studies are relevant for clinical practice or at the bedside. In this issue of the JCI, Nakanishi et al. characterized the clinical implications of a major genetic risk factor for COVID-19 severity and its age-dependent effect, using individual-level data in a large international multicenter consortium. This study indicates that a common COVID-19 genetic risk factor (rs10490770) associates with increased risks of morbidity and mortality, suggesting potential implications for future clinical risk management. How can the genomic biomarkers identified by GWAS be associated with the clinical outcomes of an infectious disease? In this Commentary, we evaluate the advantages and limitations of this approach.
Affiliation(s)
| | | | - Antonio Novelli
- Laboratory of Medical Genetics, IRCCS Bambino Gesù Children’s Hospital, Rome, Italy
| | - Giuseppe Novelli
- Department of Biomedicine and Prevention and
- IRCCS Neuromed, Pozzilli (IS), Italy
- Department of Pharmacology, School of Medicine, University of Nevada, Reno, Nevada, USA
| |
|