1. Carter RC, Yang Z, Akkaya-Hocagil T, Jacobson SW, Jacobson JL, Dodge NC, Hoyme HE, Zeisel SH, Meintjes EM, Kizil C, Tosto G. Genetic admixture predictors of fetal alcohol spectrum disorders (FASD) in a South African population. Gene 2024; 931:148854. [PMID: 39147113 DOI: 10.1016/j.gene.2024.148854] [Received: 04/06/2024] [Revised: 07/17/2024] [Accepted: 08/12/2024] [Indexed: 08/17/2024]
Abstract
Ancestrally admixed populations are underrepresented in genetic studies of complex diseases, which are still dominated by European-descent populations. This is relevant not only from a representation standpoint but also because of admixed populations' unique features, including being enriched for rare variants, for which effect sizes are disproportionately larger than those of common polymorphisms. Furthermore, results from these populations may be generalizable to other populations. The South African Cape Coloured (SACC) population is genetically admixed and has one of the highest prevalences of fetal alcohol spectrum disorders (FASD) worldwide. We profiled its admixture and examined associations between ancestry profiles and FASD outcomes using two longitudinal birth cohorts (N=308 mothers, 280 children) designed to examine effects of prenatal alcohol exposure on development. Participants were genotyped via MEGAex array to capture common and rare variants. Rare variants were overrepresented in our SACC cohorts, with numerous polymorphisms being monomorphic in other reference populations (e.g., ∼30,000 and ∼221,000 variants in gnomAD European and Asian populations, respectively). The cohorts showed global African (51%; Bantu and San); European (26%; Northern/Western); South Asian (18%); and East Asian (5%; largely Southern regions) ancestries. The cohorts exhibited high rates of homozygosity (6%), with regions of homozygosity harboring more deleterious variants when lying within African local-ancestry genomic segments. Both maternal and child ancestry profiles were associated with higher FASD risk, and maternal and child ancestry-by-prenatal alcohol exposure interaction effects were seen on child cognition. Our findings indicate that the SACC population may be a valuable asset to identify novel disease-associated genetic loci for FASD and other diseases.
Affiliation(s)
- R Colin Carter
  - Departments of Emergency Medicine and Pediatrics and the Institute of Human Nutrition, Columbia University Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA; Department of Human Biology, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa.
- Zikun Yang
  - Department of Neurology and the Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA
- Tugba Akkaya-Hocagil
  - Department of Biostatistics, School of Medicine, Ankara University, Ankara, Turkey; Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Sandra W Jacobson
  - Department of Human Biology, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa; Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI, USA; Department of Psychiatry and Mental Health, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa
- Joseph L Jacobson
  - Department of Human Biology, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa; Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI, USA; Department of Psychiatry and Mental Health, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa
- Neil C Dodge
  - Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI, USA
- H Eugene Hoyme
  - Sanford Children's Genomic Medicine Consortium, Sanford Health, and the University of South Dakota Sanford School of Medicine, Sioux Falls, SD, USA
- Steven H Zeisel
  - University of North Carolina Nutrition Research Institute, Kannapolis, NC, USA
- Ernesta M Meintjes
  - Department of Human Biology, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa
- Caghan Kizil
  - Department of Neurology and the Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA
- Giuseppe Tosto
  - Department of Neurology and the Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA.
2. Murali H, Wang P, Liao EC, Wang K. Genetic variant classification by predicted protein structure: A case study on IRF6. Comput Struct Biotechnol J 2024; 23:892-904. [PMID: 38370976 PMCID: PMC10869248 DOI: 10.1016/j.csbj.2024.01.019] [Received: 12/02/2023] [Revised: 01/24/2024] [Accepted: 01/25/2024] [Indexed: 02/20/2024] Open
Abstract
Next-generation genome sequencing has revolutionized genetic testing, identifying numerous rare disease-associated gene variants. However, to impute pathogenicity, computational approaches remain inadequate and functional testing of gene variants is required to provide the highest level of evidence. The emergence of AlphaFold2 has transformed the field of protein structure determination, and here we outline a strategy that leverages predicted protein structure to enhance genetic variant classification. We used the gene IRF6 as a case study due to its clinical relevance, its critical role in cleft lip/palate malformation, and the availability of experimental data on the pathogenicity of IRF6 gene variants through phenotype rescue experiments in irf6-/- zebrafish. We compared results from over 30 pathogenicity prediction tools on 37 IRF6 missense variants. IRF6 lacks an experimentally derived structure, so we used predicted structures to explore associations between mutational clustering and pathogenicity. We found that among these variants, 19 of 37 were unanimously predicted as deleterious by computational tools. Comparing in silico predictions with experimental findings, 12 variants predicted as pathogenic were experimentally determined as benign. Even with the recently published AlphaMissense model, 15/18 (83%) of the predicted pathogenic variants were experimentally determined as benign. In comparison, mapping variants to the protein revealed deleterious mutation clusters around the protein binding domain, whereas N-terminal variants tend to be benign, suggesting the importance of structural information in determining pathogenicity of mutations in this gene. In conclusion, incorporating gene-specific structural features of known pathogenic/benign mutations may provide meaningful insights into pathogenicity predictions in a gene-specific manner and facilitate the interpretation of variant pathogenicity.
Affiliation(s)
- Hemma Murali
  - Graduate Program in Biochemistry and Molecular Biophysics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Peng Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
  - Master of Biotechnology Program, University of Pennsylvania, Philadelphia, PA 19104, United States
- Eric C. Liao
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Craniofacial Innovation, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Kai Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
3. Cengnata A, Deng L, Yap WS, Lim LHR, Leong CO, Xu S, Hoh BP. A genotype imputation reference panel specific for native Southeast Asian populations. NPJ Genom Med 2024; 9:47. [PMID: 39368969 PMCID: PMC11455956 DOI: 10.1038/s41525-024-00435-7] [Received: 10/15/2023] [Accepted: 09/24/2024] [Indexed: 10/07/2024] Open
Abstract
We report the development of a "Southeast Asian Specific (SEA-specific) Reference Panel" through a "Cross-panel Imputation" approach, consisting of 2550 samples derived from the GA100K, SG10K, and the Peninsular Malaysia Orang Asli (OA) datasets, covering 113,851,450 variants. The SEA-specific panel produced more high-confidence variants than the 1000 Genomes Project (1KGP) when imputing the OA (8.9 million SEA-specific vs 8.1 million 1KGP) and the Singapore Genome Variation Project (SGVP) (12.5 million SEA-specific vs 11.8 million 1KGP) genotyping datasets. Further, the SEA-specific panel imputed SNPs with better estimated quality scores (INFO, DR2 and R2) on the OA genotyping dataset compared with TOPMED and the Human Genome Diversity Project, but performed similarly on the SGVP dataset. This panel also exhibited higher recall and non-reference disconcordance rates, indicating the influence of ancestry closeness of the reference panel. However, we note that the imputation accuracy may be compromised by the size of the reference panel.
Affiliation(s)
- Alvin Cengnata
  - Faculty of Applied Sciences, UCSI University, Kuala Lumpur, Malaysia
- Lian Deng
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Wai-Sum Yap
  - Faculty of Applied Sciences, UCSI University, Kuala Lumpur, Malaysia
- Chee-Onn Leong
  - Advanced Genomics Technology Center, AGTC Genomics Inc., Kuala Lumpur, Malaysia
- Shuhua Xu
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
  - Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Boon-Peng Hoh
  - Division of Applied Biomedical Sciences and Biotechnology, School of Health Sciences, IMU University, Kuala Lumpur, Malaysia.
4. Ha EK, Shriner D, Callier SL, Riley L, Adeyemo AA, Rotimi CN, Bentley AR. Native Hawaiian and Pacific Islander populations in genomic research. NPJ Genom Med 2024; 9:45. [PMID: 39349931 PMCID: PMC11442686 DOI: 10.1038/s41525-024-00428-6] [Received: 07/31/2023] [Accepted: 09/06/2024] [Indexed: 10/04/2024] Open
Abstract
The role of genomic research and medicine in improving health continues to grow significantly, highlighting the need for increased equitable inclusion of diverse populations in genomics. Native Hawaiian and Pacific Islander (NHPI) communities are often missing from these efforts to ensure that the benefits of genomics are accessible to all individuals. In this article, we analyze the qualities of NHPI populations relevant to their inclusion in genomic research and investigate their current representation using data from the genome-wide association studies (GWAS) catalog. A discussion of the barriers NHPI experience regarding participating in research and recommendations to improve NHPI representation in genomic research are also included.
Affiliation(s)
- Edra K Ha
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - University of Hawai'i at Mānoa, Honolulu, HI, USA
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Daniel Shriner
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shawneequa L Callier
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Department of Clinical Research and Leadership, The George Washington University School of Medicine and Health Sciences, Washington, DC, USA
- Adebowale A Adeyemo
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Charles N Rotimi
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Amy R Bentley
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
5. Çubukcu H, Kılınç GM. Evaluation of genotype imputation using Glimpse tools on low coverage ancient DNA. Mamm Genome 2024; 35:461-473. [PMID: 39028337 DOI: 10.1007/s00335-024-10053-4] [Received: 03/22/2024] [Accepted: 07/12/2024] [Indexed: 07/20/2024]
Abstract
Ancient DNA provides a unique frame for directly studying human population genetics in time and space. Still, since most of the ancient genomic data is low coverage, analysis is confronted with a low number of SNPs, genotype uncertainties, and reference bias. Here, we for the first time benchmark the two distinct versions of Glimpse tools on 120 ancient human genomes from Eurasia, including those largely from previously under-evaluated regions, and compare the performance of genotype imputation with de facto analysis approaches for low coverage genomic data analysis. We further investigate the impact of two distinct reference panels on imputation accuracy for low coverage genomic data. We compute accuracy statistics and perform PCA and f4-statistics to explore the behaviour of genotype imputation on low coverage data regarding (i) two versions of Glimpse, (ii) two reference panels, (iii) four post-imputation filters and coverages, as well as (iv) data type and geographical origin of the samples on the analyses. Our results reveal that even for 0.1X coverage ancient human genomes, genotype imputation using Glimpse v2 is suitable. Additionally, using the 1000 Genomes merged with Human Genome Diversity Panel improves the accuracy of imputation for the rare variants with low MAF, which might be important not only for ancient genomics but also for modern human genomic studies based on low coverage data and for haplotype-based analysis. Most importantly, we reveal that genotype imputation of low coverage ancient human genomes reduces the genetic affinity of the samples towards the human reference genome. Through solving one of the most challenging biases in data analysis, so-called reference bias, genotype imputation using Glimpse v2 is promising for low coverage ancient human genomic data analysis and for rare-variant-based and haplotype-based analysis.
Affiliation(s)
- Hande Çubukcu
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06100, Ankara, Turkey
- Gülşah Merve Kılınç
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06100, Ankara, Turkey.
6. Zhen X, Betti M, Kars ME, Patterson A, Medina-Torres EA, Scheffler Mendoza SC, Herrera Sánchez DA, Lopez-Herrera G, Svyryd Y, Mutchinick O, Gamazon E, Rathmell J, Itan Y, Markle J, O'Farrill Romanillos P, Lugo-Reyes SO, Martinez-Barricarte R. Molecular and clinical characterization of a founder mutation causing G6PC3 deficiency. RESEARCH SQUARE 2024:rs.3.rs-4595246. [PMID: 39041036 PMCID: PMC11261954 DOI: 10.21203/rs.3.rs-4595246/v1] [Indexed: 07/24/2024]
Abstract
G6PC3 deficiency is a monogenic immunometabolic disorder that causes syndromic congenital neutropenia. Patients display heterogeneous extra-hematological manifestations, contributing to delayed diagnosis. Here, we investigated the origin and functional consequence of the G6PC3 c.210delC variant found in patients of Mexican origin. Based on the shared haplotypes amongst carriers of the c.210delC mutation, we estimated that this variant originated from a founder effect in a common ancestor. Furthermore, by ancestry analysis, we concluded that it originated in the indigenous Mexican population. At the protein level, we showed that this frameshift mutation leads to aberrant protein expression in overexpression and patient-derived cells. G6PC3 pathology is driven by the intracellular accumulation of the metabolite 1,5-anhydroglucitol-6-phosphate (1,5-AG6P) that inhibits glycolysis. We characterized how the variant c.210delC impacts glycolysis by performing extracellular flux assays on patient-derived cells. When treated with 1,5-anhydroglucitol (1,5-AG), the precursor to 1,5-AG6P, patient-derived cells exhibited markedly reduced engagement of glycolysis. Finally, we compared the clinical presentation of patients with the mutation c.210delC and all other G6PC3-deficient patients reported in the literature to date, and we found that c.210delC carriers display all prominent clinical features observed in prior G6PC3-deficient patients. In conclusion, G6PC3 c.210delC is a loss-of-function mutation that arose from a founder effect in the indigenous Mexican population. These findings may facilitate the diagnosis of additional patients in this geographical area. Moreover, the in vitro 1,5-AG-dependent functional assay used in our study could be employed to assess the pathogenicity of additional G6PC3 variants.
Affiliation(s)
- Xin Zhen
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center
- Michael Betti
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center
- Meltem Ece Kars
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai
- Andrew Patterson
  - Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center
- Yevgeniya Svyryd
  - Department of Genetics, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán
- Osvaldo Mutchinick
  - Department of Genetics, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán
- Eric Gamazon
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center
- Jeffrey Rathmell
  - Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center
- Yuval Itan
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai
- Janet Markle
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center
7. Betschart RO, Riccio C, Aguilera-Garcia D, Blankenberg S, Guo L, Moch H, Seidl D, Solleder H, Thalén F, Thiéry A, Twerenbold R, Zeller T, Zoche M, Ziegler A. Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control. Biom J 2024; 66:e202300278. [PMID: 38988195 DOI: 10.1002/bimj.202300278] [Received: 10/09/2023] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 07/12/2024]
Abstract
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before performing association analysis between phenotypes and genotypes, preprocessing and quality control (QC) of the raw sequence data need to be performed. Because many biostatisticians have not been working with WGS data so far, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with the data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time was linear in genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
Affiliation(s)
- Domingo Aguilera-Garcia
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Stefan Blankenberg
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Linlin Guo
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Holger Moch
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Dagmar Seidl
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Hugo Solleder
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Felix Thalén
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Raphael Twerenbold
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Tanja Zeller
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Martin Zoche
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Andreas Ziegler
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
8. Misek SA, Fultineer A, Kalfon J, Noorbakhsh J, Boyle I, Roy P, Dempster J, Petronio L, Huang K, Saadat A, Green T, Brown A, Doench JG, Root DE, McFarland JM, Beroukhim R, Boehm JS. Germline variation contributes to false negatives in CRISPR-based experiments with varying burden across ancestries. Nat Commun 2024; 15:4892. [PMID: 38849329 PMCID: PMC11161638 DOI: 10.1038/s41467-024-48957-z] [Received: 12/05/2023] [Accepted: 05/20/2024] [Indexed: 06/09/2024] Open
Abstract
Reducing disparities is vital for equitable access to precision treatments in cancer. Socioenvironmental factors are a major driver of disparities, but differences in genetic variation likely also contribute. The impact of genetic ancestry on prioritization of cancer targets in drug discovery pipelines has not been systematically explored due to the absence of pre-clinical data at the appropriate scale. Here, we analyze data from 611 genome-scale CRISPR/Cas9 viability experiments in human cell line models to identify ancestry-associated genetic dependencies essential for cell survival. Surprisingly, we find that most putative associations between ancestry and dependency arise from artifacts related to germline variants. Our analysis suggests that for 1.2-2.5% of guides, germline variants in sgRNA targeting sequences reduce cutting by the CRISPR/Cas9 nuclease, disproportionately affecting cell models derived from individuals of recent African descent. We propose three approaches to mitigate this experimental bias, enabling the scientific community to address these disparities.
Affiliation(s)
- Sean A Misek
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Departments of Cancer Biology and Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Koch Institute, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Aaron Fultineer
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Jeremie Kalfon
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Isabella Boyle
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Priyanka Roy
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Joshua Dempster
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Lia Petronio
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Katherine Huang
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Alham Saadat
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Thomas Green
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Adam Brown
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- John G Doench
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- David E Root
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Rameen Beroukhim
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
  - Departments of Cancer Biology and Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Jesse S Boehm
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
  - Koch Institute, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
9. Horesh ME, Martin-Fernandez M, Gruber C, Buta S, Le Voyer T, Puzenat E, Lesmana H, Wu Y, Richardson A, Stein D, Hodeib S, Youssef M, Kurowski JA, Feuille E, Pedroza LA, Fuleihan RL, Haseley A, Hovnanian A, Quartier P, Rosain J, Davis G, Mullan D, Stewart O, Patel R, Lee AE, Rubinstein R, Ewald L, Maheshwari N, Rahming V, Chinn IK, Lupski JR, Orange JS, Sancho-Shimizu V, Casanova JL, Abul-Husn NS, Itan Y, Milner JD, Bustamante J, Bogunovic D. Individuals with JAK1 variants are affected by syndromic features encompassing autoimmunity, atopy, colitis, and dermatitis. J Exp Med 2024; 221:e20232387. [PMID: 38563820 PMCID: PMC10986756 DOI: 10.1084/jem.20232387] [Received: 12/29/2023] [Revised: 02/19/2024] [Accepted: 02/23/2024] [Indexed: 04/04/2024] Open
Abstract
Inborn errors of immunity lead to autoimmunity, inflammation, allergy, infection, and/or malignancy. Disease-causing JAK1 gain-of-function (GoF) mutations are considered exceedingly rare and have been identified in only four families. Here, we use forward and reverse genetics to identify 59 individuals harboring one of four heterozygous JAK1 variants. In vitro and ex vivo analysis of these variants revealed hyperactive baseline and cytokine-induced STAT phosphorylation and interferon-stimulated gene (ISG) levels compared with wild-type JAK1. A systematic review of electronic health records from the BioME Biobank revealed increased likelihood of clinical presentation with autoimmunity, atopy, colitis, and/or dermatitis in JAK1 variant-positive individuals. Finally, treatment of one affected patient with severe atopic dermatitis using the JAK1/JAK2-selective inhibitor, baricitinib, resulted in clinically significant improvement. These findings suggest that individually rare JAK1 GoF variants may underlie an emerging syndrome with more common presentations of autoimmune and inflammatory disease (JAACD syndrome). More broadly, individuals who present with such conditions may benefit from genetic testing for the presence of JAK1 GoF variants.
Affiliation(s)
- Michael E. Horesh
  - Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Marta Martin-Fernandez
  - Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Conor Gruber
  - Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sofija Buta
  - Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Tom Le Voyer
  - Laboratory of Human Genetics of Infectious Diseases, INSERM UMR1163, Paris, France
  - Imagine Institute, University of Paris, Paris, France
  - Clinical Immunology Department, Assistance Publique Hôpitaux de Paris (AP-HP), Saint-Louis Hospital, Paris, France
- Eve Puzenat
  - Department of Dermatology and INSERM 1098, University of Bourgogne-Franche Comté, Besançon, France
- Harry Lesmana
  - Genomic Medicine Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
  - Department of Pediatric Hematology, Oncology and Bone Marrow Transplantation, Cleveland Clinic, Cleveland, OH, USA
- Yiming Wu
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashley Richardson
  - Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David Stein
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Stephanie Hodeib
  - Department of Paediatric Infectious Diseases and Virology, Imperial College London, London, UK
  - Imperial College London, Centre for Paediatrics and Child Health, London, UK
- Mariam Youssef
  - Department of Pediatrics, Division of Pediatric Allergy, Immunology and Rheumatology, Columbia University, New York, NY, USA
- Jacob A. Kurowski
  - Department of Pediatric Gastroenterology, Hepatology, and Nutrition, Cleveland Clinic, Cleveland, OH, USA
| | | | - Luis A. Pedroza
- Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
| | - Ramsay L. Fuleihan
- Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
| | - Alexandria Haseley
- Center for Personalized Genetic Healthcare, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Alain Hovnanian
- Imagine Institute, University of Paris, Paris, France
- Laboratory of Genetic Skin Diseases, INSERM U1163, Paris, France
| | - Pierre Quartier
- Université Paris-Cité, Paris, France
- Paediatric Hematology-Immunology and Rheumatology Unit, Hopital Necker-Enfants Malades, Assistance Publique-Hopitaux de Paris, Paris, France
| | - Jérémie Rosain
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR1163, Paris, France
- Imagine Institute, University of Paris, Paris, France
- Center for the Study of Primary Immunodeficiencies, Necker Hospital for Sick Children, Paris, France
| | - Georgina Davis
- Department of Immunology, Derriford Hospital, Plymouth, UK
| | - Daniel Mullan
- Department of Immunology, Derriford Hospital, Plymouth, UK
| | - O’Jay Stewart
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Roosheel Patel
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Angelica E. Lee
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Rebecca Rubinstein
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Leyla Ewald
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nikhil Maheshwari
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | | | - Ivan K. Chinn
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Division of Immunology, Allergy, and Retrovirology, Texas Children’s Hospital, Houston, TX, USA
| | - James R. Lupski
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jordan S. Orange
- Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
| | - Vanessa Sancho-Shimizu
- Department of Paediatric Infectious Diseases and Virology, Imperial College London, London, UK
- Imperial College London, Centre for Paediatrics and Child Health, London, UK
| | - Jean-Laurent Casanova
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR1163, Paris, France
- Imagine Institute, University of Paris, Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, New York, NY, USA
- Department of Pediatrics, Necker Hospital for Sick Children, Paris, France
| | - Noura S. Abul-Husn
- Department of Medicine, Division of Genomic Medicine, Icahn School of Medicine at Mount Sinai, Institute for Genomic Health, New York, NY, USA
| | - Yuval Itan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joshua D. Milner
- Department of Pediatrics, Division of Pediatric Allergy, Immunology and Rheumatology, Columbia University, New York, NY, USA
| | - Jacinta Bustamante
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR1163, Paris, France
- Imagine Institute, University of Paris, Paris, France
- Center for the Study of Primary Immunodeficiencies, Necker Hospital for Sick Children, Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY, USA
| | - Dusan Bogunovic
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Pediatrics, Hiroshima University, Hiroshima, Japan
| |
|
10
|
Lambert SA, Wingfield B, Gibson JT, Gil L, Ramachandran S, Yvon F, Saverimuttu S, Tinsley E, Lewis E, Ritchie SC, Wu J, Canovas R, McMahon A, Harris LW, Parkinson H, Inouye M. The Polygenic Score Catalog: new functionality and tools to enable FAIR research. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.29.24307783. [PMID: 38853961 PMCID: PMC11160819 DOI: 10.1101/2024.05.29.24307783] [Indexed: 06/11/2024]
Abstract
Polygenic scores (PGS) have transformed human genetic research and have multiple potential clinical applications, including risk stratification for disease prevention and prediction of treatment response. Here, we present a series of recent enhancements to the PGS Catalog (www.PGSCatalog.org), the largest findable, accessible, interoperable, and reusable (FAIR) repository of PGS. These include expansions in data content and ancestral diversity as well as the addition of new features. We further present the PGS Catalog Calculator (pgsc_calc, https://github.com/PGScatalog/pgsc_calc), an open-source, scalable and portable pipeline to reproducibly calculate PGS that securely democratizes equitable PGS applications by implementing genetic ancestry estimation and score normalization using reference data. With the PGS Catalog and Calculator, users can now quantify an individual's genetic predisposition for hundreds of common diseases and clinically relevant traits. Taken together, these updates and tools facilitate the next generation of PGS, thus lowering barriers to the clinical studies necessary to identify where PGS may be integrated into clinical practice.
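As a rough illustration of what calculating and normalizing a PGS involves, the sketch below computes a score as a weighted sum of effect-allele dosages and expresses it as a z-score against a set of reference scores. This is a minimal sketch under stated assumptions, not the pgsc_calc implementation: the function names, variant IDs, weights, and the plain mean/standard-deviation normalization are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant).

    Variants missing from `dosages` contribute 0 (an assumption of this sketch).
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def z_normalize(score, reference_scores):
    """Express a raw score in standard deviations from the reference mean."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)
    return (score - mu) / sd


# Hypothetical effect weights and one individual's dosages (illustrative only).
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.125}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}

raw = polygenic_score(dosages, weights)  # 0.5*2 - 0.25*1 + 0.125*0 = 0.75

# Hypothetical raw scores from a reference panel.
reference_scores = [0.25, 0.75, 1.25]
z = z_normalize(raw, reference_scores)
```

A raw score has no meaning on its own, which is why a reference distribution is needed; real pipelines typically also account for genetic ancestry when choosing or adjusting that distribution.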
Affiliation(s)
- Samuel A. Lambert
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Benjamin Wingfield
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Joel T. Gibson
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
| | - Laurent Gil
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
| | - Santhi Ramachandran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Florent Yvon
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
| | - Shirin Saverimuttu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Emily Tinsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Elizabeth Lewis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Scott C. Ritchie
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
| | - Jingqin Wu
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Rodrigo Canovas
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Laura W. Harris
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| |
|
11
|
Zhen X, Betti MJ, Kars ME, Patterson A, Medina-Torres EA, Scheffler Mendoza SC, Herrera Sánchez DA, Lopez-Herrera G, Svyryd Y, Mutchinick OM, Gamazon E, Rathmell JC, Itan Y, Markle J, O’Farrill Romanillos P, Lugo-Reyes SO, Martinez-Barricarte R. Molecular and clinical characterization of a founder mutation causing G6PC3 deficiency. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.13.24307299. [PMID: 38798393 PMCID: PMC11118594 DOI: 10.1101/2024.05.13.24307299] [Indexed: 05/29/2024]
Abstract
Background: G6PC3 deficiency is a rare genetic disorder that causes syndromic congenital neutropenia. It is driven by the intracellular accumulation of a metabolite named 1,5-anhydroglucitol-6-phosphate (1,5-AG6P) that inhibits glycolysis. Patients display heterogeneous extra-hematological manifestations, contributing to delayed diagnosis. Objective: The G6PC3 c.210delC variant has been identified in patients of Mexican origin. We set out to study the origin and functional consequence of this mutation. Furthermore, we sought to characterize the clinical phenotypes caused by it. Methods: Using whole-genome sequencing data, we conducted haplotype analysis to estimate the age of this allele and traced its ancestral origin. We examined how this mutation affected G6PC3 protein expression and performed extracellular flux assays on patient-derived cells to characterize how this mutation impacts glycolysis. Finally, we compared the clinical presentations of patients with the c.210delC mutation relative to other G6PC3 deficient patients published to date. Results: Based on the length of haplotypes shared amongst ten carriers of the G6PC3 c.210delC mutation, we estimated that this variant originated in a common ancestor of indigenous American origin. The mutation causes a frameshift that introduces a premature stop codon, leading to a complete loss of G6PC3 protein expression. When treated with 1,5-anhydroglucitol (1,5-AG), the precursor to 1,5-AG6P, patient-derived cells exhibited markedly reduced engagement of glycolysis. Clinically, c.210delC carriers display all the clinical features of syndromic severe congenital neutropenia type 4 observed in prior reports of G6PC3 deficiency. Conclusion: The G6PC3 c.210delC variant is a loss-of-function mutation that arose from a founder effect in the indigenous Mexican population. These findings may facilitate the diagnosis of additional patients in this geographical area. Moreover, the in vitro 1,5-AG-dependent functional assay used in our study could be employed to assess the pathogenicity of additional G6PC3 variants.
Affiliation(s)
- Xin Zhen
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Michael J Betti
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Meltem Ece Kars
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Andrew Patterson
- Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Center for Immunobiology, Nashville, TN, USA
| | | | | | | | - Gabriela Lopez-Herrera
- Immune deficiencies laboratory, National Institute of Pediatrics, Health Secretariat, Mexico City, Mexico
| | - Yevgeniya Svyryd
- Department of Genetics, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
| | - Osvaldo M. Mutchinick
- Department of Genetics, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
| | - Eric Gamazon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jeffrey C Rathmell
- Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Center for Immunobiology, Nashville, TN, USA
- Vanderbilt Institute for Infection, Immunology and Inflammation, Nashville, TN, USA
| | - Yuval Itan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Janet Markle
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Center for Immunobiology, Nashville, TN, USA
- Vanderbilt Institute for Infection, Immunology and Inflammation, Nashville, TN, USA
| | | | - Saul Oswaldo Lugo-Reyes
- Immune deficiencies laboratory, National Institute of Pediatrics, Health Secretariat, Mexico City, Mexico
| | - Ruben Martinez-Barricarte
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Molecular Pathogenesis, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Center for Immunobiology, Nashville, TN, USA
- Vanderbilt Institute for Infection, Immunology and Inflammation, Nashville, TN, USA
| |
|
12
|
Gorman BR, Francis M, Nealon CL, Halladay CW, Duro N, Markianos K, Genovese G, Hysi PG, Choquet H, Afshari NA, Li YJ, Gaziano JM, Hung AM, Wu WC, Greenberg PB, Pyarajan S, Lass JH, Peachey NS, Iyengar SK. A multi-ancestry GWAS of Fuchs corneal dystrophy highlights the contributions of laminins, collagen, and endothelial cell regulation. Commun Biol 2024; 7:418. [PMID: 38582945 PMCID: PMC10998918 DOI: 10.1038/s42003-024-06046-3] [Received: 04/19/2023] [Accepted: 03/13/2024] [Indexed: 04/08/2024] Open
Abstract
Fuchs endothelial corneal dystrophy (FECD) is a leading indication for corneal transplantation, but its molecular etiology remains poorly understood. We performed genome-wide association studies (GWAS) of FECD in the Million Veteran Program followed by multi-ancestry meta-analysis with the previous largest FECD GWAS, for a total of 3970 cases and 333,794 controls. We confirm the previous four loci, and identify eight novel loci: SSBP3, THSD7A, LAMB1, PIDD1, RORA, HS3ST3B1, LAMA5, and COL18A1. We further confirm the TCF4 locus in GWAS for admixed African and Hispanic/Latino ancestries and show an enrichment of European-ancestry haplotypes at TCF4 in FECD cases. Among the novel associations are low-frequency missense variants in laminin genes LAMA5 and LAMB1 which, together with previously reported LAMC1, form laminin-511 (LM511). AlphaFold 2 protein modeling, validated through homology, suggests that mutations at LAMA5 and LAMB1 may destabilize LM511 by altering inter-domain interactions or extracellular matrix binding. Finally, phenome-wide association scans and colocalization analyses suggest that the TCF4 CTG18.1 trinucleotide repeat expansion leads to dysregulation of ion transport in the corneal endothelium and has pleiotropic effects on renal function.
Affiliation(s)
- Bryan R Gorman
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA, USA
- Booz Allen Hamilton, McLean, VA, USA
| | - Michael Francis
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA, USA
- Booz Allen Hamilton, McLean, VA, USA
| | - Cari L Nealon
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, OH, USA
| | - Christopher W Halladay
- Center of Innovation in Long Term Services and Supports, Providence VA Medical Center, Providence, RI, USA
| | - Nalvi Duro
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA, USA
- Booz Allen Hamilton, McLean, VA, USA
| | - Kyriacos Markianos
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA, USA
| | - Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Pirro G Hysi
- Department of Ophthalmology, King's College London, London, UK
- Department of Twins Research and Genetic Epidemiology, King's College London, London, UK
- UCL Great Ormond Street Hospital Institute of Child Health, King's College London, London, UK
| | - Hélène Choquet
- Division of Research, Kaiser Permanente Northern California (KPNC), Oakland, CA, USA
| | - Natalie A Afshari
- Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, CA, USA
| | - Yi-Ju Li
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
| | - J Michael Gaziano
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Division of Aging, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Adriana M Hung
- Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Center for Kidney Disease, Vanderbilt University Medical Center, Nashville, TN, USA
- VA Tennessee Valley Healthcare System, Nashville, TN, USA
| | - Wen-Chih Wu
- Cardiology Section, Medical Service, Providence VA Medical Center, Providence, RI, USA
| | - Paul B Greenberg
- Ophthalmology Section, Providence VA Medical Center, Providence, RI, USA
- Division of Ophthalmology, Alpert Medical School, Brown University, Providence, RI, USA
| | - Saiju Pyarajan
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA, USA
| | - Jonathan H Lass
- Department of Ophthalmology and Visual Sciences, Case Western Reserve University, Cleveland, OH, USA
| | - Neal S Peachey
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, OH, USA
- Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, USA
| | - Sudha K Iyengar
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, OH, USA
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
| |
|
13
|
Carter RC, Yang Z, Akkaya-Hocagil T, Jacobson SW, Jacobson JL, Dodge NC, Hoyme HE, Zeisel SH, Meintjes EM, Kizil C, Tosto G. Genetic admixture predictors of fetal alcohol spectrum disorders (FASD) in the South African Cape Coloured population. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.31.24305130. [PMID: 38633769 PMCID: PMC11023663 DOI: 10.1101/2024.03.31.24305130] [Indexed: 04/19/2024]
Abstract
Ancestrally admixed populations are underrepresented in genetic studies of complex diseases, which are still dominated by European-descent populations. This is relevant not only from a representation standpoint but also because of admixed populations' unique features, including being enriched for rare variants, for which effect sizes are disproportionately larger than common polymorphisms. Furthermore, results from these populations may be generalizable to other populations. The South African Cape Coloured (SACC) population is genetically admixed, with one of the highest prevalences of fetal alcohol spectrum disorders (FASD) worldwide. We profiled its admixture and examined associations between ancestry profiles and FASD outcomes using two longitudinal birth cohorts (N=308 mothers, 280 children) designed to examine effects of prenatal alcohol exposure on development. Participants were genotyped via MEGA-ex array to capture common and rare variants. Rare variants were overrepresented in our SACC cohorts, with numerous polymorphisms being monomorphic in other reference populations (e.g., ∼30,000 and ∼221,000 variants in gnomAD European and Asian populations, respectively). The cohorts showed global African (51%; Bantu and San); European (26%; Northern/Western); South Asian (18%); and East Asian (5%; largely Southern regions) ancestries. The cohorts exhibited high rates of homozygosity (6%), with regions of homozygosity harboring more deleterious variants when lying within African local-ancestry genomic segments. Both maternal and child ancestry profiles were associated with FASD risk and altered severity of prenatal alcohol exposure-related cognitive deficits in the child. Our findings indicate that the SACC population may be a valuable asset to identify novel disease-associated genetic loci for FASD and other diseases.
|
14
|
Brīvība M, Atava I, Pečulis R, Elbere I, Ansone L, Rozenberga M, Silamiķelis I, Kloviņš J. Evaluating the Efficacy of Type 2 Diabetes Polygenic Risk Scores in an Independent European Population. Int J Mol Sci 2024; 25:1151. [PMID: 38256224 PMCID: PMC10817091 DOI: 10.3390/ijms25021151] [Received: 12/03/2023] [Revised: 01/04/2024] [Accepted: 01/09/2024] [Indexed: 01/24/2024] Open
Abstract
Numerous type 2 diabetes (T2D) polygenic risk scores (PGSs) have been developed to predict individuals' predisposition to the disease. An independent assessment and verification of the best-performing PGS are warranted to allow for a rapid application of developed models. To date, only 3% of T2D PGSs have been evaluated. In this study, we assessed all (n = 102) presently published T2D PGSs in an independent cohort of 3718 individuals, which has not been included in the construction or fine-tuning of any T2D PGS so far. We further chose the best-performing PGS, assessed its performance across major population principal component analysis (PCA) clusters, and compared it with newly developed population-specific T2D PGS. Our findings revealed that 88% of the published PGSs were significantly associated with T2D; however, their performance was lower than what had been previously reported. We found a positive association of PGS improvement over the years (p-value = 8.01 × 10⁻⁴), with PGS002771 currently showing the best discriminatory power (area under the receiver operating characteristic curve (AUROC) = 0.669) and PGS003443 exhibiting the strongest association (odds ratio (OR) = 1.899). Further investigation revealed no difference in PGS performance across major population PCA clusters and when compared with newly developed population-specific PGS. Our findings revealed a positive trend in T2D PGS performance, consistently identifying high-T2D-risk individuals in an independent European population.
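To make the discriminatory-power metric concrete, the sketch below computes AUROC directly from its probabilistic definition: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. This is a minimal sketch with made-up scores, not the evaluation code used in the study; the function name and the example values are hypothetical.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 if the case scores higher, 0.5 if tied, 0 otherwise.
    This pairwise form is O(n*m), which is fine for a small illustration.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical PGS values for individuals with and without T2D.
cases = [0.9, 0.6]
controls = [0.7, 0.2, 0.1]

area = auroc(cases, controls)  # 5 of the 6 pairs are ranked correctly
```

On this reading, an AUROC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation.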
Affiliation(s)
- Monta Brīvība
- Latvian Biomedical Research and Study Centre, LV-1067 Riga, Latvia; (I.A.); (I.E.); (L.A.); (J.K.)
| | | | | | | | | | | | | | | |
|
15
|
Poterba T, Vittal C, King D, Goldstein D, Goldstein JI, Schultz P, Karczewski KJ, Seed C, Neale BM. The Scalable Variant Call Representation: Enabling Genetic Analysis Beyond One Million Genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.09.574205. [PMID: 38260295 PMCID: PMC10802441 DOI: 10.1101/2024.01.09.574205] [Indexed: 01/24/2024]
Abstract
The Variant Call Format (VCF) is widely used in genome sequencing but scales poorly. For instance, we estimate a 150,000 genome VCF would occupy 900 TiB, making it both costly and complicated to produce and analyze. The issue stems from VCF's requirement to densely represent both reference-genotypes and allele-indexed arrays. These requirements lead to unnecessary data duplication and, ultimately, very large files. To address these challenges, we introduce the Scalable Variant Call Representation (SVCR). This representation reduces file sizes by ensuring they scale linearly with samples. SVCR achieves this by adopting reference blocks from the Genomic Variant Call Format (GVCF) and employing local allele indices. SVCR is also lossless and mergeable, allowing for N+1 and N+K incremental joint-calling. We present two implementations of SVCR: SVCR-VCF, which encodes SVCR in VCF format, and VDS, which uses Hail's native format. Our experiments confirm the linear scalability of SVCR-VCF and VDS, in contrast to the super-linear growth seen with standard VCF files. We also discuss the VDS Combiner, a scalable, open-source tool for producing a VDS from GVCFs and unique features of VDS which enable rapid data analysis. SVCR, and VDS in particular, ensure the scientific community can generate, analyze, and disseminate genetics datasets with millions of samples.
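The idea of local allele indices can be sketched in a few lines. In the illustration below, each sample stores only the alleles it actually carries, as indices into the site-wide allele list, and its genotype is written relative to that short local list. This is a minimal sketch assuming diploid calls; the class and field names are hypothetical and are not the SVCR-VCF or Hail VDS schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalCall:
    # Indices into the site's global allele list; index 0 is the reference.
    local_alleles: List[int]
    # Diploid genotype written as indices into `local_alleles`.
    local_gt: Tuple[int, int]


def to_global_gt(call: LocalCall) -> Tuple[int, int]:
    """Translate a locally indexed genotype back to global allele indices."""
    a, b = call.local_gt
    return (call.local_alleles[a], call.local_alleles[b])


def n_diploid_genotypes(n_alleles: int) -> int:
    """Number of unordered diploid genotypes, i.e. the length of a
    per-genotype array such as genotype likelihoods."""
    return n_alleles * (n_alleles + 1) // 2


# A site with five alleles across the whole cohort (illustrative only).
site_alleles = ["A", "C", "G", "T", "AT"]

# This sample carries only the reference and global allele 3 ("T").
call = LocalCall(local_alleles=[0, 3], local_gt=(0, 1))

global_gt = to_global_gt(call)                          # indices into site_alleles
dense_len = n_diploid_genotypes(len(site_alleles))      # array length if densely indexed
local_len = n_diploid_genotypes(len(call.local_alleles))  # array length with local indices
```

Because per-genotype arrays grow quadratically with the number of alleles at a site, indexing them by each sample's own alleles keeps a sample's record size independent of how many alleles other samples add, and the mapping back to global indices remains lossless.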
Affiliation(s)
- Timothy Poterba
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Christopher Vittal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Daniel King
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Daniel Goldstein
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Jacqueline I. Goldstein
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Patrick Schultz
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Konrad J. Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Cotton Seed
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
|
16
|
Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD, Poterba T, Wilson MW, Tarasova Y, Phu W, Grant R, Yohannes MT, Koenig Z, Farjoun Y, Banks E, Donnelly S, Gabriel S, Gupta N, Ferriera S, Tolonen C, Novod S, Bergelson L, Roazen D, Ruano-Rubio V, Covarrubias M, Llanwarne C, Petrillo N, Wade G, Jeandet T, Munshi R, Tibbetts K, O'Donnell-Luria A, Solomonson M, Seed C, Martin AR, Talkowski ME, Rehm HL, Daly MJ, Tiao G, Neale BM, MacArthur DG, Karczewski KJ. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 2024; 625:92-100. [PMID: 38057664 DOI: 10.1038/s41586-023-06045-0] [Citation(s) in RCA: 175] [Impact Index Per Article: 175.0] [Received: 03/16/2022] [Accepted: 04/03/2023] [Indexed: 12/08/2023]
Abstract
The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
Affiliation(s)
- Siwei Chen
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ryan L Collins
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Qingbo Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Jessica Alföldi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Christopher Vittal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Laura D Gauthier
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Timothy Poterba
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael W Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Yekaterina Tarasova
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- William Phu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Riley Grant
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mary T Yohannes
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zan Koenig
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yossi Farjoun
- Richards Lab, Lady Davis Institute, Montreal, Quebec, Canada
- Eric Banks
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stacey Gabriel
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Namrata Gupta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven Ferriera
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Charlotte Tolonen
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sam Novod
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Louis Bergelson
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Roazen
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Miguel Covarrubias
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nikelle Petrillo
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gordon Wade
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Thibault Jeandet
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ruchi Munshi
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kathleen Tibbetts
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Cotton Seed
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alicia R Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
17
|
Koyama S, Wang Y, Paruchuri K, Uddin MM, Cho SMJ, Urbut SM, Haidermota S, Hornsby WE, Green RC, Daly MJ, Neale BM, Ellinor PT, Smoller JW, Lebo MS, Karlson EW, Martin AR, Natarajan P. Decoding Genetics, Ancestry, and Geospatial Context for Precision Health. medRxiv 2023:2023.10.24.23297096. [PMID: 37961173 PMCID: PMC10635180 DOI: 10.1101/2023.10.24.23297096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Mass General Brigham, an integrated healthcare system based in the Greater Boston area of Massachusetts, annually serves 1.5 million patients. We established the Mass General Brigham Biobank (MGBB), encompassing 142,238 participants, to unravel the intricate relationships among genomic profiles, environmental context, and disease manifestations within clinical practice. In this study, we highlight the impact of ancestral diversity in the MGBB by employing population genetics, geospatial assessment, and association analyses of rare and common genetic variants. The population structures captured by the genetics mirror the sequential immigration to the Greater Boston area throughout American history, highlighting communities tied to shared genetic and environmental factors. Our investigation underscores the potency of unbiased, large-scale analyses in a healthcare-affiliated biobank, elucidating the dynamic interplay across genetics, immigration, structural geospatial factors, and health outcomes in one of the earliest American sites of European colonization.
Affiliation(s)
- Satoshi Koyama
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Ying Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kaavya Paruchuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Md Mesbah Uddin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- So Mi J. Cho
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Integrative Research Center for Cerebrovascular and Cardiovascular Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sarah M. Urbut
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Sara Haidermota
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Whitney E. Hornsby
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Robert C. Green
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine (Genetics), MassGeneralBrigham, Boston, MA, USA
- Broad Institute and Ariadne Labs, Boston, MA, USA
- Mark J. Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Finland
- University of Helsinki, Helsinki, Finland
- Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Patrick T. Ellinor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Jordan W. Smoller
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Matthew S. Lebo
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Mass General Brigham Personalized Medicine, Cambridge, MA, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
- Elizabeth W. Karlson
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Mass General Brigham Personalized Medicine, Cambridge, MA, USA
- Division of Rheumatology, Inflammation and Immunity, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Alicia R. Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Pradeep Natarajan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
|
18
|
Lu W, Gauthier LD, Poterba T, Giacopuzzi E, Goodrich JK, Stevens CR, King D, Daly MJ, Neale BM, Karczewski KJ. CHARR efficiently estimates contamination from DNA sequencing data. bioRxiv 2023:2023.06.28.545801. [PMID: 37425834 PMCID: PMC10327099 DOI: 10.1101/2023.06.28.545801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
DNA sample contamination is a major issue in clinical and research applications of whole genome and exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a new metric to estimate DNA sample contamination from variant-level whole genome and exome sequence data, CHARR, Contamination from Homozygous Alternate Reference Reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VDS format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole genome and exome sequencing datasets.
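To make the idea in this abstract concrete, the Python sketch below scores one sample by how many reference reads "infiltrate" its homozygous alternate calls, scaling each site by the population reference allele frequency. The function name `charr`, the input tuple layout, and the `min_af`/`max_af` filter values are assumptions chosen for illustration; the exact estimator and filters used by the authors may differ.

```python
import math


def charr(hom_alt_calls, min_af=0.05, max_af=0.95):
    """Simplified CHARR-style contamination score for one sample.

    hom_alt_calls: iterable of (ref_reads, depth, ref_allele_freq) tuples,
        one per homozygous alternate genotype call.
    Sites with zero depth, or with a reference allele frequency outside
    [min_af, max_af], are skipped (assumed filter, for illustration only).
    Returns the mean of ref_reads / (depth * ref_allele_freq), or NaN if
    no site passes the filters.
    """
    ratios = []
    for ref_reads, depth, ref_af in hom_alt_calls:
        if depth == 0 or not (min_af <= ref_af <= max_af):
            continue
        ratios.append(ref_reads / (depth * ref_af))
    return sum(ratios) / len(ratios) if ratios else math.nan


# Hypothetical sample: a clean site and a site with 3 stray reference reads.
calls = [(0, 30, 0.5), (3, 30, 0.5)]
print(charr(calls))
```

An uncontaminated sample should show almost no reference reads at homozygous alternate sites, so its score stays near zero; only per-variant read counts and allele frequencies are needed, which is why no BAM/CRAM access is required.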
Affiliation(s)
- Wenhan Lu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Laura D Gauthier
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Timothy Poterba
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Christine R Stevens
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Daniel King
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Institute for Molecular Medicine Finland, Helsinki, Finland
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Novo Nordisk Foundation Center, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
|