1. Jolma A, Laverty KU, Fathi A, Yang AWH, Yellan I, Vorontsov IE, Inukai S, Kribelbauer-Swietek JF, Gralak AJ, Razavi R, Albu M, Brechalov A, Patel ZM, Nozdrin V, Meshcheryakov G, Kozin I, Abramov S, Boytsov A, Fornes O, Makeev VJ, Grau J, Grosse I, Bucher P, Deplancke B, Kulakovskiy IV, Hughes TR. Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors. bioRxiv 2024:2024.11.11.622097. [PMID: 39605729 PMCID: PMC11601247 DOI: 10.1101/2024.11.11.622097] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/29/2024]
Abstract
We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and trans, and identify tens of thousands of conserved, base-level binding sites in the human genome. The use of multiple assays provides an unprecedented opportunity to benchmark and analyze TF sequence specificity, function, and evolution, as further explored in accompanying manuscripts. 1,421 human TFs are now associated with a DNA binding motif. Extrapolation from the Codebook benchmarking, however, suggests that many of the currently known binding motifs for well-studied TFs may inaccurately describe the TF's true sequence preferences.
Affiliation(s)
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Kaitlin U Laverty
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ali Fathi
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Isaac Yellan
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Sachi Inukai
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Judith F Kribelbauer-Swietek
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Antoni J Gralak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Rozita Razavi
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Zain M Patel
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Vladimir Nozdrin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Georgy Meshcheryakov
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Ivan Kozin
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Philipp Bucher
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
2. Porter DF, Meyers RM, Miao W, Reynolds DL, Hong AW, Yang X, Mondal S, Siprashvili Z, Srinivasan S, Ducoli L, Meyers JM, Nguyen DT, Ko LA, Kellman L, Elfaki I, Guo M, Winge MC, Lopez-Pajares V, Porter IE, Tao S, Khavari PA. Disease-Linked Regulatory DNA Variants and Homeostatic Transcription Factors in Epidermis. bioRxiv 2024:2024.11.07.622542. [PMID: 39605549 PMCID: PMC11601284 DOI: 10.1101/2024.11.07.622542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Identifying noncoding single nucleotide variants (SNVs) in regulatory DNA linked to polygenic disease risk, the transcription factors (TFs) they bind, and the target genes they dysregulate is a goal in polygenic disease research. Massively parallel reporter gene analysis (MPRA) of 3,451 SNVs linked to risk for polygenic skin diseases characterized by disrupted epidermal homeostasis identified 355 differentially active SNVs (daSNVs). daSNV target gene analysis, combined with daSNV editing, underscored dysregulated epidermal differentiation as a pathomechanism shared across common polygenic skin diseases. CRISPR knockout screens of 1,772 human TFs revealed 108 TFs essential for epidermal progenitor differentiation, uncovering novel roles for ZNF217, CXXC1, FOXJ2, IRX2 and NRF1. Population sampling CUT&RUN of 27 homeostatic TFs identified allele-specific DNA binding (ASB) differences at daSNVs enriched near epidermal homeostasis and monogenic skin disease genes, with notable representation of SP/KLF and AP-1/2 TFs. This resource implicates dysregulated differentiation in risk for diverse polygenic skin diseases.
3. Li X, Melo LAN, Bussemaker HJ. Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data. Genome Biol 2024; 25:284. [PMID: 39482734 PMCID: PMC11529166 DOI: 10.1186/s13059-024-03424-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 10/17/2024] [Indexed: 11/03/2024]
Abstract
BACKGROUND: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed.
RESULTS: We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts.
CONCLUSION: Our work provides new strategies for predicting the functional impact of non-coding variants.
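The idea of aggregating allelic evidence with an over-dispersed binomial likelihood, without per-variant significance calls, can be illustrated with a beta-binomial model. The following is a minimal sketch and not the PyProBound implementation: the mean/concentration parameterization, the logistic link from a predicted log-affinity difference to an expected reference-read fraction, all function names, and the toy counts are assumptions made for illustration.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, p, concentration):
    """Log-probability of k reference reads out of n under a beta-binomial
    with mean p. A smaller concentration means more over-dispersion."""
    a = p * concentration
    b = (1.0 - p) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def predicted_ref_fraction(delta_log_affinity):
    """Assumed link: expected reference-read fraction is a logistic function
    of the predicted log-affinity difference (reference minus alternate)."""
    return 1.0 / (1.0 + math.exp(-delta_log_affinity))


def total_log_likelihood(variants, concentration=20.0):
    """Sum log-likelihoods over all variants; no per-variant threshold is used.
    Each variant is (ref_count, alt_count, delta_log_affinity)."""
    total = 0.0
    for ref, alt, delta in variants:
        p = predicted_ref_fraction(delta)
        total += beta_binomial_logpmf(ref, ref + alt, p, concentration)
    return total


# Hypothetical allele-specific read counts with model-predicted affinity differences.
variants = [(30, 10, 1.0), (8, 22, -0.9), (15, 14, 0.0)]

informed = total_log_likelihood(variants)
# Baseline: a model that predicts no allelic preference at any variant.
null = total_log_likelihood([(ref, alt, 0.0) for ref, alt, _ in variants])

print("informed model:", informed)
print("null model:", null)
```

Under this sketch, two binding models for the same TF would be compared by their total log-likelihood on the same set of heterozygous variants, with the higher value indicating better agreement with the observed allelic imbalance.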
Affiliation(s)
- Xiaoting Li
- Department of Biological Sciences, Columbia University, New York, NY, 10027, USA
- Lucas A N Melo
- Department of Biological Sciences, Columbia University, New York, NY, 10027, USA
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, 10027, USA.
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA.
4. Dudek MF, Wenz BM, Brown CD, Voight BF, Almasy L, Grant SF. Characterization of non-coding variants associated with transcription factor binding through ATAC-seq-defined footprint QTLs in liver. bioRxiv 2024:2024.09.24.614730. [PMID: 39386531 PMCID: PMC11463493 DOI: 10.1101/2024.09.24.614730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Non-coding variants discovered by genome-wide association studies (GWAS) are enriched in regulatory elements harboring transcription factor (TF) binding motifs, strongly suggesting a connection between disease association and the disruption of cis-regulatory sequences. Occupancy of a TF inside a region of open chromatin can be detected in ATAC-seq where bound TFs block the transposase Tn5, leaving a pattern of relatively depleted Tn5 insertions known as a "footprint". Here, we sought to identify variants associated with TF-binding, or "footprint quantitative trait loci" (fpQTLs) in ATAC-seq data generated from 170 human liver samples. We used computational tools to scan the ATAC-seq reads to quantify TF binding likelihood as "footprint scores" at variants derived from whole genome sequencing generated in the same samples. We tested for association between genotype and footprint score and observed 693 fpQTLs associated with footprint-inferred TF binding (FDR < 5%). Given that Tn5 insertion sites are measured with base-pair resolution, we show that fpQTLs can aid GWAS and QTL fine-mapping by precisely pinpointing TF activity within broad trait-associated loci where the underlying causal variant is unknown. Liver fpQTLs were strongly enriched across ChIP-seq peaks, liver expression QTLs (eQTLs), and liver-related GWAS loci, and their inferred effect on TF binding was concordant with their effect on underlying sequence motifs in 80% of cases. We conclude that fpQTLs can reveal causal GWAS variants, define the role of TF binding site disruption in disease and provide functional insights into non-coding variants, ultimately informing novel treatments for common diseases.
Affiliation(s)
- Max F. Dudek
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brandon M. Wenz
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Christopher D. Brown
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Benjamin F. Voight
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Laura Almasy
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, Children’s Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia
- Struan F.A. Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
5. Uvarova AN, Zheremyan EA, Ustiugova AS, Murashko MM, Bogomolova EA, Demin DE, Stasevich EM, Kuprash DV, Korneev KV. Autoimmunity-Associated SNP rs3024505 Disrupts STAT3 Binding in B Cells, Leading to IL10 Dysregulation. Int J Mol Sci 2024; 25:10196. [PMID: 39337678 PMCID: PMC11432243 DOI: 10.3390/ijms251810196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Revised: 09/11/2024] [Accepted: 09/20/2024] [Indexed: 09/30/2024]
Abstract
Interleukin 10 (IL10) is a major anti-inflammatory cytokine that acts as a master regulator of the immune response. A single nucleotide polymorphism rs3024505(C/T), located downstream of the IL10 gene, is associated with several aggressive inflammatory diseases, including systemic lupus erythematosus, Sjögren's syndrome, Crohn's disease, and ulcerative colitis. In such autoimmune pathologies, IL10-producing B cells play a protective role by decreasing the level of inflammation and restoring immune homeostasis. This study demonstrates that rs3024505 is located within an enhancer that augments the activity of the IL10 promoter in a reporter system based on a human B cell line. The common rs3024505(C) variant creates a functional binding site for the transcription factor STAT3, whereas the risk allele rs3024505(T) disrupts STAT3 binding, thereby reducing the IL10 promoter activity. Our findings indicate that B cells from individuals carrying the minor rs3024505(T) allele may produce less IL10 due to the disrupted STAT3 binding site, contributing to the progression of inflammatory pathologies.
Affiliation(s)
- Aksinya N. Uvarova
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Elina A. Zheremyan
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Alina S. Ustiugova
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Matvey M. Murashko
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Center for Advanced Studies, 123592 Moscow, Russia
- Elvina A. Bogomolova
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Center for Advanced Studies, 123592 Moscow, Russia
- Denis E. Demin
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Ekaterina M. Stasevich
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Center for Advanced Studies, 123592 Moscow, Russia
- Dmitry V. Kuprash
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Kirill V. Korneev
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Laboratory of Intracellular Signaling in Health and Disease, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
6. Phan H, Brouard C, Mourad R. Semi-supervised learning with pseudo-labeling compares favorably with large language models for regulatory sequence prediction. Brief Bioinform 2024; 25:bbae560. [PMID: 39489607 PMCID: PMC11531863 DOI: 10.1093/bib/bbae560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Revised: 09/13/2024] [Accepted: 10/17/2024] [Indexed: 11/05/2024]
Abstract
Predicting molecular processes using deep learning is a promising approach to provide biological insights for non-coding single nucleotide polymorphisms identified in genome-wide association studies. However, most deep learning methods rely on supervised learning, which requires DNA sequences associated with functional data, the amount of which is severely limited by the finite size of the human genome. Conversely, the amount of mammalian DNA sequences is growing exponentially due to ongoing large-scale sequencing projects, but in most cases without functional data. To alleviate the limitations of supervised learning, we propose a novel semi-supervised learning (SSL) method based on pseudo-labeling, which makes it possible to exploit unlabeled DNA sequences from numerous genomes during model pre-training. We further improved it by incorporating principles from the Noisy Student algorithm to predict the confidence in pseudo-labeled data used for pre-training, which showed improvements for transcription factors with very few binding sites (very small training data). The approach is very flexible and can be used to train any neural architecture, including state-of-the-art models, and in most cases shows strong predictive performance improvements compared to standard supervised learning. Moreover, small models trained by SSL showed similar or better performance than the large language model DNABERT2.
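The general pseudo-labeling loop that the abstract builds on can be sketched as follows. This is a generic, confidence-filtered self-training example on toy numeric features, not the authors' pipeline: the nearest-centroid "model", the 0.9 confidence threshold, the feature vectors, and every function name are assumptions chosen to keep the example self-contained.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def fit(features, labels):
    """Toy stand-in for model training: one centroid per class (0 or 1)."""
    return {c: centroid([x for x, y in zip(features, labels) if y == c])
            for c in (0, 1)}


def predict_proba(model, x):
    """Probability of class 1 from a softmax over negative squared distances."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, model[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, model[1]))
    return 1.0 / (1.0 + math.exp(d1 - d0))


def pseudo_label(model, unlabeled, threshold=0.9):
    """Keep only unlabeled examples the teacher labels with high confidence."""
    kept = []
    for x in unlabeled:
        p1 = predict_proba(model, x)
        confidence = max(p1, 1.0 - p1)
        if confidence >= threshold:
            kept.append((x, 1 if p1 >= 0.5 else 0))
    return kept


# Hypothetical labeled data (e.g. bound vs. unbound sequences as feature vectors).
labeled_x = [[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [2.2, 1.9]]
labeled_y = [0, 0, 1, 1]
# Hypothetical unlabeled data (e.g. sequences without functional measurements).
unlabeled_x = [[0.1, 0.2], [2.1, 2.0], [1.1, 1.0], [1.9, 2.3]]

# 1. Train a teacher on the labeled data only.
teacher = fit(labeled_x, labeled_y)
# 2. Pseudo-label the unlabeled data, discarding low-confidence predictions.
pseudo = pseudo_label(teacher, unlabeled_x, threshold=0.9)
# 3. Train a student on labeled plus confidently pseudo-labeled examples.
student = fit(labeled_x + [x for x, _ in pseudo],
              labeled_y + [y for _, y in pseudo])

print("pseudo-labeled examples kept:", pseudo)
```

In this toy run the ambiguous point `[1.1, 1.0]` lies almost midway between the two class centroids, so it falls below the threshold and is excluded from the student's training set.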
Affiliation(s)
- Han Phan
- INRAE, MIAT, 31326 Castanet-Tolosan, France
- Raphaël Mourad
- INRAE, MIAT, 31326 Castanet-Tolosan, France
- University of Toulouse, UPS, 31062 Toulouse, France
7. Anderson AG, Moyers BA, Loupe JM, Rodriguez-Nunez I, Felker SA, Lawlor JMJ, Bunney WE, Bunney BG, Cartagena PM, Sequeira A, Watson SJ, Akil H, Mendenhall EM, Cooper GM, Myers RM. Allele-specific transcription factor binding across human brain regions offers mechanistic insight into eQTLs. Genome Res 2024; 34:1224-1234. [PMID: 39152038 PMCID: PMC11444172 DOI: 10.1101/gr.278601.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
Transcription factors (TFs) regulate gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because TF occupancy is driven in part by recognition of DNA sequence, genetic variation can influence TF-DNA associations and gene regulation. To identify variants that impact TF binding in human brain tissues, we assessed allele-specific binding (ASB) at heterozygous variants for 94 TFs in nine brain regions from two donors. Leveraging graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signals between alleles at heterozygous variants within each brain region and identified thousands of variants exhibiting ASB for at least one TF. ASB reproducibility was measured by comparisons between independent experiments both within and between donors. We found that rare alleles in the general population more frequently led to reduced TF binding, whereas common alleles had an equal likelihood of increasing or decreasing binding. Further, for ASB variants in predicted binding motifs, the favored allele tended to be the one with the stronger expected motif match, but this concordance was not observed within highly occupied sites. We also found that neuron-specific cis-regulatory elements (cCREs), in contrast with oligodendrocyte-specific cCREs, showed depletion of ASB variants. We identified 2670 ASB variants associated with evidence for allele-specific gene expression in the brain from GTEx data and observed increasing eQTL effect direction concordance as ASB significance increases. These results provide a valuable and unique resource for mechanistic analysis of cis-regulatory variation in human brain tissue.
Affiliation(s)
- Ashlyn G Anderson
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Belle A Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Jacob M Loupe
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- James M J Lawlor
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- William E Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Blynn G Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Preston M Cartagena
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Adolfo Sequeira
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Stanley J Watson
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
- Huda Akil
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric M Mendenhall
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Gregory M Cooper
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
8. Bhattacharyya S, Ay F. Identifying genetic variants associated with chromatin looping and genome function. Nat Commun 2024; 15:8174. [PMID: 39289357 PMCID: PMC11408621 DOI: 10.1038/s41467-024-52296-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 08/30/2024] [Indexed: 09/19/2024]
Abstract
Here we present a comprehensive HiChIP dataset on naïve CD4 T cells (nCD4) from 30 donors and identify QTLs that associate with genotype-dependent and/or allele-specific variation of HiChIP contacts defining loops between active regulatory regions (iQTLs). We observe a substantial overlap between iQTLs and previously defined eQTLs and histone QTLs, and an enrichment for fine-mapped QTLs and GWAS variants. Furthermore, we describe a distinct subset of nCD4 iQTLs, for which the significant variation of chromatin contacts in nCD4 are translated into significant eQTL trends in CD4 T cell memory subsets. Finally, we define connectivity-QTLs as iQTLs that are significantly associated with concordant genotype-dependent changes in chromatin contacts over a broad genomic region (e.g., GWAS SNP in the RNASET2 locus). Our results demonstrate the importance of chromatin contacts as a complementary modality for QTL mapping and their power in identifying previously uncharacterized QTLs linked to cell-specific gene expression and connectivity.
Affiliation(s)
- Ferhat Ay
- La Jolla Institute for Immunology, La Jolla, CA, USA.
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
9. Fan J, Li X, Yang J, Zhang S, Qu HQ, Ji D, Glessner JT, Hao J, Ding Z, Wang N, Meng X, Xia Q, Hakonarson H, Wei W, Li J. Revealing novel genomic insights and therapeutic targets for juvenile idiopathic arthritis through omics. Rheumatology (Oxford) 2024; 63:SI249-SI259. [PMID: 38317060 DOI: 10.1093/rheumatology/keae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 12/22/2023] [Accepted: 01/10/2024] [Indexed: 02/07/2024]
Abstract
BACKGROUND: The genetic architecture of juvenile idiopathic arthritis (JIA) remains only partially understood. There is a clear need for continued efforts to uncover the underlying causes of JIA.
METHODS: We conducted a JIA genome-wide association study (GWAS) meta-analysis that incorporated data from 4,550 JIA cases and 18,446 controls. We employed in silico and genome-editing approaches to prioritize target genes. To investigate pleiotropic effects, we conducted phenome-wide association studies. Cell-type enrichment analyses were performed by integrating bulk and single-cell sequencing data. Finally, we examined potential druggable targets for JIA.
RESULTS: Fourteen genome-wide significant non-HLA loci were identified, including four novel loci, each exhibiting pleiotropic associations with other autoimmune diseases or musculoskeletal traits. We uncovered strong genetic correlation between JIA and bone mineral density (BMD) traits at 52 genomic regions, including three GWAS loci for JIA. Candidate genes with immune functions were captured by in silico analyses at each novel locus, with additional findings identified through our experimental approach. Cell-type enrichment analysis revealed 21 specific immune cell types crucial for the affected organs in JIA, indicating their potential contribution to the disease. Finally, 24 known or candidate druggable target genes were prioritized.
CONCLUSIONS: Our identification of four novel JIA-associated genes, CD247, RHOH, COLEC10 and IRF8, broadens the potential drug repositioning opportunities. We established a new genetic link between COLEC10, TNFRSF11B and JIA/BMD. Additionally, the identification of RHOH underscores its role in positive thymocyte selection, thereby illuminating a critical facet of JIA's underlying biological mechanisms.
Affiliation(s)
- Jingxian Fan
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xiumei Li
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Jie Yang
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Sipeng Zhang
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Hui-Qi Qu
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Dandan Ji
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Joseph T Glessner
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jian Hao
- Department of Rheumatology and Immunology, Tianjin Medical University General Hospital, Tianjin, China
| | - Zhiyong Ding
- Mills Institute for Personalized Cancer Care, Fynn Biotechnologies Ltd., Jinan, China
| | - Nan Wang
- Mills Institute for Personalized Cancer Care, Fynn Biotechnologies Ltd., Jinan, China
| | - Xinyi Meng
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Qianghua Xia
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wei Wei
- Department of Rheumatology and Immunology, Tianjin Medical University General Hospital, Tianjin, China
| | - Jin Li
- Department of Cell Biology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Tianjin Institute of Immunology, Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Department of Rheumatology and Immunology, Tianjin Medical University General Hospital, Tianjin, China
|
10
|
Damarov IS, Korbolina EE, Rykova EY, Merkulova TI. Multi-Omics Analysis Revealed the rSNPs Potentially Involved in T2DM Pathogenic Mechanism and Metformin Response. Int J Mol Sci 2024; 25:9297. [PMID: 39273245 PMCID: PMC11394919 DOI: 10.3390/ijms25179297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 08/14/2024] [Accepted: 08/26/2024] [Indexed: 09/15/2024] Open
Abstract
The goal of our study was to identify and assess the functionally significant SNPs with potentially important roles in the development of type 2 diabetes mellitus (T2DM) and/or their effect on individual response to antihyperglycemic medication with metformin. We applied a bioinformatics approach to identify the regulatory SNPs (rSNPs) associated with allele-asymmetric binding and expression events in our paired ChIP-seq and RNA-seq data for peripheral blood mononuclear cells (PBMCs) of nine healthy individuals. The rSNP outcomes were analyzed using public data from the GWAS (Genome-Wide Association Studies) and Genotype-Tissue Expression (GTEx). The differentially expressed genes (DEGs) between healthy and T2DM individuals (GSE221521), including metformin responders and non-responders (GSE153315), were searched for in GEO RNA-seq data. The DEGs harboring rSNPs were analyzed using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). We identified 14,796 rSNPs in the promoters of 5132 genes of human PBMCs. We found 4280 rSNPs to associate with both phenotypic traits (GWAS) and expression quantitative trait loci (eQTLs) from GTEx. Between T2DM patients and controls, 3810 rSNPs were detected in the promoters of 1284 DEGs. Based on the protein-protein interaction (PPI) network, we identified 31 upregulated hub genes, including the genes involved in inflammation, obesity, and insulin resistance. The top-ranked 10 enriched KEGG pathways for these hubs included insulin, AMPK, and FoxO signaling pathways. Between metformin responders and non-responders, 367 rSNPs were found in the promoters of 131 DEGs. Genes encoding transcription factors and transcription regulators were the most widely represented group and many were shown to be involved in the T2DM pathogenesis. We have formed a list of human rSNPs that add functional interpretation to the T2DM-association signals identified in GWAS. 
The results suggest candidate causal regulatory variants for T2DM, with strong enrichment in the pathways related to glucose metabolism, inflammation, and the effects of metformin.
Affiliation(s)
- Igor S Damarov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
| | - Elena E Korbolina
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
| | - Elena Y Rykova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Department of Engineering Problems of Ecology, Novosibirsk State Technical University, 630087 Novosibirsk, Russia
| | - Tatiana I Merkulova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
|
11
|
Biddie SC, Weykopf G, Hird EF, Friman ET, Bickmore WA. DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants. Genome Biol 2024; 25:208. [PMID: 39107801 PMCID: PMC11304670 DOI: 10.1186/s13059-024-03352-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 07/25/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) have revealed a multitude of candidate genetic variants affecting the risk of developing complex traits and diseases. However, the highlighted regions are typically in the non-coding genome, and uncovering the functional causative single nucleotide variants (SNVs) is challenging. Prioritization of variants is commonly based on genomic annotation with markers of active regulatory elements, but current approaches still poorly predict functional variants. To address this, we systematically analyze six markers of active regulatory elements for their ability to identify functional variants. RESULTS We benchmark against molecular quantitative trait loci (molQTL) from assays of regulatory element activity that identify allelic effects on DNA-binding factor occupancy, reporter assay expression, and chromatin accessibility. We identify the combination of DNase footprints and divergent enhancer RNA (eRNA) as markers for functional variants. This signature provides high precision, but with a trade-off of low recall, thus substantially reducing candidate variant sets to prioritize variants for functional validation. We present this as a framework called FINDER (Functional SNV IdeNtification using DNase footprints and eRNA). CONCLUSIONS We demonstrate its utility for prioritizing variants using a leukocyte count trait, and we analyze variants in linkage disequilibrium with a lead variant to predict a functional variant in asthma. Our findings have implications for prioritizing variants from GWAS, for the development of predictive scoring algorithms, and for functionally informed fine-mapping approaches.
Affiliation(s)
- Simon C Biddie
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
- NHS Lothian, Edinburgh, UK.
| | - Giovanna Weykopf
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | | | - Elias T Friman
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Wendy A Bickmore
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
|
12
|
Ghoreishifar M, Chamberlain AJ, Xiang R, Prowse-Wilkins CP, Lopdell TJ, Littlejohn MD, Pryce JE, Goddard ME. Allele-specific binding variants causing ChIP-seq peak height of histone modification are not enriched in expression QTL annotations. Genet Sel Evol 2024; 56:50. [PMID: 38937662 PMCID: PMC11212393 DOI: 10.1186/s12711-024-00916-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 06/04/2024] [Indexed: 06/29/2024] Open
Abstract
BACKGROUND Genome sequence variants affecting complex traits (quantitative trait loci, QTL) are enriched in functional regions of the genome, such as those marked by certain histone modifications. These variants are believed to influence gene expression. However, due to the linkage disequilibrium among nearby variants, pinpointing the precise location of QTL is challenging. We aimed to identify allele-specific binding (ASB) QTL (asbQTL) that cause variation in the level of histone modification, as measured by the height of peaks assayed by ChIP-seq (chromatin immunoprecipitation sequencing). We identified DNA sequences that predict the difference between alleles in ChIP-seq peak height in H3K4me3 and H3K27ac histone modifications in the mammary glands of cows. RESULTS We used a gapped k-mer support vector machine, a novel best linear unbiased prediction model, and a multiple linear regression model that combines the other two approaches to predict variant impacts on peak height. For each method, a subset of 1000 sites with the highest magnitude of predicted ASB was considered as candidate asbQTL. The accuracy of this prediction was measured by the proportion where the predicted direction matched the observed direction. Prediction accuracy ranged between 0.59 and 0.74, suggesting that these 1000 sites are enriched for asbQTL. Using independent data, we investigated functional enrichment in the candidate asbQTL set and three control groups, including non-causal ASB sites, non-ASB variants under a peak, and SNPs (single nucleotide polymorphisms) not under a peak. For H3K4me3, a higher proportion of the candidate asbQTL were confirmed as ASB when compared to the non-causal ASB sites (P < 0.01). However, these candidate asbQTL did not enrich for the other annotations, including expression QTL (eQTL), allele-specific expression QTL (aseQTL) and sites conserved across mammals (P > 0.05). 
CONCLUSIONS We identified putatively causal sites for asbQTL using the DNA sequence surrounding these sites. Our results suggest that many sites influencing histone modifications may not directly affect gene expression. However, it is important to acknowledge that distinguishing between putative causal ASB sites and other non-causal ASB sites in high linkage disequilibrium with the causal sites regarding their impact on gene expression may be challenging due to limitations in statistical power.
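The accuracy measure described in this abstract (the proportion of sites where the predicted direction of allele-specific binding matches the observed direction) can be illustrated with a minimal sketch. The function name `direction_accuracy`, the use of signed effect values, and the choice to skip sites with a zero effect are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def direction_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Return the fraction of sites whose predicted and observed effects share a sign.

    Hypothetical helper: inputs are signed allelic effects (e.g. the difference
    in peak height between alleles). Sites where either value is exactly zero
    have no direction and are skipped (an assumption of this sketch).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")

    # Keep only sites where both effects have a defined direction.
    pairs = [(p, o) for p, o in zip(predicted, observed) if p != 0 and o != 0]
    if not pairs:
        raise ValueError("no sites with a defined direction")

    # A match means both effects are positive or both are negative.
    matches = sum(1 for p, o in pairs if (p > 0) == (o > 0))
    return matches / len(pairs)


# Toy data only: three of four sites agree in direction.
toy_predicted = [0.5, -0.2, 0.1, -0.4]
toy_observed = [1.0, 0.3, 0.2, -0.1]
toy_accuracy = direction_accuracy(toy_predicted, toy_observed)
```

Under this definition a value near 0.5 would correspond to chance-level agreement, which is why the reported range of 0.59 to 0.74 is read as enrichment rather than as certainty for any single site.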
Affiliation(s)
- Mohammad Ghoreishifar
- Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, VIC, 3083, Australia.
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia.
| | - Amanda J Chamberlain
- Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| | - Ruidong Xiang
- Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
| | - Claire P Prowse-Wilkins
- Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
| | - Thomas J Lopdell
- Research and Development, Livestock Improvement Corporation, Private Bag 3016, Hamilton, 3240, New Zealand
| | - Mathew D Littlejohn
- Research and Development, Livestock Improvement Corporation, Private Bag 3016, Hamilton, 3240, New Zealand
| | - Jennie E Pryce
- Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| | - Michael E Goddard
- Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
|
13
|
Tian Y, Wu L, Huang CC, Wang L. Identify Regulatory eQTLs by Multiome Sequencing in Prostate Single Cells. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.19.599704. [PMID: 38948854 PMCID: PMC11213234 DOI: 10.1101/2024.06.19.599704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
While genome-wide association studies and expression quantitative trait loci (eQTL) analysis have made significant progress in identifying noncoding variants associated with prostate cancer risk and bulk tissue transcriptome changes, the regulatory effect of these genetic elements on gene expression remains largely unknown. Recent developments in single-cell sequencing have made it possible to perform ATAC-seq and RNA-seq profiling simultaneously to capture functional associations between chromatin accessibility and gene expression. In this study, we tested our hypothesis that this multiome single-cell approach allows for mapping regulatory elements and their target genes at prostate cancer risk loci. We applied a 10X Multiome ATAC + Gene Expression platform to encapsulate Tn5 transposase-tagged nuclei from multiple prostate cell lines for a total of 65,501 high-quality single cells from RWPE1, RWPE2, PrEC, BPH1, DU145, PC3, 22Rv1 and LNCaP cell lines. To address the data sparsity commonly seen in single-cell sequencing, we performed targeted sequencing to enrich sequencing data at prostate cancer risk loci involving 2,730 candidate germline variants and 273 associated genes. Although it did not increase the number of captured cells, the targeted multiome data did improve eQTL gene expression abundance by about 20% and chromatin accessibility abundance by about 5%. Based on this multiomic profiling, we further associated RNA expression alterations with chromatin accessibility of germline variants at the single-cell level. Cross-validation analysis showed high overlaps between the multiome associations and the bulk eQTL findings from the GTEx prostate cohort. We found that about 20% of GTEx eQTLs were covered within the significant multiome associations (p-value ≤ 0.05, gene abundance percentage ≥ 5%), and roughly 10% of the multiome associations could be identified by significant GTEx eQTLs. 
We also analyzed accessible regions with available heterozygous SNP reads and observed more frequent association in genomic regions with allelically accessible variants (p = 0.0055). Among these findings were previously reported regulatory variants including rs60464856-RUVBL1 (multiome p-value = 0.0099 in BPH1) and rs7247241-SPINT2 (multiome p-value = 0.0002-0.0004 in 22Rv1). We also functionally validated a new regulatory SNP and its target gene rs2474694-VPS53 (multiome p-value = 0.00956 in BPH1 and 0.00625 in DU145) by reporter assay and SILAC proteomics sequencing. Taken together, our data demonstrated the feasibility of the multiome single-cell approach for identifying regulatory SNPs and their regulated genes.
Affiliation(s)
- Yijun Tian
- Department of Tumor Biology, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, United States
| | - Lang Wu
- Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI 96813, USA
| | - Chang-Ching Huang
- Zilber College of Public Health, University of Wisconsin, Milwaukee, WI 53226, United States
| | - Liang Wang
- Department of Tumor Biology, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, United States
|
14
|
Uvarova AN, Tkachenko EA, Stasevich EM, Zheremyan EA, Korneev KV, Kuprash DV. Methods for Functional Characterization of Genetic Polymorphisms of Non-Coding Regulatory Regions of the Human Genome. BIOCHEMISTRY. BIOKHIMIIA 2024; 89:1002-1013. [PMID: 38981696 DOI: 10.1134/s0006297924060026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/27/2024] [Accepted: 04/11/2024] [Indexed: 07/11/2024]
Abstract
Currently, numerous associations between genetic polymorphisms and various diseases have been characterized through genome-wide association studies. The majority of clinically significant polymorphisms are localized in non-coding regions of the genome. While modern bioinformatic resources make it possible to predict molecular mechanisms that explain the influence of non-coding polymorphisms on gene expression, such hypotheses require experimental verification. This review discusses methods for elucidating the molecular mechanisms by which specific genetic variants within non-coding sequences affect disease pathogenesis. A particular focus is on methods for identifying transcription factors whose binding efficiency depends on polymorphic variation. Despite remarkable progress in bioinformatic resources enabling prediction of the impact of polymorphisms on disease pathogenesis, experimental approaches are still needed to investigate this issue.
Affiliation(s)
- Aksinya N Uvarova
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia.
| | - Elena A Tkachenko
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
- Faculty of Biology, Lomonosov Moscow State University, Moscow, 119234, Russia
| | - Ekaterina M Stasevich
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141700, Russia
| | - Elina A Zheremyan
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
| | - Kirill V Korneev
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
| | - Dmitry V Kuprash
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
- Faculty of Biology, Lomonosov Moscow State University, Moscow, 119234, Russia
|
15
|
Siraj L, Castro RI, Dewey H, Kales S, Nguyen TTL, Kanai M, Berenzy D, Mouri K, Wang QS, McCaw ZR, Gosai SJ, Aguet F, Cui R, Vockley CM, Lareau CA, Okada Y, Gusev A, Jones TR, Lander ES, Sabeti PC, Finucane HK, Reilly SK, Ulirsch JC, Tewhey R. Functional dissection of complex and molecular trait variants at single nucleotide resolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.05.592437. [PMID: 38766054 PMCID: PMC11100724 DOI: 10.1101/2024.05.05.592437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we measured the activity of 221,412 trait-associated variants that had been statistically fine-mapped using a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
Affiliation(s)
- Layla Siraj
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biophysics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Harvard-Massachusetts Institute of Technology MD/PhD Program, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | | | | | | | | | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
| | | | | | - Qingbo S. Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
| | | | - Sager J. Gosai
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - François Aguet
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Caleb A. Lareau
- Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
| | - Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
| | - Thouis R. Jones
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Eric S. Lander
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Pardis C. Sabeti
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Hilary K. Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
| | - Jacob C. Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
|
16
|
Zhang Y, Hou G, Shen N. Non-coding DNA variants for risk in lupus. Best Pract Res Clin Rheumatol 2024; 38:101937. [PMID: 38429183 DOI: 10.1016/j.berh.2024.101937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 02/08/2024] [Accepted: 02/12/2024] [Indexed: 03/03/2024]
Abstract
Systemic Lupus Erythematosus (SLE) is a multifactorial autoimmune disease that arises from a dynamic interplay between genetics and environmental triggers. The advent of sophisticated genomics technology has catalyzed a shift in our understanding of disease etiology, spotlighting the pivotal role of non-coding DNA variants in SLE pathogenesis. In this review, we present a comprehensive examination of the non-coding variants associated with SLE, shedding light on their role in influencing disease risk and progression. We discuss the latest methodological advancements that have been instrumental in the identification and functional characterization of these genomic elements, with a special focus on the transformative power of CRISPR-based gene-editing technologies. Additionally, the review probes into the therapeutic opportunities that arise from modulating non-coding regions associated with SLE. Through an exploration of the complex network of non-coding DNA, this review aspires to decode the genetic puzzle of SLE and set the stage for groundbreaking gene-based therapeutic interventions and the advancement of precision medicine strategies tailored to SLE management.
Affiliation(s)
- Yutong Zhang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001, China
| | - Guojun Hou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001, China
| | - Nan Shen
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001, China.
|
17
|
Avery CN, Russell ND, Steely CJ, Hersh AO, Bohnsack JF, Prahalad S, Jorde LB. Shared genomic segments analysis identifies MHC class I and class III molecules as genetic risk factors for juvenile idiopathic arthritis. HGG ADVANCES 2024; 5:100277. [PMID: 38369753 PMCID: PMC10918567 DOI: 10.1016/j.xhgg.2024.100277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 02/13/2024] [Accepted: 02/13/2024] [Indexed: 02/20/2024] Open
Abstract
Juvenile idiopathic arthritis (JIA) is a complex rheumatic disease encompassing several clinically defined subtypes of varying severity. The etiology of JIA remains largely unknown, but genome-wide association studies (GWASs) have identified up to 22 genes associated with JIA susceptibility, including a well-established association with HLA-DRB1. Continued investigation of heritable risk factors has been hindered by disease heterogeneity and low disease prevalence. In this study, we utilized shared genomic segments (SGS) analysis on whole-genome sequencing of 40 cases from 12 multi-generational pedigrees significantly enriched for JIA. Subsets of cases are connected by a common ancestor in large extended pedigrees, increasing the power to identify disease-associated loci. SGS analysis identifies genomic segments shared among disease cases that are likely identical by descent and anchored by a disease locus. This approach revealed statistically significant signals for major histocompatibility complex (MHC) class I and class III alleles, particularly HLA-A∗02:01, which was observed at a high frequency among cases. Furthermore, we identified an additional risk locus at 12q23.2-23.3, containing genes primarily expressed by naive B cells, natural killer cells, and monocytes. The recognition of additional risk beyond HLA-DRB1 provides a new perspective on immune cell dynamics in JIA. These findings contribute to our understanding of JIA and may guide future research and therapeutic strategies.
Affiliation(s)
- Cecile N Avery
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA.
| | - Nicole D Russell
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Cody J Steely
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Aimee O Hersh
- Department of Pediatrics, University of Utah, Salt Lake City, UT 84112, USA
| | - John F Bohnsack
- Department of Pediatrics, University of Utah, Salt Lake City, UT 84112, USA
| | - Sampath Prahalad
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30307, USA
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA.
|
18
|
Polcaro G, Liguori L, Manzo V, Chianese A, Donadio G, Caputo A, Scognamiglio G, Dell'Annunziata F, Langella M, Corbi G, Ottaiano A, Cascella M, Perri F, De Marco M, Col JD, Nassa G, Giurato G, Zeppa P, Filippelli A, Franci G, Piaz FD, Conti V, Pepe S, Sabbatino F. rs822336 binding to C/EBPβ and NFIC modulates induction of PD-L1 expression and predicts anti-PD-1/PD-L1 therapy in advanced NSCLC. Mol Cancer 2024; 23:63. [PMID: 38528526 PMCID: PMC10962156 DOI: 10.1186/s12943-024-01976-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 02/29/2024] [Indexed: 03/27/2024] Open
Abstract
Efficient predictive biomarkers are needed for immune checkpoint inhibitor (ICI)-based immunotherapy in non-small cell lung cancer (NSCLC). Testing the predictive value of single nucleotide polymorphisms (SNPs) in programmed cell death 1 (PD-1) or its ligand 1 (PD-L1) has shown contrasting results. Here, we aim to validate the predictive value of PD-L1 SNPs in advanced NSCLC patients treated with ICIs and to define the molecular mechanisms underlying the role of the identified SNP candidate. rs822336 efficiently predicted response to anti-PD-1/PD-L1 immunotherapy in advanced non-oncogene-addicted NSCLC patients, as compared to rs2282055 and rs4143815. rs822336 mapped to the promoter/enhancer region of PD-L1, differentially affecting the induction of PD-L1 expression in human NSCLC cell lines as well as their susceptibility to HLA class I antigen-matched PBMCs incubated with the anti-PD-1 monoclonal antibody nivolumab. The induction of PD-L1 expression by rs822336 was mediated by competitive allele-specific binding of two identified transcription factors: C/EBPβ and NFIC. As a result, silencing of C/EBPβ and NFIC differentially regulated the induction of PD-L1 expression in human NSCLC cell lines carrying different rs822336 genotypes. Analysis by binding microarray further validated the competitive allele-specific binding of C/EBPβ and NFIC to the PD-L1 promoter/enhancer region based on rs822336 genotype in human NSCLC cell lines. These findings have high clinical relevance since they identify rs822336 and induction of PD-L1 expression as novel biomarkers for predicting response to anti-PD-1/PD-L1-based immunotherapy in advanced NSCLC patients.
Affiliation(s)
- Giovanna Polcaro
- Oncology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Luigi Liguori
- Oncology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
- Oncology Unit, Department of Medicine, Surgery and Dentistry, University of Naples "Federico II", Naples, 80131, Italy
| | - Valentina Manzo
- Clinical Pharmacology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
| | - Annalisa Chianese
- Department of Experimental Medicine, University of Campania "Luigi Vanvitelli", Naples, 80138, Italy
| | - Giuliana Donadio
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Alessandro Caputo
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
- Pathology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Giosuè Scognamiglio
- Pathology Unit, Istituto Nazionale Tumori IRCCS Fondazione G. Pascale, Naples, 80131, Italy
| | - Federica Dell'Annunziata
- Department of Experimental Medicine, University of Campania "Luigi Vanvitelli", Naples, 80138, Italy
| | - Maddalena Langella
- Hematology and Transplant Unit, University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
| | - Graziamaria Corbi
- Department of Translational Medical Sciences, University of Naples "Federico II", Naples, 80131, Italy
| | - Alessandro Ottaiano
- Division of Innovative Therapies for Abdominal Metastases, Istituto Nazionale Tumori IRCCS Fondazione G. Pascale, Naples, 80131, Italy
| | - Marco Cascella
- Unit of Anesthesiology, Intensive Care Medicine, and Pain Medicine, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Francesco Perri
- Medical and Experimental Head and Neck Oncology Unit, Istituto Nazionale Tumori IRCCS Fondazione G. Pascale, Naples, 80131, Italy
| | - Margot De Marco
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Jessica Dal Col
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Giovanni Nassa
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Giorgio Giurato
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Pio Zeppa
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
- Pathology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Amelia Filippelli
- Clinical Pharmacology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
| | - Gianluigi Franci
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
- Clinical Microbiology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Fabrizio Dal Piaz
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy
| | - Valeria Conti
- Clinical Pharmacology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy.
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy.
| | - Stefano Pepe
- Oncology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy.
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy.
| | - Francesco Sabbatino
- Oncology Unit, Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, 84081, Italy.
- University Hospital "San Giovanni di Dio e Ruggi d'Aragona", Salerno, 84131, Italy.
| |
|
19
|
Holmes MJ, Mahjour B, Castro CP, Farnum GA, Diehl AG, Boyle AP. HaplotagLR: An efficient and configurable utility for haplotagging long reads. PLoS One 2024; 19:e0298688. [PMID: 38478504 PMCID: PMC10936807 DOI: 10.1371/journal.pone.0298688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 01/30/2024] [Indexed: 03/17/2024] Open
Abstract
Understanding the functional effects of sequence variation is crucial in genomics. Individual human genomes contain millions of variants that contribute to phenotypic variability and disease risks at the population level. Because variants rarely act in isolation, we must consider potential interactions of neighboring variants to accurately predict functional effects. We can accomplish this using haplotagging, which matches sequencing reads to their parental haplotypes using alleles observed at known heterozygous variants. However, few published tools for haplotagging exist, and these share several technical and usability-related shortcomings that limit applicability, in particular a lack of insight into or control over error rates, and a lack of key metrics on the underlying sources of haplotagging error. Here we present HaplotagLR: a user-friendly tool that haplotags long sequencing reads based on a multinomial model and existing phased variant lists. HaplotagLR is user-configurable and includes a basic error model to control the empirical FDR in its output. We show that HaplotagLR outperforms the leading haplotagging method in simulated datasets, especially at high levels of specificity, and displays 7% greater sensitivity in haplotagging real data. HaplotagLR both advances the immediate utility of haplotagging and paves the way for further improvements to this important method.
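As a rough illustration of the haplotagging idea described in this abstract, the sketch below assigns a read to one of two haplotypes by summing per-site log-likelihood ratios under a uniform base-error rate. It is not HaplotagLR's model or API: the function name, the input layout, the error rate, and the decision threshold are all assumptions made for illustration.

```python
import math

def haplotag_read(observed, hap1, hap2, error_rate=0.05, min_log_ratio=2.0):
    """Assign a read to haplotype 1 or 2, or return None if ambiguous.

    observed, hap1, hap2: equal-length sequences of alleles at the
    heterozygous sites covered by the read (hypothetical input layout).
    error_rate and min_log_ratio are illustrative defaults, not values
    taken from any published tool.
    """
    log_ratio = 0.0
    for obs, a1, a2 in zip(observed, hap1, hap2):
        # Probability of observing `obs` given the true allele: correct
        # with probability 1 - e, otherwise one of the three other bases.
        p1 = 1 - error_rate if obs == a1 else error_rate / 3
        p2 = 1 - error_rate if obs == a2 else error_rate / 3
        log_ratio += math.log(p1) - math.log(p2)
    if log_ratio >= min_log_ratio:
        return 1
    if log_ratio <= -min_log_ratio:
        return 2
    return None  # evidence too weak or conflicting to tag the read
```

Raising `min_log_ratio` in this toy version trades sensitivity for specificity, which is the kind of error-rate control the abstract says existing tools lack.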
Affiliation(s)
- Monica J. Holmes
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Babak Mahjour
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Christopher P. Castro
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Gregory A. Farnum
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Adam G. Diehl
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Alan P. Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
20
|
Xu Q, Bao X, Lin Z, Tang L, He LN, Ren J, Zuo Z, Hu K. AStruct: detection of allele-specific RNA secondary structure in structuromic probing data. BMC Bioinformatics 2024; 25:91. [PMID: 38429654 PMCID: PMC11264973 DOI: 10.1186/s12859-024-05704-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 02/14/2024] [Indexed: 03/03/2024] Open
Abstract
BACKGROUND: Uncovering functional genetic variants from an allele-specific perspective is of paramount importance in advancing our understanding of gene regulation and genetic diseases. Recently, various allele-specific events, such as allele-specific gene expression, allele-specific methylation, and allele-specific binding, have been explored on a genome-wide scale due to the development of high-throughput sequencing methods. RNA secondary structure, which plays a crucial role in multiple RNA-associated processes like RNA modification, translation and splicing, has emerged as an essential focus of relevant research. However, tools to identify genetic variants associated with allele-specific RNA secondary structures are still lacking. RESULTS: Here, we develop a computational tool called 'AStruct' that enables us to detect allele-specific RNA secondary structure (ASRS) from RT-stop based structuromic probing data. AStruct shows robust performance in both simulated datasets and public icSHAPE datasets. We reveal that single nucleotide polymorphisms (SNPs) with higher AStruct scores are enriched in coding regions and tend to be functional. These SNPs are highly conserved, have the potential to disrupt sites involved in m6A modification or protein binding, and are frequently associated with disease. CONCLUSIONS: AStruct is a tool dedicated to detecting allele-specific RNA secondary structure events at heterozygous SNPs in RT-stop based structuromic probing data. It utilizes allelic variants, base pairing and RT-stop information under different cell conditions to detect dynamic and functional ASRS. Compared to sequence-based tools, AStruct considers dynamic cell conditions and performs better in detecting functional variants. AStruct is implemented in Java and is freely accessible at: https://github.com/canceromics/AStruct.
Affiliation(s)
- Qingru Xu
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Xiaoqiong Bao
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
| | - Zhuobin Lin
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Lin Tang
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
| | - Li-Na He
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
| | - Jian Ren
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
| | - Zhixiang Zuo
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China.
| | - Kunhua Hu
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
| |
|
21
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Greenwald WWY, D'Antonio M, Pera MF, Frazer KA. Complex regulatory networks influence pluripotent cell state transitions in human iPSCs. Nat Commun 2024; 15:1664. [PMID: 38395976 PMCID: PMC10891157 DOI: 10.1038/s41467-024-45506-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 01/26/2024] [Indexed: 02/25/2024] Open
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show that the proportions of these pluripotent states vary considerably across lines. We discover 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which are highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses reveal that regulatory networks underlying self-renewal and pluripotency are more complex than previously realized. Genetic analyses identify thousands of regulatory variants that overlap predicted transcription factor binding sites and are associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 complex, and its associated network are significantly enriched for regulatory variants with large effects, suggesting that they play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work bins tens of thousands of regulatory elements in hiPSCs into discrete regulatory networks, shows that pluripotency and self-renewal processes have a surprising level of regulatory complexity, and suggests that genetic factors may contribute to cell state transitions in human iPSC lines.
Affiliation(s)
- Timothy D Arthur
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Jennifer P Nguyen
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
| | | | - Hiroko Matsui
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | - Nayara S Silva
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
| | - Isaac N Joshua
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | - André D Luchessi
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
- Department of Clinical and Toxicological Analysis, Federal University of Rio Grande do Norte, Natal, Brazil
| | - William W Young Greenwald
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Matteo D'Antonio
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | | | - Kelly A Frazer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA.
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA.
| |
|
22
|
Shook MS, Lu X, Chen X, Parameswaran S, Edsall L, Trimarchi MP, Ernst K, Granitto M, Forney C, Donmez OA, Diouf AA, VonHandorf A, Rothenberg ME, Weirauch MT, Kottyan LC. Systematic identification of genotype-dependent enhancer variants in eosinophilic esophagitis. Am J Hum Genet 2024; 111:280-294. [PMID: 38183988 PMCID: PMC10870143 DOI: 10.1016/j.ajhg.2023.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 12/01/2023] [Accepted: 12/05/2023] [Indexed: 01/08/2024] Open
Abstract
Eosinophilic esophagitis (EoE) is a rare atopic disorder associated with esophageal dysfunction, including difficulty swallowing, food impaction, and inflammation, that develops in a small subset of people with food allergies. Genome-wide association studies (GWASs) have identified 9 independent EoE risk loci reaching genome-wide significance (p < 5 × 10⁻⁸) and 27 additional loci of suggestive significance (5 × 10⁻⁸ < p < 1 × 10⁻⁵). In the current study, we perform linkage disequilibrium (LD) expansion of these loci to nominate a set of 531 variants that are potentially causal. To systematically interrogate the gene regulatory activity of these variants, we designed a massively parallel reporter assay (MPRA) containing the alleles of each variant within their genomic sequence context cloned into a GFP reporter library. Analysis of reporter gene expression in TE-7, HaCaT, and Jurkat cells revealed cell-type-specific gene regulation. We identify 32 allelic enhancer variants, representing 6 genome-wide significant EoE loci and 7 suggestive EoE loci, that regulate reporter gene expression in a genotype-dependent manner in at least one cellular context. By annotating these variants with expression quantitative trait loci (eQTL) and chromatin looping data in related tissues and cell types, we identify putative target genes affected by genetic variation in individuals with EoE. Transcription factor enrichment analyses reveal possible roles for cell-type-specific regulators, including GATA3. Our approach reduces the large set of EoE-associated variants to a set of 32 with allelic regulatory activity, providing functional insights into the effects of genetic variation in this disease.
Affiliation(s)
- Molly S Shook
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Xiaoming Lu
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Xiaoting Chen
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Sreeja Parameswaran
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Lee Edsall
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Michael P Trimarchi
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Kevin Ernst
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Marissa Granitto
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Carmy Forney
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Omer A Donmez
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Arame A Diouf
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Andrew VonHandorf
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Marc E Rothenberg
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
| | - Leah C Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
| |
|
23
|
Han D, Li Y, Wang L, Liang X, Miao Y, Li W, Wang S, Wang Z. Comparative analysis of models in predicting the effects of SNPs on TF-DNA binding using large-scale in vitro and in vivo data. Brief Bioinform 2024; 25:bbae110. [PMID: 38517697 PMCID: PMC10959158 DOI: 10.1093/bib/bbae110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 02/22/2024] [Accepted: 02/26/2024] [Indexed: 03/24/2024] Open
Abstract
Non-coding variants associated with complex traits can alter the motifs of transcription factor (TF)-deoxyribonucleic acid binding. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power lacks systematic evaluation. Here we have evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors could be predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, aligning with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing the in vivo predictive performance of TFs. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models.
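To make the PWM-based class of methods mentioned in this abstract concrete, here is a minimal sketch of a generic delta-score calculation: the best PWM match among windows covering the SNP is computed for the reference and the alternative allele, and the difference is reported. This is a textbook-style simplification, not the scoring used by tRap or any other benchmarked tool. The function names and the toy matrix are hypothetical, only the forward strand is scanned, and the sequence is assumed to be at least as long as the motif.

```python
def best_window_score(seq, pwm, snp_pos):
    """Highest PWM score among windows of `seq` that cover index `snp_pos`.

    pwm: list of dicts mapping base -> log-odds score, one dict per
    motif position (hypothetical representation).
    """
    width = len(pwm)
    best = float("-inf")
    start_min = max(0, snp_pos - width + 1)
    start_max = min(snp_pos, len(seq) - width)
    for start in range(start_min, start_max + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best

def pwm_delta(ref_seq, snp_pos, alt_base, pwm):
    """Alt-minus-ref change in the best motif score around a SNP."""
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    return (best_window_score(alt_seq, pwm, snp_pos)
            - best_window_score(ref_seq, pwm, snp_pos))

# Toy two-position motif that prefers "AC" (illustrative values only).
toy_pwm = [
    {"A": 1, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 1, "G": -1, "T": -1},
]
delta = pwm_delta("GACG", 1, "T", toy_pwm)  # A>T disrupts the "AC" match
```

A negative delta suggests the alternative allele weakens the predicted site and a positive delta suggests it strengthens it.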
Affiliation(s)
- Dongmei Han
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Yurun Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Linxiao Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Xuan Liang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Yuanyuan Miao
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Wenran Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Sijia Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Zhen Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| |
|
24
|
Vorontsov IE, Eliseeva IA, Zinkevich A, Nikonov M, Abramov S, Boytsov A, Kamenets V, Kasianova A, Kolmykov S, Yevshin I, Favorov A, Medvedeva YA, Jolma A, Kolpakov F, Makeev V, Kulakovskiy I. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res 2024; 52:D154-D163. [PMID: 37971293 PMCID: PMC10767914 DOI: 10.1093/nar/gkad1077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/17/2023] [Accepted: 10/26/2023] [Indexed: 11/19/2023] Open
Abstract
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
Affiliation(s)
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Irina A Eliseeva
- Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
| | - Arsenii Zinkevich
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
| | - Mikhail Nikonov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
| | - Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
| | - Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
| | - Vasily Kamenets
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
- Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
| | - Alexandra Kasianova
- Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences, 127051 Moscow, Russia
| | - Semyon Kolmykov
- Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
| | | | - Alexander Favorov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Yulia A Medvedeva
- Research Center of Biotechnology RAS, Russian Academy of Sciences, 119071 Moscow, Russia
| | - Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Fedor Kolpakov
- Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
- Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
| | - Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
- Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, 420008 Kazan, Russia
| |
|
25
|
Jonker T, Barnett P, Boink GJJ, Christoffels VM. Role of Genetic Variation in Transcriptional Regulatory Elements in Heart Rhythm. Cells 2023; 13:4. [PMID: 38201209 PMCID: PMC10777909 DOI: 10.3390/cells13010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 12/08/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024] Open
Abstract
Genetic predisposition to cardiac arrhythmias has been a field of intense investigation. Research initially focused on rare hereditary arrhythmias, but over the last two decades, the role of genetic variation (single nucleotide polymorphisms) in heart rate, rhythm, and arrhythmias has been taken into consideration as well. In particular, genome-wide association studies have identified hundreds of genomic loci associated with quantitative electrocardiographic traits, atrial fibrillation, and less common arrhythmias such as Brugada syndrome. A significant number of associated variants have been found to systematically localize in non-coding regulatory elements that control the tissue-specific and temporal transcription of genes encoding transcription factors, ion channels, and other proteins. However, the identification of causal variants and the mechanism underlying their impact on phenotype has proven difficult due to the complex tissue-specific, time-resolved, condition-dependent, and combinatorial function of regulatory elements, as well as their modest conservation across different model species. In this review, we discuss research efforts aimed at identifying and characterizing trait-associated variant regulatory elements and the molecular mechanisms underlying their impact on heart rate or rhythm.
Collapse
Affiliation(s)
- Timo Jonker
- Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, 1105 AZ Amsterdam, The Netherlands
| | - Phil Barnett
- Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, 1105 AZ Amsterdam, The Netherlands
| | - Gerard J. J. Boink
- Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, 1105 AZ Amsterdam, The Netherlands
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, 1105 AZ Amsterdam, The Netherlands
| | - Vincent M. Christoffels
- Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, 1105 AZ Amsterdam, The Netherlands
|
26
|
Li X, Melo LAN, Bussemaker HJ. Benchmarking DNA binding affinity models using allele-specific transcription factor binding data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.15.571887. [PMID: 38168434 PMCID: PMC10760129 DOI: 10.1101/2023.12.15.571887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity can manifest itself in vivo at heterozygous loci as a difference in TF occupancy between the two alleles. When applied on a genomic scale, functional genomic assays such as ChIP-seq typically lack the statistical power to detect allele-specific binding (ASB) at the level of individual variants. To address this, we propose a framework for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We show that a likelihood function based on an over-dispersed binomial distribution can aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. We introduce PyProBound, an easily extensible reimplementation of the ProBound biophysically interpretable machine learning framework. Configuring PyProBound to explicitly account for a confounding sequence-specific bias in DNA fragmentation rate yields improved TF binding models when training on ChIP-seq data. We also show how our likelihood function can be leveraged to perform de novo motif discovery on the raw allele-aware ChIP-seq counts.
Affiliation(s)
- Xiaoting Li
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Lucas A. N. Melo
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
|
27
|
Li Y, Zhang XO, Liu Y, Lu A. Allele-specific binding (ASB) analyzer for annotation of allele-specific binding SNPs. BMC Bioinformatics 2023; 24:464. [PMID: 38066439 PMCID: PMC10709849 DOI: 10.1186/s12859-023-05604-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 12/05/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Allele-specific binding (ASB) events occur when transcription factors (TFs) bind more favorably to one of the two parental alleles at heterozygous single nucleotide polymorphisms (SNPs). Evidence suggests that ASB events could reveal the impact of sequence variations on TF binding and may have implications for the risk of diseases. RESULTS Here we present ASB-analyzer, a software platform that enables the users to quickly and efficiently input raw sequencing data to generate individual reports containing the cytogenetic map of ASB SNPs and their associated phenotypes. This interactive tool thereby combines ASB SNP identification, biological annotation, motif analysis, phenotype associations and report summary in one pipeline. With this pipeline, we identified 3772 ASB SNPs from thirty GM12878 ChIP-seq datasets and demonstrated that the ASB SNPs were more likely to be enriched at important sites in TF-binding domains. CONCLUSIONS ASB-analyzer is a user-friendly tool that enables the detection, characterization and visualization of ASB SNPs. It is implemented in Python, R and bash shell and packaged in the Conda environment. It is available as an open-source tool on GitHub at https://github.com/Liying1996/ASBanalyzer .
Affiliation(s)
- Ying Li
- Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
| | - Xiao-Ou Zhang
- Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
| | - Yan Liu
- Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
| | - Aiping Lu
- Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China.
|
28
|
Shi W, Chen M, Pan T, Chen M, Cheng Y, Hao Y, Chen S, Tang Y. Integration of risk variants from GWAS with SARS-CoV-2 RNA interactome prioritizes FUBP1 and RAB2A as risk genes for COVID-19. Sci Rep 2023; 13:19194. [PMID: 37932299 PMCID: PMC10628159 DOI: 10.1038/s41598-023-44705-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 10/11/2023] [Indexed: 11/08/2023] Open
Abstract
The role of host genetic factors in COVID-19 outcomes remains unclear despite various genome-wide association studies (GWAS). We annotate all significant variants and those variants in high LD (R² > 0.8) from the COVID-19 host genetics initiative (HGI) and identify risk genes by recognizing genes intolerant of nonsynonymous mutations in coding regions and genes associated with cis-expression quantitative trait loci (cis-eQTL) in non-coding regions. These genes are enriched in the immune response pathway and viral life cycle. It has been found that host RNA binding proteins (RBPs) participate in different phases of the SARS-CoV-2 life cycle. We collect 503 RBPs reported in in vitro studies to interact with SARS-CoV-2 RNA. Combining risk genes from the HGI with RBPs, we identify two COVID-19 risk loci that regulate the expression levels of FUBP1 and RAB2A in the lung. Due to the risk allele, COVID-19 patients show downregulation of FUBP1 and upregulation of RAB2A. Using single-cell RNA sequencing data, we show that FUBP1 and RAB2A are expressed in SARS-CoV-2-infected upper respiratory tract epithelial cells. We further identify NC_000001.11:g.77984833C>A and NC_000008.11:g.60559280T>C as functional variants by surveying allele-specific transcription factor sites and cis-regulatory elements and performing motif analysis. To sum up, our research, which associates human genetics with expression levels of RBPs, identifies FUBP1 and RAB2A as two risk genes for COVID-19 and reveals the anti-viral role of FUBP1 and the pro-viral role of RAB2A in the infection of SARS-CoV-2.
Affiliation(s)
- Weiwen Shi
- Shanghai Institute of Rheumatology/Department of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Mengke Chen
- Shanghai Institute of Rheumatology/Department of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Tingting Pan
- Shanghai Institute of Rheumatology/Department of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Mengjie Chen
- Department of Rheumatology, the First People's Hospital of Wenling, Taizhou, China
| | - Yongjun Cheng
- Department of Rheumatology, the First People's Hospital of Wenling, Taizhou, China
| | - Yimei Hao
- Key Laboratory of Tissue Microenvironment and Tumor, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
| | - Sheng Chen
- Shanghai Institute of Rheumatology/Department of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yuanjia Tang
- Shanghai Institute of Rheumatology/Department of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai, China.
|
29
|
Patruno L, Milite S, Bergamin R, Calonaci N, D’Onofrio A, Anselmi F, Antoniotti M, Graudenzi A, Caravagna G. A Bayesian method to infer copy number clones from single-cell RNA and ATAC sequencing. PLoS Comput Biol 2023; 19:e1011557. [PMID: 37917660 PMCID: PMC10645363 DOI: 10.1371/journal.pcbi.1011557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 11/14/2023] [Accepted: 09/30/2023] [Indexed: 11/04/2023] Open
Abstract
Single-cell RNA and ATAC sequencing technologies enable the examination of gene expression and chromatin accessibility in individual cells, providing insights into cellular phenotypes. In cancer research, it is important to consistently analyze these states within an evolutionary context on genetic clones. Here we present CONGAS+, a Bayesian model to map single-cell RNA and ATAC profiles onto the latent space of copy number clones. CONGAS+ clusters cells into tumour subclones with similar ploidy, making it straightforward to compare their expression and chromatin profiles. The framework, implemented on GPU and tested on real and simulated data, scales seamlessly to the analysis of thousands of cells, demonstrating better performance than single-molecule models, and supporting new multi-omics assays. In prostate cancer, lymphoma and basal cell carcinoma, CONGAS+ successfully identifies complex subclonal architectures while providing a coherent mapping between ATAC and RNA, facilitating the study of genotype-phenotype maps and their connection to genomic instability.
Affiliation(s)
- Lucrezia Patruno
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Salvatore Milite
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Centre for Computational Biology, Human Technopole, Milan, Italy
| | - Riccardo Bergamin
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Nicola Calonaci
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Alberto D’Onofrio
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Fabio Anselmi
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Marco Antoniotti
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- B4—Bicocca Bioinformatics Biostatistics and Bioimaging Centre, Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Alex Graudenzi
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- B4—Bicocca Bioinformatics Biostatistics and Bioimaging Centre, Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Giulio Caravagna
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
|
30
|
Moyers BA, Loupe JM, Felker SA, Lawlor JM, Anderson AG, Rodriguez-Nunez I, Bunney WE, Bunney BG, Cartagena PM, Sequeira A, Watson SJ, Akil H, Mendenhall EM, Cooper GM, Myers RM. Allele biased transcription factor binding across human brain regions gives mechanistic insight into eQTLs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.06.561245. [PMID: 37873117 PMCID: PMC10592666 DOI: 10.1101/2023.10.06.561245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Transcription Factors (TFs) influence gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because genomic localization of TFs is in part driven by TF recognition of DNA sequence, variation in TF binding sites can disrupt TF-DNA associations and affect gene regulation. To identify variants that impact TF binding in human brain tissues, we quantified allele bias for 93 TFs analyzed with ChIP-seq experiments of multiple structural brain regions from two donors. Using graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signal between alleles at heterozygous variants within each tissue sample from each donor. Comparison of results from different brain regions within donors and the same regions between donors provided measures of allele bias reproducibility. We identified thousands of DNA variants that show reproducible bias in ChIP-seq for at least one TF. We found that alleles that are rarer in the general population were more likely than common alleles to exhibit large biases, and more frequently led to reduced TF binding. Combining ChIP-seq with RNA-seq, we identified TF-allele interaction biases with RNA bias in a phased allele linked to 6,709 eQTL variants identified in GTEx data, 3,309 of which were found in neural contexts. Our results provide insights into the effects of both common and rare variation on gene regulation in the brain. These findings can facilitate mechanistic understanding of cis-regulatory variation associated with biological traits, including disease.
Affiliation(s)
- Jacob M. Loupe
- HudsonAlpha Institute for Biotechnology, Huntsville AL, USA
| | - William E. Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Blynn G. Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Preston M. Cartagena
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Adolfo Sequeira
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Stanley J. Watson
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor MI, USA
| | - Huda Akil
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor MI, USA
|
31
|
Tahara S, Tsuchiya T, Matsumoto H, Ozaki H. Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans. BMC Genomics 2023; 24:597. [PMID: 37805453 PMCID: PMC10560430 DOI: 10.1186/s12864-023-09692-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 09/21/2023] [Indexed: 10/09/2023] Open
Abstract
BACKGROUND Transcription factors (TFs) exhibit heterogeneous DNA-binding specificities in individual cells and whole organisms under natural conditions, and de novo motif discovery usually provides multiple motifs, even from a single chromatin immunoprecipitation-sequencing (ChIP-seq) sample. Despite the accumulation of ChIP-seq data and ChIP-seq-derived motifs, the diversity of DNA-binding specificities across different TFs and cell types remains largely unexplored. RESULTS Here, we applied MOCCS2, our k-mer-based motif discovery method, to a collection of human TF ChIP-seq samples across diverse TFs and cell types, and systematically computed profiles of TF-binding specificity scores for all k-mers. After quality control, we compiled a set of TF-binding specificity score profiles for 2,976 high-quality ChIP-seq samples, comprising 473 TFs and 398 cell types. Using these high-quality samples, we confirmed that the k-mer-based TF-binding specificity profiles reflected TF- or TF-family dependent DNA-binding specificities. We then compared the binding specificity scores of ChIP-seq samples with the same TFs but with different cell type classes and found that half of the analyzed TFs exhibited differences in DNA-binding specificities across cell type classes. Additionally, we devised a method to detect differentially bound k-mers between two ChIP-seq samples and detected k-mers exhibiting statistically significant differences in binding specificity scores. Moreover, we demonstrated that differences in the binding specificity scores between k-mers on the reference and alternative alleles could be used to predict the effect of variants on TF binding, as validated by in vitro and in vivo assay datasets. Finally, we demonstrated that binding specificity score differences can be used to interpret disease-associated non-coding single-nucleotide polymorphisms (SNPs) as TF-affecting SNPs and to identify candidate responsible TFs and cell types.
CONCLUSIONS Our study provides a basis for investigating the regulation of gene expression in a TF-, TF family-, or cell-type-dependent manner. Furthermore, our differential analysis of binding-specificity scores highlights noncoding disease-associated variants in humans.
Affiliation(s)
- Saeko Tahara
- Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
- School of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
| | - Takaho Tsuchiya
- Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
- Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
| | - Hirotaka Matsumoto
- School of Information and Data Sciences, Nagasaki University, 1-14, Bunkyo-Machi, Nagasaki City, Nagasaki, 852-8521, Japan
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan
| | - Haruka Ozaki
- Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan.
- Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan.
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan.
|
32
|
Prowse-Wilkins CP, Wang J, Garner JB, Goddard ME, Chamberlain AJ. Allele specific binding of histone modifications and a transcription factor does not predict allele specific expression in correlated ChIP-seq peak-exon pairs. Sci Rep 2023; 13:15596. [PMID: 37730913 PMCID: PMC10511416 DOI: 10.1038/s41598-023-42637-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/03/2022] [Accepted: 09/13/2023] [Indexed: 09/22/2023] Open
Abstract
Allele specific expression (ASE) is widespread in many species including cows. Therefore, regulatory regions which control gene expression should show cis-regulatory variation which mirrors this differential expression within the animal. ChIP-seq peaks for histone modifications and transcription factors measure activity at functional regions and the height of some peaks has been shown to correlate across tissues with the expression of particular genes, suggesting these peaks are putative regulatory regions. In this study we identified ASE in the bovine genome in multiple tissues and investigated whether ChIP-seq peaks for four histone modifications and the transcription factor CTCF show allele specific binding (ASB) differences in the same tissues. We then investigated whether peak height and gene expression, which correlate across tissues, also correlate within the animal by testing whether the direction of ASB in putative regulatory regions mirrors that of the ASE in the genes they putatively regulate. We found that ASE and ASB were widespread in the bovine genome but varied in extent between tissues. However, even when the height of a peak was positively correlated across tissues with expression of an exon, ASE of the exon and ASB of the peak were in the same direction only half the time. A likely explanation for this finding is that the correlations between peak height and exon expression do not indicate that the height of the peak causes the extent of exon expression, at least in some cases.
Affiliation(s)
- Claire P Prowse-Wilkins
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC, 3010, Australia.
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia.
| | - Jianghui Wang
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
| | - Josie B Garner
- Agriculture Victoria, Ellinbank Dairy Centre, Ellinbank, VIC, 3821, Australia
| | - Michael E Goddard
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC, 3010, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
| | - Amanda J Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
|
33
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Young Greenwald WW, D'Antonio M, Pera MF, Frazer KA. Analysis of regulatory network modules in hundreds of human stem cell lines reveals complex epigenetic and genetic factors contribute to pluripotency state differences between subpopulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.20.541447. [PMID: 37292794 PMCID: PMC10245835 DOI: 10.1101/2023.05.20.541447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show the proportions of these pluripotent states vary considerably across lines. We discovered 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which were highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses revealed that regulatory networks underlying self-renewal and pluripotency have a surprising level of complexity. Genetic analyses identified thousands of regulatory variants that overlapped predicted transcription factor binding sites and were associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 Complex, and its associated network were significantly enriched for regulatory variants with large effects, suggesting that they may play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work captures the coordinated activity of tens of thousands of regulatory elements in hiPSCs and bins these elements into discrete functionally characterized regulatory networks, shows that regulatory elements in pluripotency networks harbor variants with large effects, and provides a rich resource for future pluripotent stem cell research.
|
34
|
Uvarova AN, Stasevich EM, Ustiugova AS, Mitkin NA, Zheremyan EA, Sheetikov SA, Zornikova KV, Bogolyubova AV, Rubtsov MA, Kulakovskiy IV, Kuprash DV, Korneev KV, Schwartz AM. rs71327024 Associated with COVID-19 Hospitalization Reduces CXCR6 Promoter Activity in Human CD4+ T Cells via Disruption of c-Myb Binding. Int J Mol Sci 2023; 24:13790. [PMID: 37762093 PMCID: PMC10530726 DOI: 10.3390/ijms241813790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 08/11/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open
Abstract
Single-nucleotide polymorphism rs71327024 located in the human 3p21.31 locus has been associated with an elevated risk of hospitalization upon SARS-CoV-2 infection. The 3p21.31 locus contains several genes encoding chemokine receptors potentially relevant to severe COVID-19. In particular, CXCR6, which is prominently expressed in T lymphocytes, NK, and NKT cells, has been shown to be involved in the recruitment of immune cells to non-lymphoid organs in chronic inflammatory and respiratory diseases. In COVID-19, CXCR6 expression is reduced in lung resident memory T cells from patients with severe disease as compared to the control cohort with moderate symptoms. We demonstrate here that rs71327024 is located within an active enhancer that augments the activity of the CXCR6 promoter in human CD4+ T lymphocytes. The common rs71327024(G) variant makes a functional binding site for the c-Myb transcription factor, while the risk rs71327024(T) variant disrupts c-Myb binding and reduces the enhancer activity. Concordantly, c-Myb knockdown in PMA-treated Jurkat cells negates rs71327024's allele-specific effect on CXCR6 promoter activity. We conclude that a disrupted c-Myb binding site may decrease CXCR6 expression in T helper cells of individuals carrying the minor rs71327024(T) allele and thus may promote the progression of severe COVID-19 and other inflammatory pathologies.
Affiliation(s)
- Aksinya N. Uvarova
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
| | - Ekaterina M. Stasevich
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Alina S. Ustiugova
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Nikita A. Mitkin
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Elina A. Zheremyan
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
| | - Savely A. Sheetikov
- Faculty of Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
- National Research Center for Hematology, 125167 Moscow, Russia
| | - Ksenia V. Zornikova
- Faculty of Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
- National Research Center for Hematology, 125167 Moscow, Russia
| | - Mikhail A. Rubtsov
- Faculty of Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
| | - Dmitry V. Kuprash
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
| | - Kirill V. Korneev
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- National Research Center for Hematology, 125167 Moscow, Russia
| | - Anton M. Schwartz
- Department of Human Biology, Faculty of Natural Sciences, University of Haifa, 199 Abba Khoushy Avenue, Mount Carmel, Haifa 3498838, Israel
|
35
|
Bhimsaria D, Rodríguez-Martínez JA, Mendez-Johnson JL, Ghoshdastidar D, Varadarajan A, Bansal M, Daniels DL, Ramanathan P, Ansari AZ. Hidden modes of DNA binding by human nuclear receptors. Nat Commun 2023; 14:4179. [PMID: 37443151 PMCID: PMC10345098 DOI: 10.1038/s41467-023-39577-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/11/2022] [Accepted: 06/19/2023] [Indexed: 07/15/2023] Open
Abstract
Human nuclear receptors (NRs) are a superfamily of ligand-responsive transcription factors that have central roles in cellular function. Their malfunction is linked to numerous diseases, and the ability to modulate their activity with synthetic ligands has yielded 16% of all FDA-approved drugs. NRs regulate distinct gene networks; however, they often function from genomic sites that lack known binding motifs. Here, to annotate genomic binding sites of known and unexamined NRs more accurately, we use high-throughput SELEX to comprehensively map DNA binding site preferences of all full-length human NRs, in complex with their ligands. Furthermore, to identify non-obvious binding sites buried in DNA-protein interactomes, we develop MinSeq Find, a search algorithm based on the MinTerm concept from electrical engineering and digital systems design. The resulting MinTerm sequence set (MinSeqs) reveals a constellation of binding sites that more effectively annotate NR-binding profiles in cells. MinSeqs also unmask binding sites created or disrupted by 52,106 single-nucleotide polymorphisms associated with human diseases. By implicating druggable NRs as hidden drivers of multiple human diseases, our results not only reveal new biological roles of NRs, but they also provide a resource for drug-repurposing and precision medicine.
Affiliation(s)
- Devesh Bhimsaria
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee, 247667, India.
| | - Ashwin Varadarajan
- Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
| | - Danette L Daniels
- Promega Corporation, Madison, WI, 53711, USA
- Foghorn Therapeutics, Cambridge, MA, 02139, USA
| | - Parameswaran Ramanathan
- Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA.
| | - Aseem Z Ansari
- Department of Chemical Biology and Therapeutics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
|
36
|
Morris JA, Caragine C, Daniloski Z, Domingo J, Barry T, Lu L, Davis K, Ziosi M, Glinos DA, Hao S, Mimitou EP, Smibert P, Roeder K, Katsevich E, Lappalainen T, Sanjana NE. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 2023; 380:eadh7699. [PMID: 37141313 PMCID: PMC10518238 DOI: 10.1126/science.adh7699] [Citation(s) in RCA: 68] [Impact Index Per Article: 68.0] [Received: 03/13/2023] [Accepted: 04/20/2023] [Indexed: 05/06/2023]
Abstract
Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse, biobank-scale GWAS data, massively parallel CRISPR screens, and single-cell transcriptomic and proteomic sequencing, we discovered 124 cis-target genes of 91 noncoding blood trait GWAS loci. Using precise variant insertion through base editing, we connected specific variants with gene expression changes. We also identified trans-effect networks of noncoding loci when cis target genes encoded transcription factors or microRNAs. Networks were themselves enriched for GWAS variants and demonstrated polygenic contributions to complex traits. This platform enables massively parallel characterization of the target genes and mechanisms of human noncoding variants in both cis and trans.
Affiliation(s)
- John A. Morris
- New York Genome Center, New York, NY, 10013, USA
- Department of Biology, New York University, New York, NY, 10003, USA
| | | | - Zharko Daniloski
- New York Genome Center, New York, NY, 10013, USA
- Department of Biology, New York University, New York, NY, 10003, USA
| | | | - Timothy Barry
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Lu Lu
- New York Genome Center, New York, NY, 10013, USA
| | - Kyrie Davis
- New York Genome Center, New York, NY, 10013, USA
| | | | | | - Stephanie Hao
- Technology Innovation Lab, New York Genome Center, New York, NY, 10013, USA
| | - Eleni P. Mimitou
- Technology Innovation Lab, New York Genome Center, New York, NY, 10013, USA
| | - Peter Smibert
- Technology Innovation Lab, New York Genome Center, New York, NY, 10013, USA
| | - Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Eugene Katsevich
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Tuuli Lappalainen
- New York Genome Center, New York, NY, 10013, USA
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65 Solna, Stockholm, Sweden
| | - Neville E. Sanjana
- New York Genome Center, New York, NY, 10013, USA
- Department of Biology, New York University, New York, NY, 10003, USA
|
37
|
Bogomolov A, Filonov S, Chadaeva I, Rasskazov D, Khandaev B, Zolotareva K, Kazachek A, Oshchepkov D, Ivanisenko VA, Demenkov P, Podkolodnyy N, Kondratyuk E, Ponomarenko P, Podkolodnaya O, Mustafin Z, Savinkova L, Kolchanov N, Tverdokhleb N, Ponomarenko M. Candidate SNP Markers Significantly Altering the Affinity of TATA-Binding Protein for the Promoters of Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection. Int J Mol Sci 2023; 24:ijms24109010. [PMID: 37240358 DOI: 10.3390/ijms24109010] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/12/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 05/28/2023] Open
Abstract
Atherosclerosis is a systemic disease in which focal lesions in arteries promote the build-up of lipoproteins and cholesterol they are transporting. The development of atheroma (atherogenesis) narrows blood vessels, reduces the blood supply and leads to cardiovascular diseases. According to the World Health Organization (WHO), cardiovascular diseases are the leading cause of death, which has been especially boosted since the COVID-19 pandemic. There is a variety of contributors to atherosclerosis, including lifestyle factors and genetic predisposition. Antioxidant diets and recreational exercises act as atheroprotectors and can retard atherogenesis. The search for molecular markers of atherogenesis and atheroprotection for predictive, preventive and personalized medicine appears to be the most promising direction for the study of atherosclerosis. In this work, we have analyzed 1068 human genes associated with atherogenesis, atherosclerosis and atheroprotection. The hub genes regulating these processes have been found to be the most ancient. In silico analysis of all 5112 SNPs in their promoters has revealed 330 candidate SNP markers, which statistically significantly change the affinity of the TATA-binding protein (TBP) for these promoters. These molecular markers have made us confident that natural selection acts against underexpression of the hub genes for atherogenesis, atherosclerosis and atheroprotection. At the same time, upregulation of the one for atheroprotection promotes human health.
Affiliation(s)
- Anton Bogomolov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Sergey Filonov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Irina Chadaeva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Dmitry Rasskazov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Bato Khandaev
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Karina Zolotareva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Anna Kazachek
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Dmitry Oshchepkov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Vladimir A Ivanisenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Pavel Demenkov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Nikolay Podkolodnyy
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Institute of Computational Mathematics and Mathematical Geophysics, Novosibirsk 630090, Russia
| | - Ekaterina Kondratyuk
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Petr Ponomarenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Olga Podkolodnaya
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Zakhar Mustafin
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Ludmila Savinkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Nikolay Kolchanov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Natalya Tverdokhleb
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
| | - Mikhail Ponomarenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
|
38
|
Mourad R. Semi-supervised learning improves regulatory sequence prediction with unlabeled sequences. BMC Bioinformatics 2023; 24:186. [PMID: 37147561 PMCID: PMC10163727 DOI: 10.1186/s12859-023-05303-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 04/25/2023] [Indexed: 05/07/2023] Open
Abstract
MOTIVATION: Genome-wide association studies have systematically identified thousands of single nucleotide polymorphisms (SNPs) associated with complex genetic diseases. However, the majority of those SNPs were found in non-coding genomic regions, preventing the understanding of the underlying causal mechanism. Predicting molecular processes based on the DNA sequence represents a promising approach to understand the role of those non-coding SNPs. Over the past years, deep learning was successfully applied to regulatory sequence prediction using supervised learning. Supervised learning required DNA sequences associated with functional data for training, whose amount is strongly limited by the finite size of the human genome. Conversely, the amount of mammalian DNA sequences is exponentially increasing due to ongoing large sequencing projects, but without functional data in most cases. RESULTS: To alleviate the limitations of supervised learning, we propose a paradigm shift with semi-supervised learning, which does not only exploit labeled sequences (e.g. human genome with ChIP-seq experiment), but also unlabeled sequences available in much larger amounts (e.g. from other species without ChIP-seq experiment, such as chimpanzee). Our approach is flexible and can be plugged into any neural architecture including shallow and deep networks, and shows strong predictive performance improvements compared to supervised learning in most cases (up to [Formula: see text]). AVAILABILITY AND IMPLEMENTATION: https://forgemia.inra.fr/raphael.mourad/deepgnn.
Affiliation(s)
- Raphaël Mourad
- MIAT, INRAE, 31320, Castanet-Tolosan, France.
- University of Toulouse, UPS, 31062, Toulouse, France.
|
39
|
Karollus A, Mauermeier T, Gagneur J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol 2023; 24:56. [PMID: 36973806 PMCID: PMC10045630 DOI: 10.1186/s13059-023-02899-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Received: 09/14/2022] [Accepted: 03/16/2023] [Indexed: 03/29/2023] Open
Abstract
BACKGROUND: The largest sequence-based models of transcription control to date are obtained by predicting genome-wide gene regulatory assays across the human genome. This setting is fundamentally correlative, as those models are exposed during training solely to the sequence variation between human genes that arose through evolution, questioning the extent to which those models capture genuine causal signals. RESULTS: Here we confront predictions of state-of-the-art models of transcription regulation against data from two large-scale observational studies and five deep perturbation assays. The most advanced of these sequence-based models, Enformer, by and large, captures causal determinants of human promoters. However, models fail to capture the causal effects of enhancers on expression, notably in medium to long distances and particularly for highly expressed promoters. More generally, the predicted impact of distal elements on gene expression predictions is small and the ability to correctly integrate long-range information is significantly more limited than the receptive fields of the models suggest. This is likely caused by the escalating class imbalance between actual and candidate regulatory elements as distance increases. CONCLUSIONS: Our results suggest that sequence-based models have advanced to the point that in silico study of promoter regions and promoter variants can provide meaningful insights and we provide practical guidance on how to use them. Moreover, we foresee that it will require significantly more and particularly new kinds of data to train models accurately accounting for distal elements.
Affiliation(s)
- Alexander Karollus
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
| | - Thomas Mauermeier
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
- Munich Data Science Institute, Technical University of Munich, Garching, Germany.
|
40
|
Zhang B, Zhang Z, Koeken VA, Kumar S, Aillaud M, Tsay HC, Liu Z, Kraft AR, Soon CF, Odak I, Bošnjak B, Vlot A, Swertz MA, Ohler U, Geffers R, Illig T, Huehn J, Saliba AE, Sander LE, Förster R, Xu CJ, Cornberg M, Schulte LN, Li Y. Altered and allele-specific open chromatin landscape reveals epigenetic and genetic regulators of innate immunity in COVID-19. CELL GENOMICS 2023; 3:100232. [PMID: 36474914 PMCID: PMC9715265 DOI: 10.1016/j.xgen.2022.100232] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/31/2022] [Revised: 10/21/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection causes severe COVID-19 in some patients and mild COVID-19 in others. Dysfunctional innate immune responses have been identified to contribute to COVID-19 severity, but the key regulators are still unknown. Here, we present an integrative single-cell multi-omics analysis of peripheral blood mononuclear cells from hospitalized and convalescent COVID-19 patients. In classical monocytes, we identified genes that were potentially regulated by differential chromatin accessibility. Then, sub-clustering and motif-enrichment analyses revealed disease condition-specific regulation by transcription factors and their targets, including an interaction between C/EBPs and a long-noncoding RNA LUCAT1, which we validated through loss-of-function experiments. Finally, we investigated genetic risk variants that exhibit allele-specific open chromatin (ASoC) in COVID-19 patients and identified a SNP rs6800484-C, which is associated with lower expression of CCR2 and may contribute to higher viral loads and higher risk of COVID-19 hospitalization. Altogether, our study highlights the diverse genetic and epigenetic regulators that contribute to COVID-19.
Affiliation(s)
- Bowen Zhang
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Beijing Normal University, College of Life Sciences, Beijing, China
| | - Zhenhua Zhang
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
| | - Valerie A.C.M. Koeken
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
| | - Saumya Kumar
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
| | - Michelle Aillaud
- Institute for Lung Research, Philipps University, Marburg, Germany
| | - Hsin-Chieh Tsay
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
| | - Zhaoli Liu
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
| | - Anke R.M. Kraft
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School, Hannover, Germany
- German Centre for Infection Research (Deutsches Zentrum für Infektionsforschung [DZIF]), Partner Site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence Resolving Infection Susceptibility (RESIST; EXC 2155), Hannover Medical School, Hannover, Germany
| | - Chai Fen Soon
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School, Hannover, Germany
| | - Ivan Odak
- Institute of Immunology, Hannover Medical School, Hannover, Germany
| | - Berislav Bošnjak
- Institute of Immunology, Hannover Medical School, Hannover, Germany
| | - Anna Vlot
- Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Deutsche COVID-19 OMICS Initiative (DeCOI)
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Beijing Normal University, College of Life Sciences, Beijing, China
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
- Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Institute for Lung Research, Philipps University, Marburg, Germany
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School, Hannover, Germany
- German Centre for Infection Research (Deutsches Zentrum für Infektionsforschung [DZIF]), Partner Site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence Resolving Infection Susceptibility (RESIST; EXC 2155), Hannover Medical School, Hannover, Germany
- Institute of Immunology, Hannover Medical School, Hannover, Germany
- Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Genome Analytics, Helmholtz-Center for Infection Research (HZI), Braunschweig, Germany
- German Center for Lung Research (DZL), Giessen, Germany
- Hannover Unified Biobank, Hannover Medical School, Hannover, Germany
- Department of Experimental Immunology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz-Centre for Infection Research (HZI), Wurzburg, Germany
- Charité–Universitätsmedizin Berlin, Department of Infectious Diseases and Respiratory Medicine, Charité, Universitätsmedizin Berlin, Berlin, Germany
| | - Morris A. Swertz
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
| | - Uwe Ohler
- Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Robert Geffers
- Genome Analytics, Helmholtz-Center for Infection Research (HZI), Braunschweig, Germany
| | - Thomas Illig
- German Center for Lung Research (DZL), Giessen, Germany
- Hannover Unified Biobank, Hannover Medical School, Hannover, Germany
| | - Jochen Huehn
- Cluster of Excellence Resolving Infection Susceptibility (RESIST; EXC 2155), Hannover Medical School, Hannover, Germany
- Department of Experimental Immunology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz-Centre for Infection Research (HZI), Wurzburg, Germany
| | - Leif Erik Sander
- German Center for Lung Research (DZL), Giessen, Germany
- Charité–Universitätsmedizin Berlin, Department of Infectious Diseases and Respiratory Medicine, Charité, Universitätsmedizin Berlin, Berlin, Germany
| | - Reinhold Förster
- German Centre for Infection Research (Deutsches Zentrum für Infektionsforschung [DZIF]), Partner Site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence Resolving Infection Susceptibility (RESIST; EXC 2155), Hannover Medical School, Hannover, Germany
- Institute of Immunology, Hannover Medical School, Hannover, Germany
| | - Cheng-Jian Xu
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School, Hannover, Germany
| | - Markus Cornberg
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School, Hannover, Germany
- German Centre for Infection Research (Deutsches Zentrum für Infektionsforschung [DZIF]), Partner Site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence Resolving Infection Susceptibility (RESIST; EXC 2155), Hannover Medical School, Hannover, Germany
| | - Leon N. Schulte
- Institute for Lung Research, Philipps University, Marburg, Germany
- German Center for Lung Research (DZL), Giessen, Germany
| | - Yang Li
- Department of Computational Biology for Individualised Infection Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- TWINCORE, a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Cluster of Excellence Resolving Infection Susceptibility (RESIST; EXC 2155), Hannover Medical School, Hannover, Germany
|
41
|
Kerner G, Neehus AL, Philippot Q, Bohlen J, Rinchai D, Kerrouche N, Puel A, Zhang SY, Boisson-Dupuis S, Abel L, Casanova JL, Patin E, Laval G, Quintana-Murci L. Genetic adaptation to pathogens and increased risk of inflammatory disorders in post-Neolithic Europe. CELL GENOMICS 2023; 3:100248. [PMID: 36819665 PMCID: PMC9932995 DOI: 10.1016/j.xgen.2022.100248] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 06/20/2022] [Revised: 10/24/2022] [Accepted: 12/14/2022] [Indexed: 01/15/2023]
Abstract
Ancient genomics can directly detect human genetic adaptation to environmental cues. However, it remains unclear how pathogens have exerted selective pressures on human genome diversity across different epochs and affected present-day inflammatory disease risk. Here, we use an ancestry-aware approximate Bayesian computation framework to estimate the nature, strength, and time of onset of selection acting on 2,879 ancient and modern European genomes from the last 10,000 years. We found that the bulk of genetic adaptation occurred after the start of the Bronze Age, <4,500 years ago, and was enriched in genes relating to host-pathogen interactions. Furthermore, we detected directional selection acting on specific leukocytic lineages and experimentally demonstrated that the strongest negatively selected candidate variant in immunity genes, lipopolysaccharide-binding protein (LBP) D283G, is hypomorphic. Finally, our analyses suggest that the risk of inflammatory disorders has increased in post-Neolithic Europeans, possibly because of antagonistic pleiotropy following genetic adaptation to pathogens.
Affiliation(s)
- Gaspard Kerner
- Institut Pasteur, Université Paris Cité, CNRS UMR2000, Human Evolutionary Genetics Unit, 75015 Paris, France
| | - Anna-Lena Neehus
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
| | - Quentin Philippot
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
| | - Jonathan Bohlen
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
| | - Darawan Rinchai
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
| | - Nacim Kerrouche
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
| | - Anne Puel
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
| | - Shen-Ying Zhang
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
| | - Stéphanie Boisson-Dupuis
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
| | - Laurent Abel
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
| | - Jean-Laurent Casanova
- Laboratory of Human Genetics of Infectious Diseases, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
- University Paris Cité, Imagine Institute, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
- Howard Hughes Medical Institute, New York, NY 10065, USA
- Department of Pediatrics, Necker Hospital for Sick Children, 75015 Paris, France
| | - Etienne Patin
- Institut Pasteur, Université Paris Cité, CNRS UMR2000, Human Evolutionary Genetics Unit, 75015 Paris, France
| | - Guillaume Laval
- Institut Pasteur, Université Paris Cité, CNRS UMR2000, Human Evolutionary Genetics Unit, 75015 Paris, France
| | - Lluis Quintana-Murci
- Institut Pasteur, Université Paris Cité, CNRS UMR2000, Human Evolutionary Genetics Unit, 75015 Paris, France
- Collège de France, Chair of Human Genomics and Evolution, 75005 Paris, France
|
42
|
Deviatiiarov RM, Gams A, Kulakovskiy IV, Buyan A, Meshcheryakov G, Syunyaev R, Singh R, Shah P, Tatarinova TV, Gusev O, Efimov IR. An atlas of transcribed human cardiac promoters and enhancers reveals an important role of regulatory elements in heart failure. NATURE CARDIOVASCULAR RESEARCH 2023; 2:58-75. [PMID: 39196209 DOI: 10.1038/s44161-022-00182-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/23/2021] [Accepted: 11/02/2022] [Indexed: 08/29/2024]
Abstract
A deeper knowledge of the dynamic transcriptional activity of promoters and enhancers is needed to improve mechanistic understanding of the pathogenesis of heart failure and heart diseases. In this study, we used cap analysis of gene expression (CAGE) to identify and quantify the activity of transcribed regulatory elements (TREs) in the four cardiac chambers of 21 healthy and ten failing adult human hearts. We identified 17,668 promoters and 14,920 enhancers associated with the expression of 14,519 genes. We showed how these regulatory elements are alternatively transcribed in different heart regions, in healthy versus failing hearts and in ischemic versus non-ischemic heart failure samples. Cardiac-disease-related single-nucleotide polymorphisms (SNPs) appeared to be enriched in TREs, potentially affecting the allele-specific transcription factor binding. To conclude, our open-source heart CAGE atlas will serve the cardiovascular community in improving the understanding of the role of the cardiac gene regulatory networks in cardiovascular disease and therapy.
Affiliation(s)
- Ruslan M Deviatiiarov
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
| | - Anna Gams
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
| | - Ivan V Kulakovskiy
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Andrey Buyan
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
| | | | - Roman Syunyaev
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Ramesh Singh
- Inova Heart and Vascular Institute, Falls Church, VA, USA
| | - Palak Shah
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
- Inova Heart and Vascular Institute, Falls Church, VA, USA
| | - Tatiana V Tatarinova
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Department of Biology, University of La Verne, La Verne, CA, USA
| | - Oleg Gusev
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
- Graduate School of Medicine, Juntendo University, Tokyo, Japan
- RIKEN Center for Integrative Medical Sciences, RIKEN, Yokohama, Japan
- Endocrinology Research Center, Moscow, Russia
| | - Igor R Efimov
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
- Department of Biomedical Engineering, Northwestern University, Chicago, IL, USA
- Department of Medicine, Northwestern University, Chicago, IL, USA
|
43
|
Morova T, Ding Y, Huang CCF, Sar F, Schwarz T, Giambartolomei C, Baca S, Grishin D, Hach F, Gusev A, Freedman M, Pasaniuc B, Lack N. Optimized high-throughput screening of non-coding variants identified from genome-wide association studies. Nucleic Acids Res 2022; 51:e18. [PMID: 36546757 PMCID: PMC9943666 DOI: 10.1093/nar/gkac1198] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/26/2022] [Revised: 11/19/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
The vast majority of disease-associated single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants affect transcription factor binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants we developed snpSTARRseq, a high-throughput experimental method that can interrogate the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per locus. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked to orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.
Affiliation(s)
- Tunc Morova
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | | | - Funda Sar
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
| | - Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Claudia Giambartolomei
- Central RNA Lab, Istituto Italiano di Tecnologia, Genova 16163, Italy
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sylvan C Baca
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
| | - Dennis Grishin
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Department of Urologic Science, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
| | - Alexander Gusev
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Matthew L Freedman
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- The Center for Cancer Genome Discovery, Dana Farber Cancer Institute, Boston, MA 02215, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nathan A Lack
- To whom correspondence should be addressed. Tel: +1 604 875 4411;
|
44
|
Hung KL, Luebeck J, Dehkordi SR, Colón CI, Li R, Wong ITL, Coruh C, Dharanipragada P, Lomeli SH, Weiser NE, Moriceau G, Zhang X, Bailey C, Houlahan KE, Yang W, González RC, Swanton C, Curtis C, Jamal-Hanjani M, Henssen AG, Law JA, Greenleaf WJ, Lo RS, Mischel PS, Bafna V, Chang HY. Targeted profiling of human extrachromosomal DNA by CRISPR-CATCH. Nat Genet 2022; 54:1746-1754. [PMID: 36253572 PMCID: PMC9649439 DOI: 10.1038/s41588-022-01190-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 10/28/2021] [Accepted: 08/22/2022] [Indexed: 12/15/2022]
Abstract
Extrachromosomal DNA (ecDNA) is a common mode of oncogene amplification but is challenging to analyze. Here, we adapt CRISPR-CATCH, in vitro CRISPR-Cas9 treatment and pulsed field gel electrophoresis of agarose-entrapped genomic DNA, previously developed for bacterial chromosome segments, to isolate megabase-sized human ecDNAs. We demonstrate strong enrichment of ecDNA molecules containing EGFR, FGFR2 and MYC from human cancer cells and NRAS ecDNA from human metastatic melanoma with acquired therapeutic resistance. Targeted enrichment of ecDNA versus chromosomal DNA enabled phasing of genetic variants, identified the presence of an EGFRvIII mutation exclusively on ecDNAs and supported an excision model of ecDNA genesis in a glioblastoma model. CRISPR-CATCH followed by nanopore sequencing enabled single-molecule ecDNA methylation profiling and revealed hypomethylation of the EGFR promoter on ecDNAs. We distinguished heterogeneous ecDNA species within the same sample by size and sequence with base-pair resolution and discovered functionally specialized ecDNAs that amplify select enhancers or oncogene-coding sequences.
Affiliation(s)
- King L Hung
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
| | - Jens Luebeck
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Siavash R Dehkordi
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Caterina I Colón
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Rui Li
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
| | - Ivy Tsz-Lo Wong
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Ceyda Coruh
- Plant Molecular and Cellular Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Prashanthi Dharanipragada
- Division of Dermatology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Shirley H Lomeli
- Division of Dermatology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Natasha E Weiser
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Gatien Moriceau
- Division of Dermatology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Xiao Zhang
- Division of Dermatology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Chris Bailey
- Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
| | - Kathleen E Houlahan
- Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Wenting Yang
- Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Rocío Chamorro González
- Department of Pediatric Oncology/Hematology, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Charles Swanton
- Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, University College London, London, UK
- University College London Hospitals NHS Trust, London, UK
| | - Christina Curtis
- Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Mariam Jamal-Hanjani
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, University College London, London, UK
- University College London Hospitals NHS Trust, London, UK
| | - Anton G Henssen
- Department of Pediatric Oncology/Hematology, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), Max Delbrück Center for Molecular Medicine and Charité-Universitätsmedizin Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), partner site Berlin, and German Cancer Research Center DKFZ, Heidelberg, Germany
- Berlin Institute of Health, Berlin, Germany
| | - Julie A Law
- Plant Molecular and Cellular Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA
| | - William J Greenleaf
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Roger S Lo
- Division of Dermatology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
| | - Paul S Mischel
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Howard Y Chang
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
|
45
|
Hou G, Zhou T, Xu N, Yin Z, Zhu X, Zhang Y, Cui Y, Ma J, Tang Y, Cheng Z, Shen Y, Chen Y, Zou LH, Wang YF, Yin Z, Guo Y, Ding H, Ye Z, Shen N. Integrative Functional Genomics Identifies Systemic Lupus Erythematosus Causal Genetic Variant in the IRF5 Risk Locus. Arthritis Rheumatol 2022; 75:574-585. [PMID: 36245280 DOI: 10.1002/art.42390] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/11/2022] [Revised: 09/14/2022] [Accepted: 10/04/2022] [Indexed: 11/09/2022]
Abstract
OBJECTIVE: IRF5 plays a crucial role in the development of lupus. Genome-wide association studies have identified several systemic lupus erythematosus (SLE) risk single-nucleotide polymorphisms (SNPs) enriched in the IRF5 locus. However, no comprehensive genome editing-based functional analysis exists to establish a direct link between these variants and altered IRF5 expression, particularly for enhancer variants. This study was undertaken to dissect the regulatory function and mechanisms of SLE IRF5 enhancer risk variants and to explore the utilization of clustered regularly interspaced short palindromic repeat interference (CRISPRi) to regulate the expression of disease risk gene to intervene in the disease. METHODS: Epigenomic profiles and expression quantitative trait locus analysis were applied to prioritize putative functional variants in the IRF5 locus. CRISPR-mediated deletion, activation, and interference were performed to investigate the genetic function of rs4728142. Allele-specific chromatin immunoprecipitation-quantitative polymerase chain reaction and allele-specific formaldehyde-assisted isolation of regulatory element-quantitative polymerase chain reaction were used to decipher the mechanism of alleles differentially regulating IRF5 expression. The CRISPRi approach was used to evaluate the intervention effect in monocytes from SLE patients. RESULTS: SLE risk SNP rs4728142 was located in an enhancer region, indicating a disease-related regulatory function, and risk allele rs4728142-A was closely associated with increased IRF5 expression. We demonstrated that an rs4728142-containing region could act as an enhancer to regulate the expression of IRF5. Moreover, rs4728142 affected the binding affinity of zinc finger and BTB domain-containing protein 3 (ZBTB3), a transcription factor involved in regulation. Furthermore, in monocytes from SLE patients, CRISPR-based interference with the regulation of this enhancer attenuated the production of disease-associated cytokines. CONCLUSION: These results demonstrate that the rs4728142-A allele increases the SLE risk by affecting ZBTB3 binding, chromatin status, and regulating IRF5 expression, establishing a biologic link between genetic variation and lupus pathogenesis.
Affiliation(s)
- Guojun Hou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, China, and State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Tian Zhou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Ning Xu
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zhihua Yin
- Shenzhen Futian Hospital for Rheumatic Diseases, and Joint Research Laboratory for Rheumatology of Shenzhen University Health Science Center and Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, China
| | - Xinyi Zhu
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yutong Zhang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yange Cui
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jianyang Ma
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yuanjia Tang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zhaorui Cheng
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yiwei Shen
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yashuo Chen
- Shenzhen Futian Hospital for Rheumatic Diseases, and Joint Research Laboratory for Rheumatology of Shenzhen University Health Science Center and Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, China
| | - Ling-Hua Zou
- Shenzhen Futian Hospital for Rheumatic Diseases, and Joint Research Laboratory for Rheumatology of Shenzhen University Health Science Center and Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, China
| | - Yong-Fei Wang
- School of Life and Health Sciences, School of Medicine, and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Guangdong, China
| | - Zihang Yin
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Ya Guo
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Huihua Ding
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zhizhong Ye
- Shenzhen Futian Hospital for Rheumatic Diseases, and Joint Research Laboratory for Rheumatology of Shenzhen University Health Science Center and Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, China
| | - Nan Shen
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, China, State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, Center for Autoimmune Genomics and Etiology, Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, and Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio
|
46
|
Boytsov A, Abramov S, Aiusheeva AZ, Kasianova A, Baulin E, Kuznetsov I, Aulchenko Y, Kolmykov S, Yevshin I, Kolpakov F, Vorontsov I, Makeev V, Kulakovskiy I. ANANASTRA: annotation and enrichment analysis of allele-specific transcription factor binding at SNPs. Nucleic Acids Res 2022; 50:W51-W56. [PMID: 35446421 PMCID: PMC9252736 DOI: 10.1093/nar/gkac262] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/27/2022] [Revised: 03/15/2022] [Accepted: 04/04/2022] [Indexed: 11/12/2022] Open
Abstract
We present ANANASTRA, https://ananastra.autosome.org, a web server for the identification and annotation of regulatory single-nucleotide polymorphisms (SNPs) with allele-specific binding events. ANANASTRA accepts a list of dbSNP IDs or a VCF file and reports allele-specific binding (ASB) sites of particular transcription factors or in specific cell types, highlighting those with ASBs significantly enriched at SNPs in the query list. ANANASTRA is built on top of a systematic analysis of allelic imbalance in ChIP-Seq experiments and performs the ASB enrichment test against background sets of SNPs found in the same source experiments as ASB sites but not displaying significant allelic imbalance. We illustrate ANANASTRA usage with selected case studies and expect that ANANASTRA will help to conduct the follow-up of GWAS in terms of establishing functional hypotheses and designing experimental verification.
Affiliation(s)
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
| | - Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
| | - Ariuna Z Aiusheeva
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
| | - Alexandra M Kasianova
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
- Southern Federal University, Rostov-on-Don, 344006, Russia
| | - Eugene Baulin
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, 142290, Russia
| | - Ivan A Kuznetsov
- Skolkovo Institute of Science and Technology, Moscow, 121205, Russia
| | - Yurii S Aulchenko
- Institute of Cytology and Genetics SB RAS, Novosibirsk, 630090, Russia
- PolyKnomics BV, ’s-Hertogenbosch, 5237 PA, Netherlands
| | - Semyon Kolmykov
- Sirius University of Science and Technology, Sochi, 354340, Russia
- Biosoft.Ru LLC, Novosibirsk, 630090, Russia
| | - Ivan Yevshin
- Sirius University of Science and Technology, Sochi, 354340, Russia
- Biosoft.Ru LLC, Novosibirsk, 630090, Russia
| | - Fedor Kolpakov
- Sirius University of Science and Technology, Sochi, 354340, Russia
- Federal Research Center for Information and Computational Technologies, Novosibirsk, 630090, Russia
| | - Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
| | - Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
|
47
|
Eapen AA, Parameswaran S, Forney C, Edsall LE, Miller D, Donmez O, Dunn K, Lu X, Granitto M, Rowden H, Magier AZ, Pujato M, Chen X, Kaufman K, Bernstein DI, Devonshire AL, Rothenberg ME, Weirauch MT, Kottyan LC. Epigenetic and transcriptional dysregulation in CD4+ T cells in patients with atopic dermatitis. PLoS Genet 2022; 18:e1009973. [PMID: 35576187 PMCID: PMC9135339 DOI: 10.1371/journal.pgen.1009973] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/03/2021] [Revised: 05/26/2022] [Accepted: 04/20/2022] [Indexed: 12/30/2022] Open
Abstract
Atopic dermatitis (AD) is one of the most common skin disorders among children. Disease etiology involves genetic and environmental factors, with 29 independent AD risk loci enriched for risk allele-dependent gene expression in the skin and CD4+ T cell compartments. We investigated the potential epigenetic mechanisms responsible for the genetic susceptibility of CD4+ T cells. To understand the differences in gene regulatory activity in peripheral blood T cells in AD, we measured chromatin accessibility (an assay based on transposase-accessible chromatin sequencing, ATAC-seq), nuclear factor kappa B subunit 1 (NFKB1) binding (chromatin immunoprecipitation with sequencing, ChIP-seq), and gene expression levels (RNA-seq) in stimulated CD4+ T cells from subjects with active moderate-to-severe AD, as well as in age-matched non-allergic controls. Open chromatin regions in stimulated CD4+ T cells were highly enriched for AD genetic risk variants, with almost half of the AD risk loci overlapping AD-dependent ATAC-seq peaks. AD-specific open chromatin regions were strongly enriched for NF-κB DNA-binding motifs. ChIP-seq identified hundreds of NFKB1-occupied genomic loci that were AD- or control-specific. As expected, the AD-specific ChIP-seq peaks were strongly enriched for NF-κB DNA-binding motifs. Surprisingly, control-specific NFKB1 ChIP-seq peaks were not enriched for NFKB1 motifs, but instead contained motifs for other classes of human transcription factors, suggesting a mechanism involving altered indirect NFKB1 binding. Using DNA sequencing data, we identified 63 instances of altered genotype-dependent chromatin accessibility at 36 AD risk variant loci (30% of AD risk loci) that might lead to genotype-dependent gene expression. Based on these findings, we propose that CD4+ T cells respond to stimulation in an AD-specific manner, resulting in disease- and genotype-dependent chromatin accessibility alterations involving NFKB1 binding.
Affiliation(s)
- Amy A. Eapen
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Allergy and Clinical Immunology, Henry Ford Health System, Detroit, Michigan, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Sreeja Parameswaran
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Carmy Forney
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Lee E. Edsall
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Daniel Miller
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Omer Donmez
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Katelyn Dunn
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Xiaoming Lu
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Marissa Granitto
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Hope Rowden
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Adam Z. Magier
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Mario Pujato
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Xiaoting Chen
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Kenneth Kaufman
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Cincinnati Veterans Administration, Cincinnati, Ohio, United States of America
| | - David I. Bernstein
- Division of Immunology, Allergy, and Rheumatology, University of Cincinnati, College of Medicine, Cincinnati, Ohio, United States of America
| | - Ashley L. Devonshire
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Marc E. Rothenberg
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Matthew T. Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Leah C. Kottyan
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| |
|
48
|
Flynn E, Lappalainen T. Functional Characterization of Genetic Variant Effects on Expression. Annu Rev Biomed Data Sci 2022; 5:119-139. [PMID: 35483347 DOI: 10.1146/annurev-biodatasci-122120-010010] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
Thousands of common genetic variants in the human population have been associated with disease risk and phenotypic variation by genome-wide association studies (GWAS). However, the majority of GWAS variants fall into noncoding regions of the genome, complicating our understanding of their regulatory functions, and few molecular mechanisms of GWAS variant effects have been clearly elucidated. Here, we set out to review genetic variant effects, focusing on expression quantitative trait loci (eQTLs), including their utility in interpreting GWAS variant mechanisms. We discuss the interrelated challenges and opportunities for eQTL analysis, covering determining causal variants, elucidating molecular mechanisms of action, and understanding context variability. Addressing these questions can enable better functional characterization of disease-associated loci and provide insights into fundamental biological questions of the noncoding genetic regulatory code and its control of gene expression.
Affiliation(s)
- Elise Flynn
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
| |
|
49
|
Zhou T, Zhu X, Ye Z, Wang YF, Yao C, Xu N, Zhou M, Ma J, Qin Y, Shen Y, Tang Y, Yin Z, Xu H, Zhang Y, Zang X, Ding H, Yang W, Guo Y, Harley JB, Namjou B, Kaufman KM, Kottyan LC, Weirauch MT, Hou G, Shen N. Lupus enhancer risk variant causes dysregulation of IRF8 through cooperative lncRNA and DNA methylation machinery. Nat Commun 2022; 13:1855. [PMID: 35388006 PMCID: PMC8987079 DOI: 10.1038/s41467-022-29514-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/28/2021] [Accepted: 03/21/2022] [Indexed: 02/06/2023] Open
Abstract
Despite strong evidence that human genetic variants affect the expression of many key transcription factors involved in autoimmune diseases, establishing biological links between non-coding risk variants and the gene targets they regulate remains a considerable challenge. Here, we combine genetic, epigenomic, and CRISPR activation approaches to screen for functional variants that regulate IRF8 expression. We demonstrate that the locus containing rs2280381 is a cell-type-specific enhancer for IRF8 that spatially interacts with the IRF8 promoter. Further, rs2280381 mediates IRF8 expression through enhancer RNA AC092723.1, which recruits TET1 to the IRF8 promoter regulating IRF8 expression by affecting methylation levels. The alleles of rs2280381 modulate PU.1 binding and chromatin state to regulate AC092723.1 and IRF8 expression differentially. Our work illustrates an integrative strategy to define functional genetic variants that regulate the expression of critical genes in autoimmune diseases and decipher the mechanisms underlying the dysregulation of IRF8 expression mediated by lupus risk variants.
Affiliation(s)
- Tian Zhou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
- State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200032 China
- Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, 518040 China
| | - Xinyi Zhu
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Zhizhong Ye
- Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, 518040 China
| | - Yong-Fei Wang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong, 999077 China
| | - Chao Yao
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences (SIBS), University of Chinese Academy of Sciences, Chinese Academy of Sciences (CAS), Shanghai, 200031 China
| | - Ning Xu
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Mi Zhou
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University (SJTU), Shanghai, 200240 China
| | - Jianyang Ma
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Yuting Qin
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Yiwei Shen
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Yuanjia Tang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Zhihua Yin
- Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, 518040 China
| | - Hong Xu
- Department of Obstetrics and Gynecology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200127 China
- Shanghai Key Laboratory of Gynecologic Oncology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200127 China
| | - Yutong Zhang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Xiaoli Zang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Huihua Ding
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
| | - Wanling Yang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong, 999077 China
| | - Ya Guo
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University (SJTU), Shanghai, 200240 China
| | - John B. Harley
- US Department of Veterans Affairs Medical Center, Cincinnati, OH 45229 USA
| | - Bahram Namjou
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
| | - Kenneth M. Kaufman
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229 USA
| | - Leah C. Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229 USA
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
| | - Matthew T. Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229 USA
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
| | - Guojun Hou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
- State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200032 China
- Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, 518040 China
| | - Nan Shen
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001 China
- State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200032 China
- Shenzhen Futian Hospital for Rheumatic Diseases, Shenzhen, 518040 China
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229 USA
| |
|
50
|
Nakamura K, Reid BM, Chen A, Chen Z, Goode EL, Permuth JB, Teer JK, Tyrer J, Yu X, Kanetsky PA, Pharoah PD, Gayther SA, Sellers TA, Lawrenson K, Karreth FA. Functional analysis of the 1p34.3 risk locus implicates GNL2 in high-grade serous ovarian cancer. Am J Hum Genet 2022; 109:116-135. [PMID: 34965383 DOI: 10.1016/j.ajhg.2021.11.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/23/2021] [Accepted: 11/29/2021] [Indexed: 12/20/2022] Open
Abstract
The high-grade serous ovarian cancer (HGSOC) risk locus at chromosome 1p34.3 resides within a frequently amplified genomic region signifying the presence of an oncogene. Here, we integrate in silico variant-to-function analysis with functional studies to characterize the oncogenic potential of candidate genes in the 1p34.3 locus. Fine mapping of genome-wide association statistics identified candidate causal SNPs local to H3K27ac-demarcated enhancer regions that exhibit allele-specific binding for CTCF in HGSOC and normal fallopian tube secretory epithelium cells (FTSECs). SNP risk associations colocalized with eQTL for six genes (DNALI1, GNL2, RSPO1, SNIP1, MEAF6, and LINC01137) that are more highly expressed in carriers of the risk allele, and three (DNALI1, GNL2, and RSPO1) were upregulated in HGSOC compared to normal ovarian surface epithelium cells and/or FTSECs. Increased expression of GNL2 and MEAF6 was associated with shorter survival in HGSOC with 1p34.3 amplifications. Despite its activation of β-catenin signaling, RSPO1 overexpression exerted no effects on proliferation or colony formation in our study of ovarian cancer and FTSECs. Instead, GNL2, MEAF6, and SNIP1 silencing impaired in vitro ovarian cancer cell growth. Additionally, GNL2 silencing diminished xenograft tumor formation, whereas overexpression stimulated proliferation and colony formation in FTSECs. GNL2 influences 60S ribosomal subunit maturation and global protein synthesis in ovarian cancer and FTSECs, providing a potential mechanism of how GNL2 upregulation might promote ovarian cancer development and mediate genetic susceptibility of HGSOC.
|