1
Hutchison WJ, Keyes TJ, Crowell HL, Serizay J, Soneson C, Davis ES, Sato N, Moses L, Tarlinton B, Nahid AA, Kosmac M, Clayssen Q, Yuan V, Mu W, Park JE, Mamede I, Ryu MH, Axisa PP, Paiz P, Poon CL, Tang M, Gottardo R, Morgan M, Lee S, Lawrence M, Hicks SC, Nolan GP, Davis KL, Papenfuss AT, Love MI, Mangiola S. The tidyomics ecosystem: enhancing omic data analyses. Nat Methods 2024; 21:1166-1170. [PMID: 38877315 DOI: 10.1038/s41592-024-02299-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 05/05/2024] [Indexed: 06/16/2024]
Abstract
The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.
Affiliation(s)
- William J Hutchison
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
- Timothy J Keyes
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Helena L Crowell
  - University of Zurich, Zurich, Switzerland
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Jacques Serizay
  - Unité Régulation Spatiale des Génomes, Institut Pasteur, CNRS UMR3525, Paris, France
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Eric S Davis
  - Bioinformatics and Computational Biology Program, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Noriaki Sato
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Lambda Moses
  - California Institute of Technology, Pasadena, CA, USA
- Boyd Tarlinton
  - Queensland Department of Agriculture and Fisheries, Brisbane, Queensland, Australia
- Abdullah A Nahid
  - Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
- Victor Yuan
  - Department of Statistics, The University of British Columbia, Vancouver, British Columbia, Canada
- Wancen Mu
  - Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Ji-Eun Park
  - Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Izabela Mamede
  - Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Min Hyung Ryu
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Pierre-Paul Axisa
  - Centre de Recherches en Cancérologie de Toulouse, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Toulouse, France
- Paulina Paiz
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Chi-Lam Poon
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Ming Tang
  - Immunitas Therapeutics, Waltham, MA, USA
- Raphael Gottardo
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
  - University of Lausanne, Lausanne, Switzerland
  - Lausanne University Hospital, Lausanne, Switzerland
- Martin Morgan
  - Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Stuart Lee
  - Department of Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA
- Michael Lawrence
  - Department of Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
- Garry P Nolan
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Kara L Davis
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Anthony T Papenfuss
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia.
- Michael I Love
  - Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
  - Genetics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
- Stefano Mangiola
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia.
  - South Australian immunoGENomics Cancer Institute, The University of Adelaide, Adelaide, South Australia, Australia.
2
Hutchison WJ, Keyes TJ, Crowell HL, Serizay J, Soneson C, Davis ES, Sato N, Moses L, Tarlinton B, Nahid AA, Kosmac M, Clayssen Q, Yuan V, Mu W, Park JE, Mamede I, Ryu MH, Axisa PP, Paiz P, Poon CL, Tang M, Gottardo R, Morgan M, Lee S, Lawrence M, Hicks SC, Nolan GP, Davis KL, Papenfuss AT, Love MI, Mangiola S. The tidyomics ecosystem: Enhancing omic data analyses. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.10.557072. [PMID: 38826347 PMCID: PMC11142095 DOI: 10.1101/2023.09.10.557072] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/04/2024]
Abstract
The growth of omic data presents evolving challenges in data manipulation, analysis, and integration. Addressing these challenges, Bioconductor [1] provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming [2] offers a revolutionary standard for data organisation and manipulation. Here, we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning, and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analysing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas [3], spanning six data frameworks and ten analysis tools.
Affiliation(s)
- William J. Hutchison
  - Walter and Eliza Hall Institute of Medical Research, Division of Bioinformatics, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC 3052, Australia
- Timothy J. Keyes
  - Stanford University School of Medicine, Department of Biomedical Data Science, USA
  - Stanford University School of Medicine, Department of Pediatrics, USA
- Helena L. Crowell
  - University of Zurich, Switzerland
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Jacques Serizay
  - Institut Pasteur, CNRS UMR3525, Unité Régulation Spatiale des Génomes, F-75015, Paris, France
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Eric S. Davis
  - Bioinformatics and Computational Biology Program, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Noriaki Sato
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Japan
- Boyd Tarlinton
  - Queensland Department of Agriculture and Fisheries, Australia
- Abdullah A. Nahid
  - Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
- Victor Yuan
  - Department of Statistics, The University of British Columbia, Canada
- Wancen Mu
  - Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Ji-Eun Park
  - Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Izabela Mamede
  - Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Min Hyung Ryu
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, USA
  - Department of Medicine, Harvard Medical School, USA
- Pierre-Paul Axisa
  - Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Paulina Paiz
  - Stanford University School of Medicine, Department of Biomedical Data Science, USA
- Chi-Lam Poon
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Raphael Gottardo
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
  - University of Lausanne, Switzerland
  - Lausanne University Hospital
- Stuart Lee
  - Genentech, Department of Bioinformatics and Computational Biology, USA
- Michael Lawrence
  - Genentech, Department of Bioinformatics and Computational Biology, USA
- Stephanie C. Hicks
  - Department of Biostatistics, Johns Hopkins University, USA
  - Department of Biomedical Engineering, Johns Hopkins University, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, MD, USA
- Garry P. Nolan
  - Stanford University School of Medicine, Department of Pathology, USA
- Kara L. Davis
  - Stanford University School of Medicine, Department of Pediatrics, USA
- Anthony T. Papenfuss
  - Walter and Eliza Hall Institute of Medical Research, Division of Bioinformatics, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC 3052, Australia
- Michael I. Love
  - Biostatistics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
  - Genetics Department, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Stefano Mangiola
  - Walter and Eliza Hall Institute of Medical Research, Division of Bioinformatics, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC 3052, Australia
  - The University of Adelaide, South Australian immunoGENomics Cancer Institute, Adelaide, South Australia, Australia
3
Kramer NE, Coryell P, D'Costa S, Thulson E, Byun S, Kim H, Parkus SM, Bond ML, Shine J, Chubinskaya S, Love MI, Mohlke KL, Diekman BO, Loeser RF, Phanstiel DH. Response eQTLs, chromatin accessibility, and 3D chromatin structure in chondrocytes provide mechanistic insight into osteoarthritis risk. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.05.592567. [PMID: 38952796 PMCID: PMC11216363 DOI: 10.1101/2024.05.05.592567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]
Abstract
Osteoarthritis (OA) poses a significant healthcare burden with limited treatment options. While genome-wide association studies (GWAS) have identified over 100 OA-associated loci, translating these findings into therapeutic targets remains challenging. Integrating expression quantitative trait loci (eQTL), 3D chromatin structure, and other genomic approaches with OA GWAS data offers a promising approach to elucidate disease mechanisms; however, comprehensive eQTL maps in OA-relevant tissues and conditions remain scarce. We mapped gene expression, chromatin accessibility, and 3D chromatin structure in primary human articular chondrocytes in both resting and OA-mimicking conditions. We identified thousands of differentially expressed genes, including those associated with differences in sex and age. RNA-seq in chondrocytes from 101 donors across two conditions uncovered 3782 unique eGenes, including 420 that exhibited strong and significant condition-specific effects. Colocalization with OA GWAS signals revealed 13 putative OA risk genes, 10 of which have not been previously identified. Chromatin accessibility and 3D chromatin structure provided insights into the mechanisms and conditional specificity of these variants. Our findings shed light on OA pathogenesis and highlight potential targets for therapeutic development.
Highlights
- Comprehensive analysis of sex- and age-related global gene expression in human chondrocytes revealed differences that correlate with osteoarthritis
- First response eQTLs in chondrocytes treated with an OA-related stimulus
- Deeply sequenced Hi-C in resting and activated chondrocytes helps connect OA risk variants to their putative causal genes
- Colocalization analysis reveals 13 (including 10 novel) putative OA risk genes
4
Natri HM, Del Azodi CB, Peter L, Taylor CJ, Chugh S, Kendle R, Chung MI, Flaherty DK, Matlock BK, Calvi CL, Blackwell TS, Ware LB, Bacchetta M, Walia R, Shaver CM, Kropski JA, McCarthy DJ, Banovich NE. Cell-type-specific and disease-associated expression quantitative trait loci in the human lung. Nat Genet 2024; 56:595-604. [PMID: 38548990 PMCID: PMC11018522 DOI: 10.1038/s41588-024-01702-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 02/28/2024] [Indexed: 04/04/2024]
Abstract
Common genetic variants confer substantial risk for chronic lung diseases, including pulmonary fibrosis. Defining the genetic control of gene expression in a cell-type-specific and context-dependent manner is critical for understanding the mechanisms through which genetic variation influences complex traits and disease pathobiology. To this end, we performed single-cell RNA sequencing of lung tissue from 66 individuals with pulmonary fibrosis and 48 unaffected donors. Using a pseudobulk approach, we mapped expression quantitative trait loci (eQTLs) across 38 cell types, observing both shared and cell-type-specific regulatory effects. Furthermore, we identified disease interaction eQTLs and demonstrated that this class of associations is more likely to be cell-type-specific and linked to cellular dysregulation in pulmonary fibrosis. Finally, we connected lung disease risk variants to their regulatory targets in disease-relevant cell types. These results indicate that cellular context determines the impact of genetic variation on gene expression and implicate context-specific eQTLs as key regulators of lung homeostasis and disease.
Affiliation(s)
- Heini M Natri
  - Translational Genomics Research Institute, Phoenix, AZ, USA
- Christina B Del Azodi
  - St. Vincent's Institute of Medical Research, Melbourne, Victoria, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Melbourne, Victoria, Australia
- Lance Peter
  - Translational Genomics Research Institute, Phoenix, AZ, USA
- Chase J Taylor
  - Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Sagrika Chugh
  - St. Vincent's Institute of Medical Research, Melbourne, Victoria, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, Victoria, Australia
- Robert Kendle
  - Translational Genomics Research Institute, Phoenix, AZ, USA
- Mei-I Chung
  - Translational Genomics Research Institute, Phoenix, AZ, USA
- David K Flaherty
  - Flow Cytometry Shared Resource, Vanderbilt University Medical Center, Nashville, TN, USA
- Brittany K Matlock
  - Flow Cytometry Shared Resource, Vanderbilt University Medical Center, Nashville, TN, USA
- Carla L Calvi
  - Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Timothy S Blackwell
  - Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA
  - Department of Veterans Affairs Medical Center, Nashville, TN, USA
- Lorraine B Ware
  - Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Matthew Bacchetta
  - Department of Cardiac Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Rajat Walia
  - Department of Thoracic Disease and Transplantation, Norton Thoracic Institute, Phoenix, AZ, USA
- Ciara M Shaver
  - Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Jonathan A Kropski
  - Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA
  - Department of Veterans Affairs Medical Center, Nashville, TN, USA
- Davis J McCarthy
  - St. Vincent's Institute of Medical Research, Melbourne, Victoria, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, Victoria, Australia
5
Steward RA, Pruisscher P, Roberts KT, Wheat CW. Genetic constraints in genes exhibiting splicing plasticity in facultative diapause. Heredity (Edinb) 2024; 132:142-155. [PMID: 38291272 PMCID: PMC10923799 DOI: 10.1038/s41437-024-00669-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/26/2023] [Revised: 12/28/2023] [Accepted: 01/03/2024] [Indexed: 02/01/2024]
Abstract
Phenotypic plasticity is produced and maintained by processes regulating the transcriptome. While differential gene expression is among the most important of these processes, relatively little is known about other sources of transcriptional variation. Previous work suggests that alternative splicing plays an extensive and functionally unique role in transcriptional plasticity, though plastically spliced genes may be more constrained than the remainder of expressed genes. In this study, we explore the relationship between expression and splicing plasticity, along with the genetic diversity in those genes, in an ecologically consequential polyphenism: facultative diapause. Using 96 samples spread over two tissues and 10 timepoints, we compare the extent of differential splicing and expression between diapausing and direct developing pupae of the butterfly Pieris napi. Splicing differs strongly between diapausing and direct developing trajectories but alters a smaller and functionally unique set of genes compared to differential expression. We further test the hypothesis that among these expressed loci, plastically spliced genes are likely to experience the strongest purifying selection to maintain seasonally plastic phenotypes. Genes with unique transcriptional changes through diapause consistently had the lowest nucleotide diversity, and this effect was consistently stronger among genes that were differentially spliced compared to those with just differential expression through diapause. Further, the strength of negative selection was higher in the population expressing diapause every generation. Our results suggest that the molecular mechanisms involved in diapause progression, including post-transcriptional modifications, are highly conserved and likely to experience genetic constraints, especially in northern populations of P. napi.
Affiliation(s)
- Rachel A Steward
  - Zoology Department, Stockholm University, Stockholm, Sweden.
  - Biology Department, Lund University, Lund, Sweden.
- Peter Pruisscher
  - Zoology Department, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
6
Seneviratne JA, Ho WWH, Glancy E, Eckersley-Maslin MA. A low-input high resolution sequential chromatin immunoprecipitation method captures genome-wide dynamics of bivalent chromatin. Epigenetics Chromatin 2024; 17:3. [PMID: 38336688 PMCID: PMC10858499 DOI: 10.1186/s13072-024-00527-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2023] [Accepted: 01/11/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND: Bivalent chromatin is an exemplar of epigenetic plasticity. This co-occurrence of active-associated H3K4me3 and inactive-associated H3K27me3 histone modifications on opposite tails of the same nucleosome occurs predominantly at promoters that are poised for future transcriptional upregulation or terminal silencing. We know little of the dynamics, resolution, and regulation of this chromatin state outside of embryonic stem cells where it was first described. This is partly due to the technical challenges distinguishing bona fide bivalent chromatin, where both marks are on the same nucleosome, from allelic or sample heterogeneity where there is a mix of H3K4me3-only and H3K27me3-only mononucleosomes. RESULTS: Here, we present a robust and sensitive method to accurately map bivalent chromatin genome-wide, along with controls, from as little as 2 million cells. We optimized and refined the sequential ChIP protocol which uses two sequential overnight immunoprecipitation reactions to robustly purify nucleosomes that are truly bivalent and contain both H3K4me3 and H3K27me3 modifications. Our method generates high-quality genome-wide maps with strong peak enrichment and low background, which can be analyzed using standard bioinformatic packages. Using this method, we detect 8,789 bivalent regions in mouse embryonic stem cells corresponding to 3,918 predominantly CpG rich and developmentally regulated gene promoters. Furthermore, profiling Dppa2/4 knockout mouse embryonic stem cells, which lose both H3K4me3 and H3K27me3 at approximately 10% of bivalent promoters, demonstrated the ability of our method to capture bivalent chromatin dynamics. CONCLUSIONS: Our optimized sequential reChIP method enables high-resolution genome-wide assessment of bivalent chromatin together with all required controls in as little as 2 million cells. We share a detailed protocol and guidelines that will enable bivalent chromatin landscapes to be generated in a range of cellular contexts, greatly enhancing our understanding of bivalent chromatin and epigenetic plasticity beyond embryonic stem cells.
Affiliation(s)
- Janith A Seneviratne
  - Peter MacCallum Cancer Centre, Melbourne, Victoria, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria, 3010, Australia
- William W H Ho
  - Peter MacCallum Cancer Centre, Melbourne, Victoria, 3000, Australia
- Eleanor Glancy
  - Peter MacCallum Cancer Centre, Melbourne, Victoria, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria, 3010, Australia
- Melanie A Eckersley-Maslin
  - Peter MacCallum Cancer Centre, Melbourne, Victoria, 3000, Australia.
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria, 3010, Australia.
  - Department of Anatomy and Physiology, The University of Melbourne, Victoria, 3010, Australia.
7
Fair T, Pavlovic BJ, Schaefer NK, Pollen AA. Mapping cis- and trans-regulatory target genes of human-specific deletions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.27.573461. [PMID: 38234800 PMCID: PMC10793408 DOI: 10.1101/2023.12.27.573461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Deletion of functional sequence is predicted to represent a fundamental mechanism of molecular evolution [1,2]. Comparative genetic studies of primates [2,3] have identified thousands of human-specific deletions (hDels), and the cis-regulatory potential of short (≤31 base pairs) hDels has been assessed using reporter assays [4]. However, how structural variant-sized (≥50 base pairs) hDels influence molecular and cellular processes in their native genomic contexts remains unexplored. Here, we design genome-scale libraries of single-guide RNAs targeting 7.2 megabases of sequence in 6,358 hDels and present a systematic CRISPR interference (CRISPRi) screening approach to identify hDels that modify cellular proliferation in chimpanzee pluripotent stem cells. By intersecting hDels with chromatin state features and performing single-cell CRISPRi (Perturb-seq) to identify their cis- and trans-regulatory target genes, we discovered 19 hDels controlling gene expression. We highlight two hDels, hDel_2247 and hDel_585, with tissue-specific activity in the liver and brain, respectively. Our findings reveal a molecular and cellular role for sequences lost in the human lineage and establish a framework for functionally interrogating human-specific genetic variants.
Affiliation(s)
- Tyler Fair
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Bryan J Pavlovic
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Nathan K Schaefer
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Alex A Pollen
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
8
Malinverni R, Corujo D, Gel B, Buschbeck M. regioneReloaded: evaluating the association of multiple genomic region sets. Bioinformatics 2023; 39:btad704. [PMID: 37988135 PMCID: PMC10681856 DOI: 10.1093/bioinformatics/btad704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/27/2023] [Accepted: 11/20/2023] [Indexed: 11/22/2023]
Abstract
MOTIVATION: Next-generation sequencing methods continue to improve the annotation of genomes, in part by determining the distribution of features such as epigenetic marks. Evaluating and interpreting the association between genomic regions and their features has become a common and challenging analysis in genomic and epigenomic studies. RESULTS: With regioneR, we provided an R package for assessing the statistical significance of pairwise associations between genomic region sets using permutation tests. We now present the R package regioneReloaded, which builds upon regioneR's statistical foundation and extends the functionality for the simultaneous analysis and visualization of the associations between multiple genomic region sets. Thus, we provide a novel discovery tool for the identification of significant associations that warrant testing for functional interdependence. AVAILABILITY AND IMPLEMENTATION: regioneReloaded is an R package released under an Artistic-2.0 License. The source code and documentation are freely available through Bioconductor: http://www.bioconductor.org/packages/regioneReloaded.
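The permutation-test idea named in this abstract can be illustrated with a minimal sketch. This is not the regioneR or regioneReloaded API (both are R packages); every function name, the toy genome length, and the uniform-placement randomization below are assumptions chosen for illustration only.

```python
import random

def count_overlaps(a, b):
    """Count regions in `a` that overlap at least one region in `b` (half-open intervals)."""
    return sum(any(s < e2 and s2 < e for s2, e2 in b) for s, e in a)

def randomize(regions, genome_len, rng):
    """Hypothetical null model: place each region uniformly at random, keeping its width."""
    out = []
    for s, e in regions:
        width = e - s
        new_start = rng.randrange(0, genome_len - width + 1)
        out.append((new_start, new_start + width))
    return out

def permutation_test(a, b, genome_len, n_perm=200, seed=0):
    """Empirical p-value for 'a overlaps b more often than expected by chance'."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    null = [count_overlaps(randomize(a, genome_len, rng), b) for _ in range(n_perm)]
    # Add-one correction so the empirical p-value is never exactly zero
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, p_value

# Toy data: 100 regions of width 100 on a 1 Mb "genome"; set A is set B shifted by 50 bp
genome_len = 1_000_000
b_regions = [(i * 10_000, i * 10_000 + 100) for i in range(100)]
a_regions = [(s + 50, e + 50) for s, e in b_regions]

observed, p_value = permutation_test(a_regions, b_regions, genome_len)
print(observed, p_value)
```

In this toy case every region in A overlaps a region in B by construction, so the observed count sits far above what uniform random placement produces. Real analyses would use a randomization strategy suited to the genome and feature type in question.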
Affiliation(s)
- Roberto Malinverni
  - Program of Myeloid Neoplasms, Program of Applied Epigenetics, Josep Carreras Leukaemia Research Institute (IJC), Campus Can Ruti Site, Badalona 08916, Spain
- David Corujo
  - Program of Myeloid Neoplasms, Program of Applied Epigenetics, Josep Carreras Leukaemia Research Institute (IJC), Campus Can Ruti Site, Badalona 08916, Spain
- Bernat Gel
  - Germans Trias i Pujol Research Institute (IGTP), Can Ruti Campus, Badalona, Barcelona 08916, Spain
- Marcus Buschbeck
  - Program of Myeloid Neoplasms, Program of Applied Epigenetics, Josep Carreras Leukaemia Research Institute (IJC), Campus Can Ruti Site, Badalona 08916, Spain
  - Germans Trias i Pujol Research Institute (IGTP), Can Ruti Campus, Badalona, Barcelona 08916, Spain
9
Maresca M, van den Brand T, Li H, Teunissen H, Davies J, de Wit E. Pioneer activity distinguishes activating from non-activating SOX2 binding sites. EMBO J 2023; 42:e113150. [PMID: 37691488 PMCID: PMC10577566 DOI: 10.15252/embj.2022113150] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/25/2022] [Revised: 07/17/2023] [Accepted: 07/22/2023] [Indexed: 09/12/2023]
Abstract
Genome-wide transcriptional activity involves the binding of many transcription factors (TFs) to thousands of sites in the genome. Pioneer TFs are a class of TFs that maintain open chromatin and allow non-pioneer TFs access to their target sites. Determining which TF binding sites directly drive transcription remains a challenge. Here, we use acute protein depletion of the pioneer TF SOX2 to establish its functionality in maintaining chromatin accessibility. We show that thousands of accessible sites are lost within an hour of protein depletion, indicating rapid turnover of these sites in the absence of the pioneer factor. To understand the relationship with transcription, we performed nascent transcription analysis and found that open chromatin sites that are maintained by SOX2 are highly predictive of gene expression, in contrast to all other SOX2 binding sites. We use CRISPR-Cas9 genome editing in the Klf2 locus to functionally validate a predicted regulatory element. We conclude that the regulatory activity of SOX2 is exerted mainly at sites where it maintains accessibility and that other binding sites are largely dispensable for gene regulation.
Affiliation(s)
- Michela Maresca
  - Division of Gene Regulation, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Teun van den Brand
  - Division of Gene Regulation, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Hangpeng Li
  - MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Hans Teunissen
  - Division of Gene Regulation, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- James Davies
  - MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Elzo de Wit
  - Division of Gene Regulation, The Netherlands Cancer Institute, Amsterdam, The Netherlands
|
10
|
Mu W, Davis ES, Lee S, Dozmorov MG, Phanstiel DH, Love MI. bootRanges: flexible generation of null sets of genomic ranges for hypothesis testing. Bioinformatics 2023; 39:btad190. [PMID: 37042725 PMCID: PMC10159650 DOI: 10.1093/bioinformatics/btad190] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/20/2022] [Revised: 01/06/2023] [Accepted: 03/28/2023] [Indexed: 04/13/2023]
Abstract
MOTIVATION: Enrichment analysis is a widely utilized technique in genomic analysis that aims to determine if there is a statistically significant association between two sets of genomic features. To conduct this type of hypothesis testing, an appropriate null model is typically required. However, the null distribution that is commonly used can be overly simplistic and may result in inaccurate conclusions. RESULTS: bootRanges provides fast functions for generation of block bootstrapped genomic ranges representing the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow test statistic null distributions and over-estimation of statistical significance, while creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. It can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g. log fold change (logFC) from differential analysis. AVAILABILITY AND IMPLEMENTATION: bootRanges is freely available in the R/Bioconductor package nullranges hosted at https://bioconductor.org/packages/nullranges.
Affiliation(s)
- Wancen Mu
- Department of Biostatistics, University of North Carolina, Chapel Hill 27514, United States
| | - Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27514, United States
| | - Stuart Lee
- Genentech, South San Francisco, CA 94080, United States
| | - Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23284, United States
- Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Douglas H Phanstiel
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27514, United States
- Thurston Arthritis Research Center, University of North Carolina, Chapel Hill 27514, United States
- Department of Cell Biology and Physiology, University of North Carolina, Chapel Hill 27514, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill 27514, United States
- Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill 27514, United States
| | - Michael I Love
- Department of Biostatistics, University of North Carolina, Chapel Hill 27514, United States
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27514, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill 27514, United States
- Department of Genetics, University of North Carolina, Chapel Hill 27514, United States
| |
|