1
|
Hill SL, Rogan PK, Wang YX, Knoll JHM. Differentially accessible, single copy sequences form contiguous domains along metaphase chromosomes that are conserved among multiple tissues. Mol Cytogenet 2021; 14:49. [PMID: 34670606 PMCID: PMC8527651 DOI: 10.1186/s13039-021-00567-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 09/08/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND During mitosis, chromatin engages in a dynamic cycle of condensation and decondensation. Condensation into distinct units to ensure high fidelity segregation is followed by rapid and reproducible decondensation to produce functional daughter cells. Factors contributing to the reproducibility of chromatin structure between cell generations are not well understood. We investigated local metaphase chromosome condensation along mitotic chromosomes within genomic intervals showing differential accessibility (DA) between homologs. DA was originally identified using short sequence-defined single copy (sc) DNA probes of < 5 kb in length by fluorescence in situ hybridization (scFISH) in peripheral lymphocytes. These structural differences between metaphase homologs are non-random, stable, and heritable epigenetic marks which have led to the proposed function of DA as a marker of chromatin memory. Here, we characterize the organization of DA intervals into chromosomal domains by identifying multiple DA loci in close proximity to each other and examine the conservation of DA between tissues. RESULTS We evaluated multiple adjacent scFISH probes at 6 different DA loci from chromosomal regions 2p23, 3p24, 12p12, 15q22, 15q24 and 20q13 within peripheral blood T-lymphocytes. DA was organized within domains that extend beyond the defined boundaries of individual scFISH probes. Based on hybridizations of 2 to 4 scFISH probes per domain, domains ranged in length from 16.0 kb to 129.6 kb. Transcriptionally inert chromosomal DA regions in T-lymphocytes also demonstrated conservation of DA in bone marrow and fibroblast cells. CONCLUSIONS We identified novel chromosomal regions with allelic differences in metaphase chromosome accessibility and demonstrated that these accessibility differences appear to be aggregated into contiguous domains extending beyond individual scFISH probes. 
These domains are encompassed by previously established topologically associated domain (TAD) boundaries. DA appears to be a conserved feature of human metaphase chromosomes across different stages of lymphocyte differentiation and germ cell origin, consistent with its proposed role in maintenance of intergenerational cellular chromosome memory.
Affiliation(s)
- Seana L Hill
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
- Peter K Rogan
  - Departments of Biochemistry and Oncology, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
  - Cytognomix Inc., London, ON, Canada
- Yi Xuan Wang
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
- Joan H M Knoll
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
  - Cytognomix Inc., London, ON, Canada
|
2
|
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
|
3
|
Floc'hlay S, Wong ES, Zhao B, Viales RR, Thomas-Chollier M, Thieffry D, Garfield DA, Furlong EEM. Cis-acting variation is common across regulatory layers but is often buffered during embryonic development. Genome Res 2021; 31:211-224. [PMID: 33310749 PMCID: PMC7849415 DOI: 10.1101/gr.266338.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/21/2020] [Accepted: 12/09/2020] [Indexed: 12/14/2022]
Abstract
Precise patterns of gene expression are driven by interactions between transcription factors, regulatory DNA sequences, and chromatin. How DNA mutations affecting any one of these regulatory "layers" are buffered or propagated to gene expression remains unclear. To address this, we quantified allele-specific changes in chromatin accessibility, histone modifications, and gene expression in F1 embryos generated from eight Drosophila crosses at three embryonic stages, yielding a comprehensive data set of 240 samples spanning multiple regulatory layers. Genetic variation (allelic imbalance) impacts gene expression more frequently than chromatin features, with metabolic and environmental response genes being most often affected. Allelic imbalance in cis-regulatory elements (enhancers) is common and highly heritable, yet its functional impact does not generally propagate to gene expression. When it does, genetic variation impacts RNA levels through two alternative mechanisms involving either H3K4me3 or chromatin accessibility and H3K27ac. Changes in RNA are more predictive of variation in H3K4me3 than vice versa, suggesting a role for H3K4me3 downstream from transcription. The impact of a substantial proportion of genetic variation is consistent across embryonic stages, with 50% of allelic imbalanced features at one stage being also imbalanced at subsequent developmental stages. Crucially, buffering, as well as the magnitude and evolutionary impact of genetic variants, is influenced by regulatory complexity (i.e., number of enhancers regulating a gene), with transcription factors being most robust to cis-acting, but most influenced by trans-acting, variation.
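Allelic imbalance, as used in the abstract above, can be illustrated with a simple test of whether reads from the two parental alleles depart from a 1:1 ratio. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the read counts, and the choice of an exact two-sided binomial test against p = 0.5 are all illustrative.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads that come from the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k. Intended for modest read depths; very large n would
    need a log-space or library implementation.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance keeps floating-point ties on both tails.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for one feature (a gene or a peak) in one F1 sample.
ref, alt = 42, 18
print(allelic_ratio(ref, alt), binomial_two_sided_p(ref, ref + alt))
```

In practice, such tests would be run per feature and corrected for multiple testing, and mapping bias toward the reference allele would need to be handled before counting.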
Affiliation(s)
- Swann Floc'hlay
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Emily S Wong
  - Molecular, Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, New South Wales 2052, Australia
- Bingqing Zhao
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Rebecca R Viales
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Morgane Thomas-Chollier
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
  - Institut Universitaire de France (IUF), 75005 Paris, France
- Denis Thieffry
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- David A Garfield
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
|
4
|
Doria-Belenguer S, Youssef MK, Böttcher R, Malod-Dognin N, Pržulj N. Probabilistic graphlets capture biological function in probabilistic molecular networks. Bioinformatics 2020; 36:i804-i812. [PMID: 33381834 DOI: 10.1093/bioinformatics/btaa812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 09/08/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Molecular interactions have been successfully modeled and analyzed as networks, where nodes represent molecules and edges represent the interactions between them. These networks revealed that molecules with similar local network structure also have similar biological functions. The most sensitive measures of network structure are based on graphlets. However, graphlet-based methods thus far are only applicable to unweighted networks, whereas real-world molecular networks may have weighted edges that can represent the probability of an interaction occurring in the cell. This information is commonly discarded when applying thresholds to generate unweighted networks, which may lead to information loss. RESULTS We introduce probabilistic graphlets as a tool for analyzing the local wiring patterns of probabilistic networks. To assess their performance compared to unweighted graphlets, we generate synthetic networks based on different well-known random network models and edge probability distributions and demonstrate that probabilistic graphlets outperform their unweighted counterparts in distinguishing network structures. Then we model different real-world molecular interaction networks as weighted graphs with probabilities as weights on edges and we analyze them with our new weighted graphlets-based methods. We show that due to their probabilistic nature, probabilistic graphlet-based methods more robustly capture biological information in these data, while simultaneously showing a higher sensitivity to identify condition-specific functions compared to their unweighted graphlet-based method counterparts. AVAILABILITY AND IMPLEMENTATION Our implementation of probabilistic graphlets is available at https://github.com/Serdobe/Probabilistic_Graphlets. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
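The information loss from thresholding that the abstract describes can be illustrated on the simplest graphlet, the triangle. The sketch below is an assumption-laden illustration rather than the authors' definition or their published implementation: it treats edges as independent, scores each triangle at a node by the product of its three edge probabilities, and compares that with the count obtained after binarizing the network at a cutoff. The function names and the toy network are hypothetical.

```python
from itertools import combinations


def expected_triangles(edge_prob: dict, node: str) -> float:
    """Expected number of triangles touching `node`, assuming independent edges.

    `edge_prob` maps frozenset({u, v}) to the probability that edge u-v exists.
    """
    neighbours = sorted({v for e in edge_prob if node in e for v in e if v != node})

    def p(a: str, b: str) -> float:
        return edge_prob.get(frozenset((a, b)), 0.0)

    return sum(p(node, u) * p(node, w) * p(u, w) for u, w in combinations(neighbours, 2))


def thresholded_triangles(edge_prob: dict, node: str, cutoff: float) -> float:
    """Triangle count at `node` after discarding edges below `cutoff`."""
    kept = {e: 1.0 for e, pr in edge_prob.items() if pr >= cutoff}
    return expected_triangles(kept, node)


# Toy probabilistic interaction network (hypothetical values).
network = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.5,
    frozenset(("A", "D")): 0.4,
    frozenset(("C", "D")): 0.6,
}
print(expected_triangles(network, "A"))
print(thresholded_triangles(network, "A", 0.7))
```

With a cutoff of 0.7 the moderately supported edges disappear and node A is left with no triangles at all, whereas the probabilistic count retains a graded signal from them.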
Affiliation(s)
- Sergio Doria-Belenguer
  - Barcelona Supercomputing Center, Barcelona 08034, Spain
  - Universitat Politècnica de Catalunya (UPC), Barcelona 08034, Spain
- Markus K Youssef
  - Barcelona Supercomputing Center, Barcelona 08034, Spain
  - Universitat Politècnica de Catalunya (UPC), Barcelona 08034, Spain
- René Böttcher
  - Barcelona Supercomputing Center, Barcelona 08034, Spain
- Noël Malod-Dognin
  - Barcelona Supercomputing Center, Barcelona 08034, Spain
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Nataša Pržulj
  - Barcelona Supercomputing Center, Barcelona 08034, Spain
  - Department of Computer Science, University College London, London WC1E 6BT, UK
  - ICREA, Barcelona 08010, Spain
|
5
|
Rogan PK, Mucaki EJ, Shirley BC. A proposed molecular mechanism for pathogenesis of severe RNA-viral pulmonary infections. F1000Res 2020; 9:943. [PMID: 33299552 PMCID: PMC7676395 DOI: 10.12688/f1000research.25390.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 07/23/2020] [Indexed: 12/19/2022] Open
Abstract
Background: Certain riboviruses can cause severe pulmonary complications leading to death in some infected patients. We propose that DNA damage-induced apoptosis accelerates viral release, triggered by depletion of host RNA binding proteins (RBPs) from nuclear RNA bound to replicating viral sequences. Methods: Information theory-based analysis of interactions between RBPs and individual sequences in the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), Influenza A (H3N1), HIV-1, and Dengue genomes identifies strong RBP binding sites in these viral genomes. Replication and expression of viral sequences are expected to increasingly sequester RBPs - SRSF1 and RNPS1. Ordinarily, RBPs bound to nascent host transcripts prevent their annealing to complementary DNA. Their depletion induces destabilizing R-loops. Chromosomal breakage occurs when an excess of unresolved R-loops collide with incoming replication forks, overwhelming the DNA repair machinery. We estimated stoichiometry of inhibition of RBPs in host nuclear RNA by counting competing binding sites in replicating viral genomes and host RNA. Results: Host RBP binding sites are frequent and conserved among different strains of RNA viral genomes. Similar binding motifs of SRSF1 and RNPS1 explain why DNA damage resulting from SRSF1 depletion is complemented by expression of RNPS1. Clustering of strong RBP binding sites coincides with the distribution of RNA-DNA hybridization sites across the genome. SARS-CoV-2 replication is estimated to require 32.5-41.8 hours to effectively compete for binding of an equal proportion of SRSF1 binding sites in host encoded nuclear RNAs. Significant changes in expression of transcripts encoding DNA repair and apoptotic proteins were found in an analysis of influenza A and Dengue-infected cells in some individuals. 
Conclusions: R-loop-induced apoptosis indirectly resulting from viral replication could release significant quantities of membrane-associated virions into neighboring alveoli. These could infect adjacent pneumocytes and other tissues, rapidly compromising lung function, causing multiorgan system failure and other described symptoms.
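The information theory-based identification of binding sites mentioned in the Methods can be sketched with an individual-information weight matrix, where each base b at position l contributes 2 + log2 f(b, l) bits and a candidate site is scored by summing its contributions. This is a simplified illustration, not the authors' software: it assumes a uniform four-base background, replaces the small-sample correction with a pseudocount, and uses made-up aligned sites and a made-up RNA. The names `weight_matrix`, `individual_information`, and `scan` are hypothetical.

```python
from math import log2

BASES = "ACGU"


def weight_matrix(sites: list, pseudocount: float = 0.5) -> list:
    """Per-position weights 2 + log2 f(b, l) from equal-length aligned sites."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + len(BASES) * pseudocount
        matrix.append(
            {b: 2 + log2((column.count(b) + pseudocount) / total) for b in BASES}
        )
    return matrix


def individual_information(matrix: list, seq: str) -> float:
    """Information content (bits) of one candidate site under the matrix."""
    return sum(matrix[i][base] for i, base in enumerate(seq))


def scan(matrix: list, rna: str, min_bits: float) -> list:
    """Return (offset, bits) for every window scoring at least `min_bits`."""
    width = len(matrix)
    hits = []
    for i in range(len(rna) - width + 1):
        bits = individual_information(matrix, rna[i : i + width])
        if bits >= min_bits:
            hits.append((i, bits))
    return hits


# Toy aligned sites and target sequence (illustrative only, not real data).
toy_sites = ["GAAGAA", "GAAGAU", "GAAGAA", "GACGAA"]
toy_matrix = weight_matrix(toy_sites)
print(scan(toy_matrix, "CCUGAAGAACC", min_bits=5.0))
```

Counting the hits above a chosen threshold in a viral genome and in host transcripts is one way to approximate the "competing binding sites" comparison the abstract describes, subject to the simplifications noted.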
Affiliation(s)
- Peter K. Rogan
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C8, Canada
  - CytoGnomix Inc, London, Ontario, N5X 3X5, Canada
- Eliseos J. Mucaki
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C8, Canada
|
6
|
Rogan PK, Mucaki EJ, Shirley BC. A proposed molecular mechanism for pathogenesis of severe RNA-viral pulmonary infections. F1000Res 2020; 9:943. [PMID: 33299552 PMCID: PMC7676395 DOI: 10.12688/f1000research.25390.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 12/16/2020] [Indexed: 12/19/2022] Open
Abstract
Background: Certain riboviruses can cause severe pulmonary complications leading to death in some infected patients. We propose that DNA damage-induced apoptosis accelerates viral release, triggered by depletion of host RNA binding proteins (RBPs) from nuclear RNA bound to replicating viral sequences. Methods: Information theory-based analysis of interactions between RBPs and individual sequences in the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), Influenza A (H3N2), HIV-1, and Dengue genomes identifies strong RBP binding sites in these viral genomes. Replication and expression of viral sequences are expected to increasingly sequester RBPs - SRSF1 and RNPS1. Ordinarily, RBPs bound to nascent host transcripts prevent their annealing to complementary DNA. Their depletion induces destabilizing R-loops. Chromosomal breakage occurs when an excess of unresolved R-loops collide with incoming replication forks, overwhelming the DNA repair machinery. We estimated stoichiometry of inhibition of RBPs in host nuclear RNA by counting competing binding sites in replicating viral genomes and host RNA. Results: Host RBP binding sites are frequent and conserved among different strains of RNA viral genomes. Similar binding motifs of SRSF1 and RNPS1 explain why DNA damage resulting from SRSF1 depletion is complemented by expression of RNPS1. Clustering of strong RBP binding sites coincides with the distribution of RNA-DNA hybridization sites across the genome. SARS-CoV-2 replication is estimated to require 32.5-41.8 hours to effectively compete for binding of an equal proportion of SRSF1 binding sites in host encoded nuclear RNAs. Significant changes in expression of transcripts encoding DNA repair and apoptotic proteins were found in an analysis of influenza A and Dengue-infected cells in some individuals. 
Conclusions: R-loop-induced apoptosis indirectly resulting from viral replication could release significant quantities of membrane-associated virions into neighboring alveoli. These could infect adjacent pneumocytes and other tissues, rapidly compromising lung function, causing multiorgan system failure and other described symptoms.
Affiliation(s)
- Peter K. Rogan
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C8, Canada
  - CytoGnomix Inc, London, Ontario, N5X 3X5, Canada
- Eliseos J. Mucaki
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C8, Canada
|
7
|
Rogan PK, Mucaki EJ, Lu R, Shirley BC, Waller E, Knoll JHM. Meeting radiation dosimetry capacity requirements of population-scale exposures by geostatistical sampling. PLoS One 2020; 15:e0232008. [PMID: 32330192 PMCID: PMC7182271 DOI: 10.1371/journal.pone.0232008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/07/2019] [Accepted: 04/06/2020] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Accurate radiation dose estimates are critical for determining eligibility for therapies by timely triaging of exposed individuals after large-scale radiation events. However, the universal assessment of a large population subjected to a nuclear spill incident or detonation is not feasible. Even with high-throughput dosimetry analysis, test volumes far exceed the capacities of first responders to measure radiation exposures directly, or to acquire and process samples for follow-on biodosimetry testing. AIM To significantly reduce data acquisition and processing requirements for triaging of treatment-eligible exposures in population-scale radiation incidents. METHODS Physical radiation plumes modelled nuclear detonation scenarios of simulated exposures at 22 US locations. Models assumed only location of the epicenter and historical, prevailing wind directions/speeds. The spatial boundaries of graduated radiation exposures were determined by targeted, multistep geostatistical analysis of small population samples. Initially, locations proximate to these sites were randomly sampled (generally 0.1% of population). Empirical Bayesian kriging established radiation dose contour levels circumscribing these sites. Densification of each plume identified critical locations for additional sampling. After repeated kriging and densification, overlapping grids between each pair of contours of successive plumes were compared based on their diagonal Bray-Curtis distances and root-mean-square deviations, which provided criteria (<10% difference) to discontinue sampling. RESULTS/CONCLUSIONS We modeled 30 scenarios, including 22 urban/high-density and 2 rural/low-density scenarios under various weather conditions. Multiple (3-10) rounds of sampling and kriging were required for the dosimetry maps to converge, requiring between 58 and 347 samples for different scenarios. 
On average, 70±10% of locations where populations are expected to receive an exposure ≥2Gy were identified. Under sub-optimal sampling conditions, the number of iterations and samples were increased, and accuracy was reduced. Geostatistical mapping limits the number of required dose assessments, the time required, and radiation exposure to first responders. Geostatistical analysis will expedite triaging of acute radiation exposure in population-scale nuclear events.
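The stopping rule described in the Methods, comparing successive dose maps by Bray-Curtis distance and root-mean-square deviation until they differ by less than 10%, might be sketched as follows. This is an illustration under assumptions, not the published procedure: the grids are flattened lists of non-negative doses on matching cells, the RMSD is normalized by the mean dose of the newer grid so it can be compared with a 10% tolerance, and both measures must fall below the tolerance. The function names and dose values are hypothetical.

```python
from math import sqrt


def bray_curtis(a: list, b: list) -> float:
    """Bray-Curtis dissimilarity between two non-negative grids of equal size."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def rmsd(a: list, b: list) -> float:
    """Root-mean-square deviation between two grids of equal size."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def converged(previous: list, current: list, tol: float = 0.10) -> bool:
    """True when both measures fall below `tol` (assumed stopping criterion)."""
    mean_dose = sum(current) / len(current)
    relative_rmsd = rmsd(previous, current) / mean_dose if mean_dose else 0.0
    return bray_curtis(previous, current) < tol and relative_rmsd < tol


# Hypothetical doses (Gy) on four grid cells after two successive kriging rounds.
round_2 = [4.0, 2.0, 1.0, 0.5]
round_3 = [4.1, 2.1, 0.9, 0.5]
print(bray_curtis(round_2, round_3), rmsd(round_2, round_3), converged(round_2, round_3))
```

A sampling loop would add locations, re-krige, and call `converged` on the last two grids, stopping once it returns true.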
Affiliation(s)
- Peter K Rogan
  - Department of Biochemistry, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
  - CytoGnomix Inc, London, ON, Canada
- Eliseos J Mucaki
  - Department of Biochemistry, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
- Ruipeng Lu
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
- Edward Waller
  - Faculty of Energy Systems and Nuclear Science, OntarioTech University, Canada
- Joan H M Knoll
  - CytoGnomix Inc, London, ON, Canada
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
|
8
|
Devyatkin VA, Muraleva NA, Kolosova NG. Identification of Single-Nucleotide Polymorphisms in Mitochondria-Associated Genes Capable of Affecting the Development of Hypertrophic Cardiomyopathy in Senescence-Accelerated OXYS Rats. ADVANCES IN GERONTOLOGY 2020. [DOI: 10.1134/s2079057020020058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
9
|
Mucaki EJ, Shirley BC, Rogan PK. Expression Changes Confirm Genomic Variants Predicted to Result in Allele-Specific, Alternative mRNA Splicing. Front Genet 2020; 11:109. [PMID: 32211018 PMCID: PMC7066660 DOI: 10.3389/fgene.2020.00109] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/13/2019] [Accepted: 01/30/2020] [Indexed: 12/11/2022] Open
Abstract
Splice isoform structure and abundance can be affected by either noncoding or masquerading coding variants that alter the structure or abundance of transcripts. When these variants are common in the population, these nonconstitutive transcripts are sufficiently frequent so as to resemble naturally occurring, alternative mRNA splicing. Prediction of the effects of such variants has been shown to be accurate using information theory-based methods. Single nucleotide polymorphisms (SNPs) predicted to significantly alter natural and/or cryptic splice site strength were shown to affect gene expression. Splicing changes for known SNP genotypes were confirmed in HapMap lymphoblastoid cell lines with gene expression microarrays and custom designed q-RT-PCR or TaqMan assays. The majority of these SNPs (15 of 22) as well as an independent set of 24 variants were then subjected to RNAseq analysis using the ValidSpliceMut web beacon (http://validsplicemut.cytognomix.com), which is based on data from the Cancer Genome Atlas and International Cancer Genome Consortium. SNPs from different genes analyzed with gene expression microarray and q-RT-PCR exhibited significant changes in affected splice site use. Thirteen SNPs directly affected exon inclusion and 10 altered cryptic site use. Homozygous SNP genotypes resulting in stronger splice sites exhibited higher levels of processed mRNA than alleles associated with weaker sites. Four SNPs exhibited variable expression among individuals with the same genotypes, masking statistically significant expression differences between alleles. Genome-wide information theory and expression analyses (RNAseq) in tumor exomes and genomes confirmed splicing effects for 7 of the HapMap SNPs and 14 SNPs identified from tumor genomes. q-RT-PCR resolved rare splice isoforms with read abundance too low for statistical significance in ValidSpliceMut. 
Nevertheless, the web-beacon provides evidence of unanticipated splicing outcomes, for example, intron retention due to compromised recognition of constitutive splice sites. Thus, ValidSpliceMut and q-RT-PCR represent complementary resources for identification of allele-specific, alternative splicing.
Affiliation(s)
- Eliseos J Mucaki
  - Department of Biochemistry, University of Western Ontario, London, ON, Canada
- Peter K Rogan
  - Department of Biochemistry, University of Western Ontario, London, ON, Canada
  - CytoGnomix, London, ON, Canada
  - Department of Oncology, University of Western Ontario, London, ON, Canada
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
|