1
Infection induced inflammation impairs wound healing through IL-1β signaling. iScience 2024; 27:109532. [PMID: 38577110 PMCID: PMC10993181 DOI: 10.1016/j.isci.2024.109532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 01/14/2024] [Accepted: 03/16/2024] [Indexed: 04/06/2024] Open
Abstract
Wound healing is impaired by infection; however, how microbe-induced inflammation modulates tissue repair remains unclear. We took advantage of the optical transparency of zebrafish and a genetically tractable microbe, Listeria monocytogenes, to probe the role of infection and inflammation in wound healing. Infection with bacteria engineered to activate the inflammasome, Lm-Pyro, induced persistent inflammation and impaired healing despite low bacterial burden. Inflammatory infections induced il1b expression, and blocking IL-1R signaling partially rescued wound healing in the presence of persistent infection. We found a critical window of microbial clearance necessary to limit persistent inflammation and enable efficient wound repair. Taken together, our findings suggest that the dynamics of microbe-induced tissue inflammation impact repair in complex tissue damage independent of bacterial load, with a critical early window for efficient tissue repair.
2
Cell Type-Specific Transcriptome Profiling Reveals a Role for Thioredoxin During Tumor Initiation. Front Immunol 2022; 13:818893. [PMID: 35250998 PMCID: PMC8891495 DOI: 10.3389/fimmu.2022.818893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 01/25/2022] [Indexed: 01/27/2023] Open
Abstract
Neutrophils in the tumor microenvironment exhibit altered functions. However, the changes in neutrophil behavior during tumor initiation remain unclear. Here we used Translating Ribosomal Affinity Purification (TRAP) and RNA sequencing to identify neutrophil, macrophage and transformed epithelial cell transcriptional changes induced by oncogenic RasG12V in larval zebrafish. We found that transformed epithelial cells and neutrophils, but not macrophages, had significant changes in gene expression in larval zebrafish. Interestingly, neutrophils had more significantly down-regulated genes, whereas gene expression was primarily upregulated in transformed epithelial cells. The antioxidant, thioredoxin (txn), a small thiol that regulates reduction-oxidation (redox) balance, was upregulated in transformed keratinocytes and neutrophils in response to oncogenic Ras. To determine the role of thioredoxin during tumor initiation, we generated a zebrafish thioredoxin mutant. We observed an increase in wound-induced reactive oxygen species signaling and neutrophil recruitment in thioredoxin-deficient zebrafish. Transformed keratinocytes also showed increased proliferation and reduced apoptosis in thioredoxin-deficient larvae. Using live imaging, we visualized neutrophil behavior near transformed cells and found increased neutrophil recruitment and altered motility dynamics. Finally, in the absence of neutrophils, transformed keratinocytes no longer exhibited increased proliferation in thioredoxin mutants. Taken together, our findings demonstrate that tumor initiation induces changes in neutrophil gene expression and behavior that can impact proliferation of transformed cells in the early tumor microenvironment.
3
Abstract
Cell type annotation is important in the analysis of single-cell RNA-seq data. CellO is a machine-learning-based tool for annotating cells using the Cell Ontology, a rich hierarchy of known cell types. We provide a protocol for using the CellO Python package to annotate human cells. We demonstrate how to use CellO in conjunction with Scanpy, a Python library for performing single-cell analysis, annotate a lung tissue data set, interpret its hierarchically structured cell type annotations, and create publication-ready figures. For complete details on the use and execution of this protocol, please refer to Bernstein et al. (2021). CellO is a Python package for annotating cell types in single-cell RNA-seq data. CellO classifies cells against the hierarchically structured Cell Ontology. CellO can be integrated into single-cell analysis pipelines implemented with Scanpy. We present a tutorial that classifies cells in an existing lung tumor data set.
4
RNA-regulatory exosome complex confers cellular survival to promote erythropoiesis. Nucleic Acids Res 2021; 49:9007-9025. [PMID: 34059908 PMCID: PMC8450083 DOI: 10.1093/nar/gkab367] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/24/2021] [Revised: 03/29/2021] [Accepted: 05/27/2021] [Indexed: 01/03/2023] Open
Abstract
Cellular differentiation requires vast remodeling of transcriptomes, and therefore machinery mediating remodeling controls differentiation. Relative to transcriptional mechanisms governing differentiation, post-transcriptional processes are less well understood. As an important post-transcriptional determinant of transcriptomes, the RNA exosome complex (EC) mediates processing and/or degradation of select RNAs. During erythropoiesis, the erythroid transcription factor GATA1 represses EC subunit genes. Depleting EC structural subunits prior to GATA1-mediated repression is deleterious to erythroid progenitor cells. To assess the importance of the EC catalytic subunits Dis3 and Exosc10 in this dynamic process, we asked if these subunits function non-redundantly to control erythropoiesis. Dis3 or Exosc10 depletion in primary murine hematopoietic progenitor cells reduced erythroid progenitors and their progeny, while sparing myeloid cells. Dis3 loss severely compromised erythroid progenitor and erythroblast survival, rendered erythroblasts hypersensitive to apoptosis-inducing stimuli and induced γ-H2AX, indicative of DNA double-stranded breaks. Dis3 loss-of-function phenotypes were more severe than those caused by Exosc10 depletion. We innovated a genetic rescue system to compare human Dis3 with multiple myeloma-associated Dis3 mutants S447R and R750K, and only wild type Dis3 was competent to rescue progenitors. Thus, Dis3 establishes a disease mutation-sensitive, cell type-specific survival mechanism to enable a differentiation program.
5
PLK1 and NOTCH Positively Correlate in Melanoma and Their Combined Inhibition Results in Synergistic Modulations of Key Melanoma Pathways. Mol Cancer Ther 2021; 20:161-172. [PMID: 33177155 PMCID: PMC7790869 DOI: 10.1158/1535-7163.mct-20-0654] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/01/2020] [Revised: 09/24/2020] [Accepted: 10/23/2020] [Indexed: 11/16/2022]
Abstract
Melanoma is one of the most serious forms of skin cancer, and its increasing incidence coupled with nonlasting therapeutic options for metastatic disease highlights the need for additional novel approaches for its management. In this study, we determined the potential interactions between polo-like kinase 1 (PLK1, a serine/threonine kinase involved in mitotic regulation) and NOTCH1 (a type I transmembrane protein deciding cell fate during development) in melanoma. Employing an in-house human melanoma tissue microarray (TMA) containing multiple cases of melanomas and benign nevi, coupled with high-throughput, multispectral quantitative fluorescence imaging analysis, we found a positive correlation between PLK1 and NOTCH1 in melanoma. Furthermore, The Cancer Genome Atlas database analysis of patients with melanoma showed an association of higher mRNA levels of PLK1 and NOTCH1 with poor overall, as well as disease-free, survival. Next, utilizing small-molecule inhibitors of PLK1 and NOTCH (BI 6727 and MK-0752, respectively), we found a synergistic antiproliferative response of combined treatment in multiple human melanoma cells. To determine the molecular targets of the overall and synergistic responses of combined PLK1 and NOTCH inhibition, we conducted RNA-sequencing analysis employing a unique regression model with interaction terms. We identified the modulations of several key genes relevant to melanoma progression/metastasis, including MAPK, PI3K, and RAS, as well as some new genes such as Apobec3G, BTK, and FCER1G, which have not been well studied in melanoma. In conclusion, our study demonstrated a synergistic antiproliferative response of concomitant targeting of PLK1 and NOTCH in melanoma, unraveling a potential novel therapeutic approach for detailed preclinical/clinical evaluation.
6
CellO: comprehensive and hierarchical cell type classification of human cells with the Cell Ontology. iScience 2020; 24:101913. [PMID: 33364592 PMCID: PMC7753962 DOI: 10.1016/j.isci.2020.101913] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/30/2020] [Revised: 10/28/2020] [Accepted: 12/02/2020] [Indexed: 12/15/2022] Open
Abstract
Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification of cell clusters by considering the rich hierarchical structure of known cell types. Furthermore, CellO comes pre-trained on a comprehensive data set of human, healthy, untreated primary samples in the Sequence Read Archive. CellO's comprehensive training set enables it to run out of the box on diverse cell types and achieves competitive or even superior performance when compared to existing state-of-the-art methods. Lastly, CellO's linear models are easily interpreted, thereby enabling exploration of cell-type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO's models across the ontology.
7
PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments. Genome Res 2020; 30:1655-1666. [PMID: 32958497 PMCID: PMC7605252 DOI: 10.1101/gr.252445.119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2019] [Accepted: 08/27/2020] [Indexed: 11/25/2022]
Abstract
Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint analysis of large collections of RNA-seq data sets has emerged as one such analysis. Current methods for transcript discovery rely on a '2-Step' approach where the first step encompasses building transcripts from individual data sets, followed by the second step that merges predicted transcripts across data sets. To increase the power of transcript discovery from large collections of RNA-seq data sets, we developed a novel '1-Step' approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq data sets. We demonstrate in a computational benchmark that 1-Step outperforms 2-Step approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq data sets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq data sets. We uncovered new transcripts that share a differential expression pattern with a neighboring gene Pik3cg implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package.
8
Abstract 222: RNA-seq analysis of differential gene expression in melanoma cells after combined inhibition of Plk1 and Notch. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Melanoma is one of the most brutal forms of skin cancer, and its increasing incidence coupled with non-lasting therapeutic options for metastatic tumor highlights the need for additional strategies for the management of this neoplasm. Using tissue microarray analysis, we previously found that the expression of polo-like kinase 1 (Plk1, a serine/threonine kinase involved in mitotic regulation) and Notch1 (a type I transmembrane protein deciding cell fate during development) were positively correlated in melanoma (Cancer Res 2018; 78 [13 Suppl]: Abstract nr 2530), and their combined inhibition resulted in a synergistic anti-proliferative response in human melanoma cells (Cancer Res 2019; 79 [13 Suppl]: Abstract nr 302). In this study, to determine the possible mechanisms behind this observed synergism, we used RNA-seq technology to obtain the differential gene expression following treatment of SK-MEL-2 human metastatic melanoma cells with Plk1 inhibitor volasertib (BI6727, 20 nM) and Notch1 inhibitor MK-0752 (100 μM) for 48 h. After data pre-processing by RSEM algorithm, the DESeq2 package was implemented to identify differentially expressed genes (DEGs, |log2-fold change| ≥ 1, false positive rate ≤ 0.05) when comparing the individual and combined treatments to vehicle (DMSO), as well as the interaction between volasertib:MK-0752. As a result, we identified 909 DEGs from volasertib treatment, 675 DEGs from MK-0752 treatment, 2142 genes from the combined treatment of volasertib and MK-0752, as well as 304 DEGs from the interaction of volasertib and MK-0752. In addition, employing GOstats and KEGGprofile packages in R programming, we conducted Gene Ontology (GO) and KEGG pathway analysis of the various DEGs. In GO analysis (counts ≥ 2, p ≤ 10⁻⁵), we identified 202 downregulated GO terms affected by the combined inhibition of Plk1 and Notch1, including metabolism, cell proliferation, and migration.
In KEGG pathway analysis, the combined inhibition of Plk1 and Notch1 was found to be associated with downregulation of several pathways shared with single drug treatments, such as PI3K-Akt, extracellular matrix receptor interaction, and protein digestion and absorption, as well as some novel pathways that were only affected by combined treatment, such as MAPK, Ras, and Rap1 pathways. Interestingly, our analysis predicted that the combined inhibition of Plk1 and Notch may make the melanoma cells more sensitive to immune responses. Overall, our data demonstrated that not only does targeting both Plk1 and Notch1 signaling pathways alter multiple melanoma progression pathways, but it may also potentially result in an increased sensitivity to other therapeutic targets, such as immune checkpoint blockade. However, these mechanistic findings need to be validated further in other relevant in vitro and in vivo models.
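The DEG selection rule quoted in this abstract (an absolute log2 fold change of at least 1 together with an error-rate cutoff of 0.05) can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call DESeq2; the `GeneResult` record, the `filter_degs` function, and the example genes and values are all hypothetical and exist only to show how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene record; field names are illustrative only."""
    gene: str
    log2_fold_change: float
    error_rate: float  # the multiple-testing-adjusted rate reported per gene


def filter_degs(results, min_abs_log2fc=1.0, max_error_rate=0.05):
    """Keep genes passing both thresholds quoted in the abstract."""
    return [
        r for r in results
        if abs(r.log2_fold_change) >= min_abs_log2fc
        and r.error_rate <= max_error_rate
    ]


# Made-up values, chosen so that each threshold rejects one gene.
example = [
    GeneResult("geneA", 1.8, 0.001),   # passes both thresholds
    GeneResult("geneB", -1.2, 0.04),   # passes: downregulated, |log2FC| >= 1
    GeneResult("geneC", 0.4, 0.0001),  # fails: fold change too small
    GeneResult("geneD", 2.5, 0.20),    # fails: error rate too high
]

degs = filter_degs(example)
print([r.gene for r in degs])  # → ['geneA', 'geneB']
```

Taking the absolute value of the fold change means up- and downregulated genes are treated symmetrically, which matches the `|log2-fold change|` notation in the abstract.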
Citation Format: Shengqin Su, Gagan Chhabra, Mary A. Ndiaye, Chandra K. Singh, Colin N. Dewey, Nihal Ahmad. RNA-seq analysis of differential gene expression in melanoma cells after combined inhibition of Plk1 and Notch [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 222.
9
Giant Island Mice Exhibit Widespread Gene Expression Changes in Key Metabolic Organs. Genome Biol Evol 2020; 12:1277-1301. [PMID: 32531054 PMCID: PMC7487164 DOI: 10.1093/gbe/evaa118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/06/2020] [Indexed: 12/02/2022] Open
Abstract
Island populations repeatedly evolve extreme body sizes, but the genomic basis of this pattern remains largely unknown. To understand how organisms on islands evolve gigantism, we compared genome-wide patterns of gene expression in Gough Island mice, the largest wild house mice in the world, and mainland mice from the WSB/EiJ wild-derived inbred strain. We used RNA-seq to quantify differential gene expression in three key metabolic organs: gonadal adipose depot, hypothalamus, and liver. Between 4,000 and 8,800 genes were significantly differentially expressed across the evaluated organs, representing between 20% and 50% of detected transcripts, with 20% or more of differentially expressed transcripts in each organ exhibiting expression fold changes of at least 2×. A minimum of 73 candidate genes for extreme size evolution, including Irs1 and Lrp1, were identified by considering differential expression jointly with other data sets: 1) genomic positions of published quantitative trait loci for body weight and growth rate, 2) whole-genome sequencing of 16 wild-caught Gough Island mice that revealed fixed single-nucleotide differences between the strains, and 3) publicly available tissue-specific regulatory elements. Additionally, patterns of differential expression across three time points in the liver revealed that Arid5b potentially regulates hundreds of genes. Functional enrichment analyses pointed to cell cycling, mitochondrial function, signaling pathways, inflammatory response, and nutrient metabolism as potential causes of weight accumulation in Gough Island mice. Collectively, our results indicate that extensive gene regulatory evolution in metabolic organs accompanied the rapid evolution of gigantism during the short time house mice have inhabited Gough Island.
10
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.
11
MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bioinformatics 2018; 33:2914-2923. [PMID: 28535296 PMCID: PMC5870770 DOI: 10.1093/bioinformatics/btx334] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 12/02/2016] [Accepted: 05/21/2017] [Indexed: 01/31/2023] Open
Abstract
Motivation: The NCBI’s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA. Results: We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline. Availability and implementation: The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline. Supplementary information: Supplementary data are available at Bioinformatics online.
12
GATA Factor-Regulated Samd14 Enhancer Confers Red Blood Cell Regeneration and Survival in Severe Anemia. Dev Cell 2017; 42:213-225.e4. [PMID: 28787589 DOI: 10.1016/j.devcel.2017.07.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 01/19/2017] [Revised: 05/05/2017] [Accepted: 07/11/2017] [Indexed: 12/31/2022]
Abstract
An enhancer with amalgamated E-box and GATA motifs (+9.5) controls expression of the regulator of hematopoiesis GATA-2. While similar GATA-2-occupied elements are common in the genome, occupancy does not predict function, and GATA-2-dependent genetic networks are incompletely defined. A "+9.5-like" element resides in an intron of Samd14 (Samd14-Enh) encoding a sterile alpha motif (SAM) domain protein. Deletion of Samd14-Enh in mice strongly decreased Samd14 expression in bone marrow and spleen. Although steady-state hematopoiesis was normal, Samd14-Enh-/- mice died in response to severe anemia. Samd14-Enh stimulated stem cell factor/c-Kit signaling, which promotes erythrocyte regeneration. Anemia activated Samd14-Enh by inducing enhancer components and enhancer chromatin accessibility. Thus, a GATA-2/anemia-regulated enhancer controls expression of an SAM domain protein that confers survival in anemia. We propose that Samd14-Enh and an ensemble of anemia-responsive enhancers are essential for erythrocyte regeneration in stress erythropoiesis, a vital process in pathologies, including β-thalassemia, myelodysplastic syndrome, and viral infection.
13
Zebrafish zic2 controls formation of periocular neural crest and choroid fissure morphogenesis. Dev Biol 2017; 429:92-104. [PMID: 28689736 DOI: 10.1016/j.ydbio.2017.07.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/14/2017] [Revised: 05/30/2017] [Accepted: 07/06/2017] [Indexed: 12/31/2022]
Abstract
The vertebrate retina develops in close proximity to the forebrain and neural crest-derived cartilages of the face and jaw. Coloboma, a congenital eye malformation, is associated with aberrant forebrain development (holoprosencephaly) and with craniofacial defects (frontonasal dysplasia) in humans, suggesting a critical role for cross-lineage interactions during retinal morphogenesis. ZIC2, a zinc-finger transcription factor, is linked to human holoprosencephaly. We have previously used morpholino assays to show that zebrafish zic2 functions in the developing forebrain, retina and craniofacial cartilage. We now report that zebrafish with genetic lesions in the zic2 orthologs, zic2a and zic2b, develop with retinal coloboma and craniofacial anomalies. We demonstrate a requirement for zic2 in restricting pax2a expression and show evidence that zic2 function limits Hh signaling. RNA-seq transcriptome analysis identified an early requirement for zic2 in periocular neural crest as an activator of alx1, a transcription factor with essential roles in craniofacial and ocular morphogenesis in human and zebrafish. Collectively, these data establish zic2 mutant zebrafish as a powerful new genetic model for in-depth dissection of cell interactions and genetic controls during craniofacial complex development.
14
Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq. Genome Res 2016; 26:1124-33. [PMID: 27405803 PMCID: PMC4971760 DOI: 10.1101/gr.199174.115] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/04/2015] [Accepted: 06/13/2016] [Indexed: 11/24/2022]
Abstract
RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level.
15
Analysis of embryonic development in the unsequenced axolotl: Waves of transcriptomic upheaval and stability. Dev Biol 2016; 426:143-154. [PMID: 27475628 DOI: 10.1016/j.ydbio.2016.05.024] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/26/2015] [Revised: 05/20/2016] [Accepted: 05/21/2016] [Indexed: 12/14/2022]
Abstract
The axolotl (Ambystoma mexicanum) has long been the subject of biological research, primarily owing to its outstanding regenerative capabilities. However, the gene expression programs governing its embryonic development are particularly underexplored, especially when compared to other amphibian model species. Therefore, we performed whole transcriptome polyA+ RNA sequencing experiments on 17 stages of embryonic development. As the axolotl genome is unsequenced and its gene annotation is incomplete, we built de novo transcriptome assemblies for each stage and garnered functional annotation by comparing expressed contigs with known genes in other organisms. In evaluating the number of differentially expressed genes over time, we identify three waves of substantial transcriptome upheaval each followed by a period of relative transcriptome stability. The first wave of upheaval is between the one and two cell stage. We show that the number of differentially expressed genes per unit time is higher between the one and two cell stage than it is across the mid-blastula transition (MBT), the period of zygotic genome activation. We use total RNA sequencing to demonstrate that the vast majority of genes with increasing polyA+ signal between the one and two cell stage result from polyadenylation rather than de novo transcription. The first stable phase begins after the two cell stage and continues until the mid-blastula transition, corresponding with the pre-MBT phase of transcriptional quiescence in amphibian development. Following this is a peak of differential gene expression corresponding with the activation of the zygotic genome and a phase of transcriptomic stability from stages 9-11. We observe a third wave of transcriptomic change between stages 11 and 14, followed by a final stable period. The last two stable phases have not been documented in amphibians previously and correspond to times of major morphogenic change in the axolotl embryo: gastrulation and neurulation. 
These results yield new insights into global gene expression during early stages of amphibian embryogenesis and will help to further develop the axolotl as a model species for developmental and regenerative biology.
16
Mechanism governing heme synthesis reveals a GATA factor/heme circuit that controls differentiation. EMBO Rep 2015; 17:249-65. [PMID: 26698166 DOI: 10.15252/embr.201541465] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 09/29/2015] [Accepted: 11/24/2015] [Indexed: 12/18/2022] Open
Abstract
Metal ion-containing macromolecules have fundamental roles in essentially all biological processes throughout the evolutionary tree. For example, iron-containing heme is a cofactor in enzyme catalysis and electron transfer and an essential hemoglobin constituent. To meet the intense demand for hemoglobin assembly in red blood cells, the cell type-specific factor GATA-1 activates transcription of Alas2, encoding the rate-limiting enzyme in heme biosynthesis, 5-aminolevulinic acid synthase-2 (ALAS-2). Using genetic editing to unravel mechanisms governing heme biosynthesis, we discovered a GATA factor- and heme-dependent circuit that establishes the erythroid cell transcriptome. CRISPR/Cas9-mediated ablation of two Alas2 intronic cis elements strongly reduces GATA-1-induced Alas2 transcription, heme biosynthesis, and surprisingly, GATA-1 regulation of other vital constituents of the erythroid cell transcriptome. Bypassing ALAS-2 function in Alas2 cis element-mutant cells by providing its catalytic product 5-aminolevulinic acid rescues heme biosynthesis and the GATA-1-dependent genetic network. Heme amplifies GATA-1 function by downregulating the heme-sensing transcriptional repressor Bach1 and via a Bach1-insensitive mechanism. Through this dual mechanism, heme and a master regulator collaborate to orchestrate a cell type-specific transcriptional program that promotes cellular differentiation.
17
Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping. PLoS Comput Biol 2015; 11:e1004491. [PMID: 26484757 PMCID: PMC4618727 DOI: 10.1371/journal.pcbi.1004491] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/13/2014] [Accepted: 08/06/2015] [Indexed: 11/19/2022] Open
Abstract
Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advancements in next generation sequencing enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map since short reads of 50-100 base pairs (bps) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that can accommodate them are prone to large false positive and negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach facilitates supervising of multi-read allocation with a variety of data sources including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although the protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of the protein-DNA interactions in non-repetitive regions.
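The general idea of splitting a multi-mapping read across its candidate loci according to prior information can be sketched as below. This is only a toy illustration of proportional allocation, not the Perm-seq statistical model, whose details are not given in this abstract; the function name `allocate_multiread`, the locus labels, and the weights are all hypothetical.

```python
def allocate_multiread(prior_weights):
    """Split one multi-mapping read across candidate loci.

    prior_weights maps a locus label to a non-negative weight (for
    example, a signal derived from another data type). The read is
    divided in proportion to those weights; with no usable prior, it
    falls back to a uniform split.
    """
    total = sum(prior_weights.values())
    if total <= 0:
        n = len(prior_weights)
        return {locus: 1.0 / n for locus in prior_weights}
    return {locus: weight / total for locus, weight in prior_weights.items()}


# A read mapping to two loci, where the prior favors the first 3:1.
fractions = allocate_multiread({"locus_1": 3.0, "locus_2": 1.0})
print(fractions)  # → {'locus_1': 0.75, 'locus_2': 0.25}
```

By contrast, the standard approach described in the abstract, discarding multi-mapping reads, would assign this read to neither locus.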
|
18
|
Cis-regulatory mechanisms governing stem and progenitor cell transitions. Sci Adv 2015; 1:e1500503. [PMID: 26601269 PMCID: PMC4643771 DOI: 10.1126/sciadv.1500503] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 04/21/2015] [Accepted: 06/20/2015] [Indexed: 05/25/2023]
Abstract
Cis-element encyclopedias provide information on phenotypic diversity and disease mechanisms. Although cis-element polymorphisms and mutations are instructive, deciphering function remains challenging. Mutation of an intronic GATA motif (+9.5) in GATA2, encoding a master regulator of hematopoiesis, underlies an immunodeficiency associated with myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). Whereas an inversion relocalizes another GATA2 cis-element (-77) to the proto-oncogene EVI1, inducing EVI1 expression and AML, whether this reflects ectopic or physiological activity is unknown. We describe a mouse strain that decouples -77 function from proto-oncogene deregulation. The -77(-/-) mice exhibited a novel phenotypic constellation including late embryonic lethality and anemia. The -77 established a vital sector of the myeloid progenitor transcriptome, conferring multipotentiality. Unlike the +9.5(-/-) embryos, hematopoietic stem cell genesis was unaffected in -77(-/-) embryos. These results illustrate a paradigm in which cis-elements in a locus differentially control stem and progenitor cell transitions, and therefore the individual cis-element alterations cause unique and overlapping disease phenotypes.
|
19
|
Declined presentation hematopoietic signaling mechanism revealed from a stem/progenitor cell cistrome. Exp Hematol 2015. [DOI: 10.1016/j.exphem.2015.06.141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
|
20
|
Linking heme biosynthesis with a GATA factor-regulated genetic network that controls cellular differentiation. Exp Hematol 2015. [DOI: 10.1016/j.exphem.2015.06.270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
21
|
Hematopoietic Signaling Mechanism Revealed from a Stem/Progenitor Cell Cistrome. Mol Cell 2015; 59:62-74. [PMID: 26073540 DOI: 10.1016/j.molcel.2015.05.020] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 01/28/2015] [Revised: 04/27/2015] [Accepted: 05/07/2015] [Indexed: 11/17/2022]
Abstract
Thousands of cis-elements in genomes are predicted to have vital functions. Although conservation, activity in surrogate assays, polymorphisms, and disease mutations provide functional clues, deletion from endogenous loci constitutes the gold-standard test. A GATA-2-binding, Gata2 intronic cis-element (+9.5) required for hematopoietic stem cell genesis in mice is mutated in a human immunodeficiency syndrome. Because +9.5 is the only cis-element known to mediate stem cell genesis, we devised a strategy to identify functionally comparable enhancers ("+9.5-like") genome-wide. Gene editing revealed +9.5-like activity to mediate GATA-2 occupancy, chromatin opening, and transcriptional activation. A +9.5-like element resided in Samd14, which encodes a protein of unknown function. Samd14 increased hematopoietic progenitor levels/activity and promoted signaling by a pathway vital for hematopoietic stem/progenitor cell regulation (stem cell factor/c-Kit), and c-Kit rescued Samd14 loss-of-function phenotypes. Thus, the hematopoietic stem/progenitor cell cistrome revealed a mediator of a signaling pathway that has broad importance for stem/progenitor cell biology.
|
22
|
EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments. Bioinformatics 2015; 31:2614-22. [PMID: 25847007 PMCID: PMC4528625 DOI: 10.1093/bioinformatics/btv193] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 10/14/2014] [Accepted: 03/30/2015] [Indexed: 01/08/2023]
Abstract
Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data. Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression. Availability and implementation: An R package containing examples and sample datasets is available at Bioconductor. Contact: kendzior@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
|
24
|
Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 2014; 15:553. [PMID: 25608678 PMCID: PMC4298084 DOI: 10.1186/s13059-014-0553-5] [Citation(s) in RCA: 196] [Impact Index Per Article: 19.6] [Received: 07/06/2014] [Accepted: 10/30/2014] [Indexed: 01/16/2023]
Abstract
De novo RNA-Seq assembly facilitates the study of transcriptomes for species without sequenced genomes, but it is challenging to select the most accurate assembly in this context. To address this challenge, we developed a model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown. We show that RSEM-EVAL correctly reflects assembly accuracy, as measured by REF-EVAL, a refined set of ground-truth-based scores that we also developed. Guided by RSEM-EVAL, we assembled the transcriptome of the regenerating axolotl limb; this assembly compares favorably to a previous assembly. A software package implementing our methods, DETONATE, is freely available at http://deweylab.biostat.wisc.edu/detonate.
|
25
|
Gata2 cis-element is required for hematopoietic stem cell generation in the mammalian embryo. J Exp Med 2013; 210:2833-42. [PMID: 24297994 PMCID: PMC3865483 DOI: 10.1084/jem.20130733] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Indexed: 01/10/2023]
Abstract
Cis-element requirement for the emergence of HSCs in the AGM and for hemogenic endothelium to generate HSC-containing c-Kit+ cell clusters. The generation of hematopoietic stem cells (HSCs) from hemogenic endothelium within the aorta, gonad, mesonephros (AGM) region of the mammalian embryo is crucial for development of the adult hematopoietic system. We described a deletion of a Gata2 cis-element (+9.5) that depletes fetal liver HSCs, is lethal at E13–14 of embryogenesis, and is mutated in an immunodeficiency that progresses to myelodysplasia/leukemia. Here, we demonstrate that the +9.5 element enhances Gata2 expression and is required to generate long-term repopulating HSCs in the AGM. Deletion of the +9.5 element abrogated the capacity of hemogenic endothelium to generate HSC-containing clusters in the aorta. Genomic analyses indicated that the +9.5 element regulated a rich ensemble of genes that control hemogenic endothelium and HSCs, as well as genes not implicated in hematopoiesis. These results reveal a mechanism that controls stem cell emergence from hemogenic endothelium to establish the adult hematopoietic system.
|
26
|
Abstract
The Xenopus Cripto-1 protein is confined to the cells of the animal hemisphere during early embryogenesis where it regulates the formation of anterior structures. Cripto-1 protein accumulates only in animal cells because cripto-1 mRNA in cells of the vegetal hemisphere is translationally repressed. Here, we show that the RNA binding protein, Bicaudal-C (Bic-C), functioned directly in this vegetal cell-specific repression. While Bic-C protein is normally confined to vegetal cells, ectopic expression of Bic-C in animal cells repressed a cripto-1 mRNA reporter and associated with endogenous cripto-1 mRNA. Repression by Bic-C required its N-terminal domain, comprised of multiple KH motifs, for specific binding to relevant control elements within the cripto-1 mRNA and a functionally separable C-terminal translation repression domain. Bic-C-mediated repression required the 5' CAP and translation initiation factors, but not a poly(A) tail or the conserved SAM domain within Bic-C. Bic-C-directed immunoprecipitation followed by deep sequencing of associated mRNAs identified multiple Bic-C-regulated mRNA targets, including cripto-1 mRNA, providing new insights and tools for understanding the role of Bic-C in vertebrate development.
|
28
|
De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 2013; 8:1494-512. [PMID: 23845962 PMCID: PMC3875132 DOI: 10.1038/nprot.2013.084] [Citation(s) in RCA: 5295] [Impact Index Per Article: 481.4] [Indexed: 01/11/2023]
Abstract
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
|
30
|
Abstract
Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
|
31
|
Comparative RNA-seq analysis in the unsequenced axolotl: the oncogene burst highlights early gene expression in the blastema. PLoS Comput Biol 2013; 9:e1002936. [PMID: 23505351 PMCID: PMC3591270 DOI: 10.1371/journal.pcbi.1002936] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 06/20/2012] [Accepted: 01/08/2013] [Indexed: 01/09/2023]
Abstract
The salamander has the remarkable ability to regenerate its limb after amputation. Cells at the site of amputation form a blastema and then proliferate and differentiate to regrow the limb. To better understand this process, we performed deep RNA sequencing of the blastema over a time course in the axolotl, a species whose genome has not been sequenced. Using a novel comparative approach to analyzing RNA-seq data, we characterized the transcriptional dynamics of the regenerating axolotl limb with respect to the human gene set. This approach involved de novo assembly of axolotl transcripts, RNA-seq transcript quantification without a reference genome, and transformation of abundances from axolotl contigs to human genes. We found a prominent burst in oncogene expression during the first day and blastemal/limb bud genes peaking at 7 to 14 days. In addition, we found that limb patterning genes, SALL genes, and genes involved in angiogenesis, wound healing, defense/immunity, and bone development are enriched during blastema formation and development. Finally, we identified a category of genes with no prior literature support for limb regeneration that are candidates for further evaluation based on their expression pattern during the regenerative process.
|
32
|
Abstract
Titin, a sarcomeric protein expressed primarily in striated muscles, is responsible for maintaining the structure and biomechanical properties of muscle cells. Cardiac titin undergoes developmental size reduction from 3.7 megadaltons in neonates to primarily 2.97 megadaltons in the adult. This size reduction results from gradually increased exon skipping between exons 50 and 219 of titin mRNA. Our previous study reported that Rbm20 is the splicing factor responsible for this process. In this work, we investigated its molecular mechanism. We demonstrate that Rbm20 mediates exon skipping by binding to titin pre-mRNA to repress the splicing of some regions; the exons/introns in these Rbm20-repressed regions are ultimately skipped. Rbm20 was also found to mediate intron retention and exon shuffling. The two Rbm20 speckles found in nuclei from muscle tissues were identified as aggregates of Rbm20 protein on the partially processed titin pre-mRNAs. Cooperative repression and alternative 3' splice site selection were found to be used by Rbm20 to skip different subsets of titin exons, and the splicing pathway selected depended on the ratio of Rbm20 to other splicing factors that vary with tissue type and developmental age.
|
33
|
Genomic variation in natural populations of Drosophila melanogaster. Genetics 2012; 192:533-98. [PMID: 22673804 PMCID: PMC3454882 DOI: 10.1534/genetics.112.142018] [Citation(s) in RCA: 242] [Impact Index Per Article: 20.2] [Received: 12/22/2011] [Accepted: 05/24/2012] [Indexed: 02/07/2023]
Abstract
This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5'- and 3'-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species.
|
34
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
|
36
|
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011; 12:323. [PMID: 21816040 PMCID: PMC3163565 DOI: 10.1186/1471-2105-12-323] [Citation(s) in RCA: 12282] [Impact Index Per Article: 944.8] [Received: 05/10/2011] [Accepted: 08/04/2011] [Indexed: 02/07/2023]
Abstract
Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
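The handling of reads that map to multiple isoforms can be illustrated with a small expectation-maximization (EM) sketch. This is a deliberately simplified toy, not RSEM's actual model: it assumes every candidate alignment is equally plausible and ignores transcript length, read position, and quality scores. The function name and the example reads are hypothetical.

```python
def em_abundances(read_hits, n_transcripts, n_iter=200):
    """Estimate relative transcript abundances with a toy EM algorithm.

    read_hits: list of lists; read_hits[i] holds the indices of the
               transcripts that read i aligns to (one or more).
    Returns a list of abundance fractions that sums to 1.
    """
    # Start from uniform abundances
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read among its candidate transcripts
        # in proportion to the current abundance estimates
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: abundances become the normalized expected read counts
        n_reads = sum(counts)
        theta = [c / n_reads for c in counts]
    return theta


# Hypothetical reads: some map uniquely, some to two transcripts
reads = [[0], [0], [0], [1], [0, 1], [0, 1], [1, 2], [2]]
theta = em_abundances(reads, 3)
```

Rather than discarding the ambiguous reads or assigning them by a fixed rule, each iteration reallocates them using the abundances implied by all reads, so transcripts with more unique support attract a larger share of the shared reads.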
|
37
|
Abstract
BACKGROUND RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. RESULTS We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. CONCLUSIONS RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. 
In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
Collapse
|
38
|
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011. [PMID: 21816040 DOI: 10.1186/1471‐2105‐12‐323] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. RESULTS We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. CONCLUSIONS RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. 
In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
|
39
|
Abstract
Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus may be used to infer the important relation of toporthology.
|
40
|
Abstract
MOTIVATION BUCKy is a C++ program that implements Bayesian concordance analysis. The method uses a non-parametric clustering of genes with compatible trees, and reconstructs the primary concordance tree from clades supported by the largest proportions of genes. A population tree with branch lengths in coalescent units is estimated from quartet concordance factors. AVAILABILITY BUCKy is open source and distributed under the GNU General Public License at www.stat.wisc.edu/~ane/bucky/.
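The idea of building a tree from the clades supported by the largest proportions of genes can be illustrated with a small Python sketch. This is not BUCKy's implementation: it assumes each gene contributes one fixed rooted tree (BUCKy instead works with posterior distributions over gene trees), and the function names `concordance_factors`, `compatible`, and `primary_concordance_clades` are hypothetical names chosen for illustration.

```python
from collections import Counter


def concordance_factors(gene_tree_clades):
    """Return the proportion of genes whose tree contains each clade.

    gene_tree_clades: one entry per gene, each an iterable of clades,
    where a clade is a frozenset of taxon names. (Assumption: a single
    point-estimate tree per gene, with no Bayesian averaging.)
    """
    counts = Counter()
    for clades in gene_tree_clades:
        counts.update(set(clades))  # count each clade once per gene
    n_genes = len(gene_tree_clades)
    return {clade: c / n_genes for clade, c in counts.items()}


def compatible(x, y):
    """Two clades can share a tree if they are nested or disjoint."""
    return x <= y or y <= x or not (x & y)


def primary_concordance_clades(cfs):
    """Greedily keep clades in order of decreasing concordance factor,
    skipping any clade that conflicts with one already kept."""
    chosen = []
    ranked = sorted(cfs.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for clade, _ in ranked:
        if all(compatible(clade, kept) for kept in chosen):
            chosen.append(clade)
    return chosen


# Three toy gene trees on taxa A-D (hypothetical data).
genes = [
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AB"), frozenset("CD")],
    [frozenset("AC"), frozenset("ABC")],
]
cfs = concordance_factors(genes)
tree = primary_concordance_clades(cfs)
```

In this toy input the clades {A, B} and {A, B, C} are each found in two of the three gene trees, so they are kept, while {A, C} and {C, D} conflict with them and are dropped.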
|
41
|
Abstract
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically. Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed. Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
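A minimal Python sketch may help show what handling read mapping uncertainty with a statistical model can look like, as opposed to discarding multi-mapping reads. It uses a plain expectation-maximization loop under strong simplifying assumptions that are not taken from the abstract: every compatible transcript is equally likely to have produced a read apart from its abundance, and transcript length, read position, and sequencing error are ignored. The function name `em_abundances` and the toy data are hypothetical.

```python
def em_abundances(read_hits, n_iter=200, tol=1e-9):
    """Estimate the fraction of reads that originate from each transcript.

    read_hits: list with one entry per read, each a non-empty list of the
    transcript names that read maps to. (Assumption: no length or error
    model, so this is a simplification of any real quantifier.)
    """
    transcripts = sorted({t for hits in read_hits for t in hits})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}

    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read across its compatible transcripts in
        # proportion to the current abundance estimates.
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected read counts.
        new_theta = {t: expected[t] / len(read_hits) for t in transcripts}
        change = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if change < tol:
            break
    return theta


# Toy data: 3 reads unique to A, 1 unique to B, 4 mapping to both.
reads = [["A"]] * 3 + [["B"]] + [["A", "B"]] * 4
theta = em_abundances(reads)
```

Here the uniquely mapping reads favor transcript A three to one, and the iteration assigns the ambiguous reads in the same proportion rather than dropping them or splitting them evenly.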
|
42
|
Abstract
Population genetic theory predicts discordance in the true phylogeny of different genomic regions when studying recently diverged species. Despite this expectation, genome-wide discordance in young species groups has rarely been statistically quantified. The house mouse subspecies group provides a model system for examining phylogenetic discordance. House mouse subspecies are recently derived, suggesting that even if there has been a simple tree-like population history, gene trees could disagree with the population history due to incomplete lineage sorting. Subspecies of house mice also hybridize in nature, raising the possibility that recent introgression might lead to additional phylogenetic discordance. Single-locus approaches have revealed support for conflicting topologies, resulting in a subspecies tree often summarized as a polytomy. To analyze phylogenetic histories on a genomic scale, we applied a recently developed method, Bayesian concordance analysis, to dense SNP data from three closely related subspecies of house mice: Mus musculus musculus, M. m. castaneus, and M. m. domesticus. We documented substantial variation in phylogenetic history across the genome. Although each of the three possible topologies was strongly supported by a large number of loci, there was statistical evidence for a primary phylogenetic history in which M. m. musculus and M. m. castaneus are sister subspecies. These results underscore the importance of measuring phylogenetic discordance in other recently diverged groups using methods such as Bayesian concordance analysis, which are designed for this purpose.
|
43
|
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 2007; 17:760-74. [PMID: 17567995 PMCID: PMC1891336 DOI: 10.1101/gr.6034307] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.0] [Indexed: 01/10/2023]
Abstract
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
|
44
|
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447:799-816. [PMID: 17571346 PMCID: PMC2212820 DOI: 10.1038/nature05874] [Citation(s) in RCA: 3782] [Impact Index Per Article: 222.5] [Indexed: 01/17/2023]
Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
|
45
|
Abstract
The availability of an increasing number of whole genome sequences presents us with the need for tools to quickly put them into a nucleotide-level multiple alignment. Mercator and MAVID are two programs that can be combined to accomplish this task. Given multiple whole genomes as input, Mercator is first used to construct an orthology map, which is then used to guide nucleotide-level multiple alignments produced by MAVID. These programs are both fast and freely available, allowing researchers to perform genome alignments on a single laptop. This tutorial will guide the researcher through the steps required for whole-genome alignment with Mercator and MAVID.
|
46
|
Evolution at the nucleotide level: the problem of multiple whole-genome alignment. Hum Mol Genet 2006; 15 Spec No 1:R51-6. [PMID: 16651369 DOI: 10.1093/hmg/ddl056] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
Abstract
With the genome sequences of numerous species at hand, we have the opportunity to discover how evolution has acted at each and every nucleotide in our genome. To this end, we must identify sets of nucleotides that have descended from a common ancestral nucleotide. The problem of identifying evolutionary-related nucleotides is that of sequence alignment. When the sequences under consideration are entire genomes, we have the problem of multiple whole-genome alignment. In this paper, we first state a series of definitions for homology and its subrelations between single nucleotides. Within this framework, we review the current methods available for the alignment of multiple large genomes. We then describe a subset of tools that make biological inferences from multiple whole-genome alignments.
|
47
|
Abstract
The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.
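For readers unfamiliar with the Needleman-Wunsch step that these whole-genome methods apply to each small piece, a short Python sketch of global alignment by dynamic programming follows. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are illustrative assumptions standing in for the log-probabilities of a pair hidden Markov model; they are fixed here, which is exactly the practice that parametric alignment is meant to relax. The function name `needleman_wunsch` is a hypothetical name for this sketch.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring parameters are
    illustrative defaults, not values taken from any published model.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
```

Changing `mismatch` or `gap` can change which alignment is optimal, which is the sensitivity to parameters that the abstract describes.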
|
48
|
A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 2006; 16:460-71. [PMID: 16458514 DOI: 10.1016/j.cub.2006.01.050] [Citation(s) in RCA: 346] [Impact Index Per Article: 19.2] [Received: 12/19/2005] [Revised: 01/19/2006] [Accepted: 01/24/2006] [Indexed: 12/19/2022]
Abstract
BACKGROUND Metazoan miRNAs regulate protein-coding genes by binding the 3' UTR of cognate mRNAs. Identifying targets for the 115 known C. elegans miRNAs is essential for understanding their function. RESULTS By using a new version of PicTar and sequence alignments of three nematodes, we predict that miRNAs regulate at least 10% of C. elegans genes through conserved interactions. We have developed a new experimental pipeline to assay 3' UTR-mediated posttranscriptional gene regulation via an endogenous reporter expression system amenable to high-throughput cloning, demonstrating the utility of this system using one of the most intensely studied miRNAs, let-7. Our expression analyses uncover several new potential let-7 targets and suggest a new let-7 activity in head muscle and neurons. To explore genome-wide trends in miRNA function, we analyzed functional categories of predicted target genes, finding that one-third of C. elegans miRNAs target gene sets are enriched for specific functional annotations. We have also integrated miRNA target predictions with other functional genomic data from C. elegans. CONCLUSIONS At least 10% of C. elegans genes are predicted miRNA targets, and a number of nematode miRNAs seem to regulate biological processes by targeting functionally related genes. We have also developed and successfully utilized an in vivo system for testing miRNA target predictions in likely endogenous expression domains. The thousands of genome-wide miRNA target predictions for nematodes, humans, and flies are available from the PicTar website and are linked to an accessible graphical network-browsing tool allowing exploration of miRNA target predictions in the context of various functional genomic data resources.
|