Reference Citation Analysis

Searched Name: Colin N. Dewey
Total Articles: 48 (from Reference Citation Analysis)
Article PDFs: 21
Cited by > 0: 40

Number	Citation Analysis
1	Infection induced inflammation impairs wound healing through IL-1β signaling. iScience 2024;27:109532. [PMID: 38577110 PMCID: PMC10993181 DOI: 10.1016/j.isci.2024.109532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 01/14/2024] [Accepted: 03/16/2024] [Indexed: 04/06/2024]
Abstract: Wound healing is impaired by infection; however, how microbe-induced inflammation modulates tissue repair remains unclear. We took advantage of the optical transparency of zebrafish and a genetically tractable microbe, Listeria monocytogenes, to probe the role of infection and inflammation in wound healing. Infection with bacteria engineered to activate the inflammasome, Lm-Pyro, induced persistent inflammation and impaired healing despite low bacterial burden. Inflammatory infections induced il1b expression, and blocking IL-1R signaling partially rescued wound healing in the presence of persistent infection. We found a critical window of microbial clearance necessary to limit persistent inflammation and enable efficient wound repair. Taken together, our findings suggest that the dynamics of microbe-induced tissue inflammation impacts repair in complex tissue damage independent of bacterial load, with a critical early window for efficient tissue repair.
Key Words: Biological sciences; Immune response; Immunology; Microbiology
2	Cell Type-Specific Transcriptome Profiling Reveals a Role for Thioredoxin During Tumor Initiation. Front Immunol 2022;13:818893. [PMID: 35250998 PMCID: PMC8891495 DOI: 10.3389/fimmu.2022.818893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 01/25/2022] [Indexed: 01/27/2023]
Abstract: Neutrophils in the tumor microenvironment exhibit altered functions. However, the changes in neutrophil behavior during tumor initiation remain unclear. Here we used Translating Ribosomal Affinity Purification (TRAP) and RNA sequencing to identify neutrophil, macrophage and transformed epithelial cell transcriptional changes induced by oncogenic RasG12V in larval zebrafish. We found that transformed epithelial cells and neutrophils, but not macrophages, had significant changes in gene expression in larval zebrafish. Interestingly, neutrophils had more significantly down-regulated genes, whereas gene expression was primarily upregulated in transformed epithelial cells. The antioxidant, thioredoxin (txn), a small thiol that regulates reduction-oxidation (redox) balance, was upregulated in transformed keratinocytes and neutrophils in response to oncogenic Ras. To determine the role of thioredoxin during tumor initiation, we generated a zebrafish thioredoxin mutant. We observed an increase in wound-induced reactive oxygen species signaling and neutrophil recruitment in thioredoxin-deficient zebrafish. Transformed keratinocytes also showed increased proliferation and reduced apoptosis in thioredoxin-deficient larvae. Using live imaging, we visualized neutrophil behavior near transformed cells and found increased neutrophil recruitment and altered motility dynamics. Finally, in the absence of neutrophils, transformed keratinocytes no longer exhibited increased proliferation in thioredoxin mutants. Taken together, our findings demonstrate that tumor initiation induces changes in neutrophil gene expression and behavior that can impact proliferation of transformed cells in the early tumor microenvironment.
Key Words: gene expression; keratinocyte; migration; neutrophil; thioredoxin (txn); tumor initiation
MeSH Headings: Animals; Cell Transformation, Neoplastic; Gene Expression Profiling; Larva/genetics; Larva/metabolism; Thioredoxins/genetics; Thioredoxins/metabolism; Tumor Microenvironment/genetics; Zebrafish/genetics; Zebrafish Proteins/genetics
Grants: P30 CA014520 NCI NIH HHS; R01 CA085862 NCI NIH HHS; T32 AI055397 NIAID NIH HHS; T32 CA009135 NCI NIH HHS; Office of Extramural Research, National Institutes of Health
3	Annotating cell types in human single-cell RNA-seq data with CellO. STAR Protoc 2021;2:100705. [PMID: 34458864 PMCID: PMC8379521 DOI: 10.1016/j.xpro.2021.100705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract: Cell type annotation is important in the analysis of single-cell RNA-seq data. CellO is a machine-learning-based tool for annotating cells using the Cell Ontology, a rich hierarchy of known cell types. We provide a protocol for using the CellO Python package to annotate human cells. We demonstrate how to use CellO in conjunction with Scanpy, a Python library for performing single-cell analysis, to annotate a lung tissue data set, interpret its hierarchically structured cell type annotations, and create publication-ready figures. For complete details on the use and execution of this protocol, please refer to Bernstein et al. (2021).
Highlights:
• CellO is a Python package for annotating cell types in single-cell RNA-seq data
• CellO classifies cells against the hierarchically structured Cell Ontology
• CellO can be integrated into single-cell analysis pipelines implemented with Scanpy
• We present a tutorial that classifies cells in an existing lung tumor data set
Key Words: Bioinformatics; RNAseq
4	RNA-regulatory exosome complex confers cellular survival to promote erythropoiesis. Nucleic Acids Res 2021;49:9007-9025. [PMID: 34059908 PMCID: PMC8450083 DOI: 10.1093/nar/gkab367] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/24/2021] [Revised: 03/29/2021] [Accepted: 05/27/2021] [Indexed: 01/03/2023]
Abstract: Cellular differentiation requires vast remodeling of transcriptomes, and therefore machinery mediating remodeling controls differentiation. Relative to transcriptional mechanisms governing differentiation, post-transcriptional processes are less well understood. As an important post-transcriptional determinant of transcriptomes, the RNA exosome complex (EC) mediates processing and/or degradation of select RNAs. During erythropoiesis, the erythroid transcription factor GATA1 represses EC subunit genes. Depleting EC structural subunits prior to GATA1-mediated repression is deleterious to erythroid progenitor cells. To assess the importance of the EC catalytic subunits Dis3 and Exosc10 in this dynamic process, we asked if these subunits function non-redundantly to control erythropoiesis. Dis3 or Exosc10 depletion in primary murine hematopoietic progenitor cells reduced erythroid progenitors and their progeny, while sparing myeloid cells. Dis3 loss severely compromised erythroid progenitor and erythroblast survival, rendered erythroblasts hypersensitive to apoptosis-inducing stimuli and induced γ-H2AX, indicative of DNA double-stranded breaks. Dis3 loss-of-function phenotypes were more severe than those caused by Exosc10 depletion. We innovated a genetic rescue system to compare human Dis3 with multiple myeloma-associated Dis3 mutants S447R and R750K, and only wild type Dis3 was competent to rescue progenitors. Thus, Dis3 establishes a disease mutation-sensitive, cell type-specific survival mechanism to enable a differentiation program.
5	PLK1 and NOTCH Positively Correlate in Melanoma and Their Combined Inhibition Results in Synergistic Modulations of Key Melanoma Pathways. Mol Cancer Ther 2021;20:161-172. [PMID: 33177155 PMCID: PMC7790869 DOI: 10.1158/1535-7163.mct-20-0654] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/01/2020] [Revised: 09/24/2020] [Accepted: 10/23/2020] [Indexed: 11/16/2022]
Abstract: Melanoma is one of the most serious forms of skin cancer, and its increasing incidence coupled with nonlasting therapeutic options for metastatic disease highlights the need for additional novel approaches for its management. In this study, we determined the potential interactions between polo-like kinase 1 (PLK1, a serine/threonine kinase involved in mitotic regulation) and NOTCH1 (a type I transmembrane protein deciding cell fate during development) in melanoma. Employing an in-house human melanoma tissue microarray (TMA) containing multiple cases of melanomas and benign nevi, coupled with high-throughput, multispectral quantitative fluorescence imaging analysis, we found a positive correlation between PLK1 and NOTCH1 in melanoma. Furthermore, The Cancer Genome Atlas database analysis of patients with melanoma showed an association of higher mRNA levels of PLK1 and NOTCH1 with poor overall, as well as disease-free, survival. Next, utilizing small-molecule inhibitors of PLK1 and NOTCH (BI 6727 and MK-0752, respectively), we found a synergistic antiproliferative response of combined treatment in multiple human melanoma cells. To determine the molecular targets of the overall and synergistic responses of combined PLK1 and NOTCH inhibition, we conducted RNA-sequencing analysis employing a unique regression model with interaction terms. We identified the modulations of several key genes relevant to melanoma progression/metastasis, including MAPK, PI3K, and RAS, as well as some new genes such as Apobec3G, BTK, and FCER1G, which have not been well studied in melanoma. In conclusion, our study demonstrated a synergistic antiproliferative response of concomitant targeting of PLK1 and NOTCH in melanoma, unraveling a potential novel therapeutic approach for detailed preclinical/clinical evaluation.
Key Words: plk1; notch; melanoma; tissue microarray; rna-sequencing
MeSH Headings: Apoptosis/drug effects; Cell Cycle Proteins/antagonists & inhibitors; Cell Cycle Proteins/metabolism; Cell Line, Tumor; Cell Proliferation/drug effects; Drug Synergism; Gene Expression Regulation, Neoplastic/drug effects; Gene Ontology; Genetic Pleiotropy; Humans; Melanoma/genetics; Melanoma/metabolism; Protein Serine-Threonine Kinases/antagonists & inhibitors; Protein Serine-Threonine Kinases/metabolism; Proto-Oncogene Proteins/antagonists & inhibitors; Proto-Oncogene Proteins/metabolism; Receptors, Notch/antagonists & inhibitors; Receptors, Notch/metabolism; Signal Transduction/drug effects; Signal Transduction/genetics; Small Molecule Libraries/pharmacology; Survival Analysis; Polo-Like Kinase 1
Grants: P30 AR066524 NIAMS NIH HHS; I01 BX004921 BLRD VA; S10 OD023526 NIH HHS; R21 CA125091 NCI NIH HHS; I01 CX001441 CSRD VA; I01 BX001008 BLRD VA; R01 CA176748 NCI NIH HHS; R01 AR059130 NIAMS NIH HHS; IK6 BX003780 BLRD VA; P30 CA014520 NCI NIH HHS
6	CellO: comprehensive and hierarchical cell type classification of human cells with the Cell Ontology. iScience 2020;24:101913. [PMID: 33364592 PMCID: PMC7753962 DOI: 10.1016/j.isci.2020.101913] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/30/2020] [Revised: 10/28/2020] [Accepted: 12/02/2020] [Indexed: 12/15/2022]
Abstract: Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification of cell clusters by considering the rich hierarchical structure of known cell types. Furthermore, CellO comes pre-trained on a comprehensive data set of human, healthy, untreated primary samples in the Sequence Read Archive. CellO's comprehensive training set enables it to run out of the box on diverse cell types and achieves competitive or even superior performance when compared to existing state-of-the-art methods. Lastly, CellO's linear models are easily interpreted, thereby enabling exploration of cell-type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO's models across the ontology.
Key Words: Classification of Bioinformatical Subject; Genomic Analysis; Genomics
7	PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments. Genome Res 2020;30:1655-1666. [PMID: 32958497 PMCID: PMC7605252 DOI: 10.1101/gr.252445.119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2019] [Accepted: 08/27/2020] [Indexed: 11/25/2022]
Abstract: Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint analysis of large collections of RNA-seq data sets has emerged as one such analysis. Current methods for transcript discovery rely on a '2-Step' approach where the first step encompasses building transcripts from individual data sets, followed by the second step that merges predicted transcripts across data sets. To increase the power of transcript discovery from large collections of RNA-seq data sets, we developed a novel '1-Step' approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq data sets. We demonstrate in a computational benchmark that 1-Step outperforms 2-Step approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq data sets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq data sets. We uncovered new transcripts that share a differential expression pattern with a neighboring gene Pik3cg implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package.
8	Abstract 222: RNA-seq analysis of differential gene expression in melanoma cells after combined inhibition of Plk1 and Notch. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract: Melanoma is one of the most brutal forms of skin cancer, and its increasing incidence coupled with non-lasting therapeutic options for metastatic tumor highlights the need for additional strategies for the management of this neoplasm. Using tissue microarray analysis, we previously found that the expression of polo-like kinase 1 (Plk1, a serine/threonine kinase involved in mitotic regulation) and Notch1 (a type I transmembrane protein deciding cell fate during development) were positively correlated in melanoma (Cancer Res 2018; 78 [13 Suppl]: Abstract nr 2530), and their combined inhibition resulted in a synergistic anti-proliferative response in human melanoma cells (Cancer Res 2019; 79 [13 Suppl]: Abstract nr 302). In this study, to determine the possible mechanisms behind this observed synergism, we used RNA-seq technology to obtain the differential gene expression following treatment of SK-MEL-2 human metastatic melanoma cells with Plk1 inhibitor volasertib (BI6727, 20 nM) and Notch1 inhibitor MK-0752 (100 μM) for 48 h. After data pre-processing by RSEM algorithm, the DESeq2 package was used to identify differentially expressed genes (DEGs, |log2 fold change| ≥ 1, false positive rate ≤ 0.05) when comparing the individual and combined treatments to vehicle (DMSO), as well as the interaction between volasertib:MK-0752. As a result, we identified 909 DEGs from volasertib treatment, 675 DEGs from MK-0752 treatment, 2142 DEGs from the combined treatment of volasertib and MK-0752, as well as 304 DEGs from the interaction of volasertib and MK-0752. In addition, employing GOstats and KEGGprofile packages in R programming, we conducted Gene Ontology (GO) and KEGG pathway analysis of the various DEGs. In GO analysis (counts ≥ 2, p ≤ 10⁻⁵), we identified 202 downregulated GO terms affected by the combined inhibition of Plk1 and Notch1, including metabolism, cell proliferation, and migration. In KEGG pathway analysis, the combined inhibition of Plk1 and Notch1 was found to be associated with downregulation of several pathways shared with single drug treatments, such as PI3K-Akt, extracellular matrix receptor interaction, and protein digestion and absorption, as well as some novel pathways that were only affected by combined treatment, such as MAPK, Ras, and Rap1 pathways. Interestingly, our analysis predicted that the combined inhibition of Plk1 and Notch may make the melanoma cells more sensitive to immune responses. Overall, our data demonstrated that not only does targeting both Plk1 and Notch1 signaling pathways alter multiple melanoma progression pathways, but it may also potentially result in an increased sensitivity to other therapeutic targets, such as immune checkpoint blockade. However, these mechanistic findings need to be validated further in other relevant in vitro and in vivo models.
Citation Format: Shengqin Su, Gagan Chhabra, Mary A. Ndiaye, Chandra K. Singh, Colin N. Dewey, Nihal Ahmad. RNA-seq analysis of differential gene expression in melanoma cells after combined inhibition of Plk1 and Notch [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 222.
9	Giant Island Mice Exhibit Widespread Gene Expression Changes in Key Metabolic Organs. Genome Biol Evol 2020;12:1277-1301. [PMID: 32531054 PMCID: PMC7487164 DOI: 10.1093/gbe/evaa118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/06/2020] [Indexed: 12/02/2022]
Abstract: Island populations repeatedly evolve extreme body sizes, but the genomic basis of this pattern remains largely unknown. To understand how organisms on islands evolve gigantism, we compared genome-wide patterns of gene expression in Gough Island mice, the largest wild house mice in the world, and mainland mice from the WSB/EiJ wild-derived inbred strain. We used RNA-seq to quantify differential gene expression in three key metabolic organs: gonadal adipose depot, hypothalamus, and liver. Between 4,000 and 8,800 genes were significantly differentially expressed across the evaluated organs, representing between 20% and 50% of detected transcripts, with 20% or more of differentially expressed transcripts in each organ exhibiting expression fold changes of at least 2×. A minimum of 73 candidate genes for extreme size evolution, including Irs1 and Lrp1, were identified by considering differential expression jointly with other data sets: 1) genomic positions of published quantitative trait loci for body weight and growth rate, 2) whole-genome sequencing of 16 wild-caught Gough Island mice that revealed fixed single-nucleotide differences between the strains, and 3) publicly available tissue-specific regulatory elements. Additionally, patterns of differential expression across three time points in the liver revealed that Arid5b potentially regulates hundreds of genes. Functional enrichment analyses pointed to cell cycling, mitochondrial function, signaling pathways, inflammatory response, and nutrient metabolism as potential causes of weight accumulation in Gough Island mice. Collectively, our results indicate that extensive gene regulatory evolution in metabolic organs accompanied the rapid evolution of gigantism during the short time house mice have inhabited Gough Island.
Key Words: body size; gene regulatory evolution; house mouse; island evolution; island rule
MeSH Headings: Animals; Biological Evolution; Body Size/genetics; Female; Gene Expression; Hypothalamus/metabolism; Liver/growth & development; Liver/metabolism; Male; Mice/genetics; Mice/growth & development; Mice/metabolism; Quantitative Trait Loci
Grants: R01 GM100426 NIGMS NIH HHS; R01 HG005232 NHGRI NIH HHS; T32 HG002760 NHGRI NIH HHS; UL1 TR002373 NCATS NIH HHS
10	Whole-Genome Alignment. Methods Mol Biol 2019;1910:121-147. [PMID: 31278663 DOI: 10.1007/978-1-4939-9074-0_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]
Abstract: Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
Key Words: Comparative genomics; Genome evolution; Homology map; Sequence alignment; Toporthology; Whole-genome alignment
MeSH Headings: Algorithms; Computational Biology/methods; Databases, Genetic; Evolution, Molecular; Genome; Genome-Wide Association Study; Genomics/methods; Sequence Alignment/methods
11	MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bioinformatics 2018;33:2914-2923. [PMID: 28535296 PMCID: PMC5870770 DOI: 10.1093/bioinformatics/btx334] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 12/02/2016] [Accepted: 05/21/2017] [Indexed: 01/31/2023]
Abstract:
Motivation: The NCBI’s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA.
Results: We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline.
Availability and implementation: The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline
Supplementary information: Supplementary data are available at Bioinformatics online.
MeSH Headings: Biological Ontologies; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Humans; Metadata; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods; Software; Vocabulary, Controlled
Grants: R01 HG003747 NHGRI NIH HHS; T15 LM007359 NLM NIH HHS; U01 HG007019 NHGRI NIH HHS; U54 AI117924 NIAID NIH HHS
12	GATA Factor-Regulated Samd14 Enhancer Confers Red Blood Cell Regeneration and Survival in Severe Anemia. Dev Cell 2017;42:213-225.e4. [PMID: 28787589 DOI: 10.1016/j.devcel.2017.07.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 01/19/2017] [Revised: 05/05/2017] [Accepted: 07/11/2017] [Indexed: 12/31/2022]
Abstract: An enhancer with amalgamated E-box and GATA motifs (+9.5) controls expression of the regulator of hematopoiesis GATA-2. While similar GATA-2-occupied elements are common in the genome, occupancy does not predict function, and GATA-2-dependent genetic networks are incompletely defined. A "+9.5-like" element resides in an intron of Samd14 (Samd14-Enh) encoding a sterile alpha motif (SAM) domain protein. Deletion of Samd14-Enh in mice strongly decreased Samd14 expression in bone marrow and spleen. Although steady-state hematopoiesis was normal, Samd14-Enh−/− mice died in response to severe anemia. Samd14-Enh stimulated stem cell factor/c-Kit signaling, which promotes erythrocyte regeneration. Anemia activated Samd14-Enh by inducing enhancer components and enhancer chromatin accessibility. Thus, a GATA-2/anemia-regulated enhancer controls expression of an SAM domain protein that confers survival in anemia. We propose that Samd14-Enh and an ensemble of anemia-responsive enhancers are essential for erythrocyte regeneration in stress erythropoiesis, a vital process in pathologies, including β-thalassemia, myelodysplastic syndrome, and viral infection.
Key Words: GATA-2; anemia; enhancer; erythroid; hematopoiesis; regeneration
13	Zebrafish zic2 controls formation of periocular neural crest and choroid fissure morphogenesis. Dev Biol 2017;429:92-104. [PMID: 28689736 DOI: 10.1016/j.ydbio.2017.07.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/14/2017] [Revised: 05/30/2017] [Accepted: 07/06/2017] [Indexed: 12/31/2022]
Abstract: The vertebrate retina develops in close proximity to the forebrain and neural crest-derived cartilages of the face and jaw. Coloboma, a congenital eye malformation, is associated with aberrant forebrain development (holoprosencephaly) and with craniofacial defects (frontonasal dysplasia) in humans, suggesting a critical role for cross-lineage interactions during retinal morphogenesis. ZIC2, a zinc-finger transcription factor, is linked to human holoprosencephaly. We have previously used morpholino assays to show zebrafish zic2 functions in the developing forebrain, retina and craniofacial cartilage. We now report that zebrafish with genetic lesions in zebrafish zic2 orthologs, zic2a and zic2b, develop with retinal coloboma and craniofacial anomalies. We demonstrate a requirement for zic2 in restricting pax2a expression and show evidence that zic2 function limits Hh signaling. RNA-seq transcriptome analysis identified an early requirement for zic2 in periocular neural crest as an activator of alx1, a transcription factor with essential roles in craniofacial and ocular morphogenesis in human and zebrafish. Collectively, these data establish zic2 mutant zebrafish as a powerful new genetic model for in-depth dissection of cell interactions and genetic controls during craniofacial complex development.
Key Words: Alx1; Coloboma; Hedgehog signaling; Zebrafish; Zic2
14	Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq. Genome Res 2016;26:1124-33. [PMID: 27405803 PMCID: PMC4971760 DOI: 10.1101/gr.199174.115] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/04/2015] [Accepted: 06/13/2016] [Indexed: 11/24/2022]
Abstract: RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level.
MeSH Headings: Algorithms; Alternative Splicing/genetics; Computational Biology/methods; Gene Expression Profiling; High-Throughput Nucleotide Sequencing/methods; Humans; RNA/genetics; RNA Polymerase II/genetics; Sequence Analysis, RNA/methods; Software
Grants: U54 AI117924 NIAID NIH HHS; R01 HG003747 NHGRI NIH HHS; R01 DK050107 NIDDK NIH HHS; U01 HG007019 NHGRI NIH HHS; R37 DK050107 NIDDK NIH HHS
15	Analysis of embryonic development in the unsequenced axolotl: Waves of transcriptomic upheaval and stability. Dev Biol 2016;426:143-154. [PMID: 27475628 DOI: 10.1016/j.ydbio.2016.05.024] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/26/2015] [Revised: 05/20/2016] [Accepted: 05/21/2016] [Indexed: 12/14/2022]
Abstract: The axolotl (Ambystoma mexicanum) has long been the subject of biological research, primarily owing to its outstanding regenerative capabilities. However, the gene expression programs governing its embryonic development are particularly underexplored, especially when compared to other amphibian model species. Therefore, we performed whole transcriptome polyA+ RNA sequencing experiments on 17 stages of embryonic development. As the axolotl genome is unsequenced and its gene annotation is incomplete, we built de novo transcriptome assemblies for each stage and garnered functional annotation by comparing expressed contigs with known genes in other organisms. In evaluating the number of differentially expressed genes over time, we identify three waves of substantial transcriptome upheaval each followed by a period of relative transcriptome stability. The first wave of upheaval is between the one and two cell stage. We show that the number of differentially expressed genes per unit time is higher between the one and two cell stage than it is across the mid-blastula transition (MBT), the period of zygotic genome activation. We use total RNA sequencing to demonstrate that the vast majority of genes with increasing polyA+ signal between the one and two cell stage result from polyadenylation rather than de novo transcription. The first stable phase begins after the two cell stage and continues until the mid-blastula transition, corresponding with the pre-MBT phase of transcriptional quiescence in amphibian development. Following this is a peak of differential gene expression corresponding with the activation of the zygotic genome and a phase of transcriptomic stability from stages 9-11. We observe a third wave of transcriptomic change between stages 11 and 14, followed by a final stable period. The last two stable phases have not been documented in amphibians previously and correspond to times of major morphogenic change in the axolotl embryo: gastrulation and neurulation. These results yield new insights into global gene expression during early stages of amphibian embryogenesis and will help to further develop the axolotl as a model species for developmental and regenerative biology.
Key Words: Axolotl; Development; Transcriptome
16	Mechanism governing heme synthesis reveals a GATA factor/heme circuit that controls differentiation. EMBO Rep 2015;17:249-65. [PMID: 26698166 DOI: 10.15252/embr.201541465] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 09/29/2015] [Accepted: 11/24/2015] [Indexed: 12/18/2022]
Abstract: Metal ion-containing macromolecules have fundamental roles in essentially all biological processes throughout the evolutionary tree. For example, iron-containing heme is a cofactor in enzyme catalysis and electron transfer and an essential hemoglobin constituent. To meet the intense demand for hemoglobin assembly in red blood cells, the cell type-specific factor GATA-1 activates transcription of Alas2, encoding the rate-limiting enzyme in heme biosynthesis, 5-aminolevulinic acid synthase-2 (ALAS-2). Using genetic editing to unravel mechanisms governing heme biosynthesis, we discovered a GATA factor- and heme-dependent circuit that establishes the erythroid cell transcriptome. CRISPR/Cas9-mediated ablation of two Alas2 intronic cis elements strongly reduces GATA-1-induced Alas2 transcription, heme biosynthesis, and surprisingly, GATA-1 regulation of other vital constituents of the erythroid cell transcriptome. Bypassing ALAS-2 function in Alas2 cis element-mutant cells by providing its catalytic product 5-aminolevulinic acid rescues heme biosynthesis and the GATA-1-dependent genetic network. Heme amplifies GATA-1 function by downregulating the heme-sensing transcriptional repressor Bach1 and via a Bach1-insensitive mechanism. Through this dual mechanism, heme and a master regulator collaborate to orchestrate a cell type-specific transcriptional program that promotes cellular differentiation.
Key Words: Bach1; GATA factor; heme; network; transcriptome
17	Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping. PLoS Comput Biol 2015;11:e1004491. [PMID: 26484757 PMCID: PMC4618727 DOI: 10.1371/journal.pcbi.1004491] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/13/2014] [Accepted: 08/06/2015] [Indexed: 11/19/2022]
Abstract: Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advancements in next generation sequencing enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map since short reads of 50-100 base pairs (bps) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads and the few that can accommodate them are prone to large false positive and negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments, that can allocate multi-mapping reads in highly repetitive regions of the genomes with high accuracy. We comprehensively evaluated Perm-seq, and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach facilitates supervising of multi-read allocation with a variety of data sources including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although the protein-DNA interactions sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of the protein-DNA interactions in non-repetitive regions.
MESH Headings: Algorithms; Base Sequence; Chromatin Immunoprecipitation/methods; Chromosome Mapping/methods; DNA/chemistry; DNA/genetics; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; K562 Cells; Molecular Sequence Data; Protein Interaction Mapping/methods; Repetitive Sequences, Nucleic Acid/genetics; Segmental Duplications, Genomic/genetics
Grants: R01 HG003747 NHGRI NIH HHS; U01 HG007019 NHGRI NIH HHS; HG003747 NHGRI NIH HHS
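The Perm-seq abstract describes allocating each multi-mapping read across its candidate loci with the help of a prior built from another data type. A minimal sketch of that idea, assuming the allocation for a single read is plain Bayes' rule (posterior proportional to prior times alignment likelihood); the function name and inputs are hypothetical, and Perm-seq's actual statistical model is not reproduced here.

```python
def allocate_multiread(likelihoods, priors):
    """Posterior allocation of one multi-mapping read across candidate loci.

    likelihoods: per-locus alignment likelihoods P(read | locus).
    priors: per-locus prior weights, e.g. derived from another data type
        such as histone ChIP-seq signal (hypothetical inputs).
    """
    if len(likelihoods) != len(priors):
        raise ValueError("likelihoods and priors must have the same length")
    # Unnormalized posterior: prior times likelihood at each locus.
    weights = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(weights)
    if total == 0:
        # No information at all: fall back to a uniform split.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# A read aligning equally well to two duplicated loci; the prior favors locus 0.
print(allocate_multiread([0.5, 0.5], [0.8, 0.2]))  # [0.8, 0.2]
```

With uninformative priors this reduces to splitting a read evenly among equally good alignments, which is one reason additional data types help inside segmental duplications.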
18	Cis-regulatory mechanisms governing stem and progenitor cell transitions. Sci Adv 2015;1:e1500503. [PMID: 26601269 PMCID: PMC4643771 DOI: 10.1126/sciadv.1500503] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 04/21/2015] [Accepted: 06/20/2015] [Indexed: 05/25/2023]
Abstract: Cis-element encyclopedias provide information on phenotypic diversity and disease mechanisms. Although cis-element polymorphisms and mutations are instructive, deciphering function remains challenging. Mutation of an intronic GATA motif (+9.5) in GATA2, encoding a master regulator of hematopoiesis, underlies an immunodeficiency associated with myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). Whereas an inversion relocalizes another GATA2 cis-element (-77) to the proto-oncogene EVI1, inducing EVI1 expression and AML, whether this reflects ectopic or physiological activity is unknown. We describe a mouse strain that decouples -77 function from proto-oncogene deregulation. The -77(-/-) mice exhibited a novel phenotypic constellation including late embryonic lethality and anemia. The -77 established a vital sector of the myeloid progenitor transcriptome, conferring multipotentiality. Unlike the +9.5(-/-) embryos, hematopoietic stem cell genesis was unaffected in -77(-/-) embryos. These results illustrate a paradigm in which cis-elements in a locus differentially control stem and progenitor cell transitions, and therefore the individual cis-element alterations cause unique and overlapping disease phenotypes.
Key Words: GATA factor; GATA-2; cis-element; differentiation; hematopoiesis; myeloid progenitor; stem cell; transcriptome
Grants: R01 HL113066 NHLBI NIH HHS; R01 DK050107 NIDDK NIH HHS; P30 CA014520 NCI NIH HHS; R01 DK068634 NIDDK NIH HHS; U01 HG007019 NHGRI NIH HHS; T32 HL007899 NHLBI NIH HHS; R37 DK050107 NIDDK NIH HHS; R56 DK068634 NIDDK NIH HHS; R01 CA152108 NCI NIH HHS; National Institutes of Health
19	Declined presentation hematopoietic signaling mechanism revealed from a stem/progenitor cell cistrome. Exp Hematol 2015. [DOI: 10.1016/j.exphem.2015.06.141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
20	Linking heme biosynthesis with a GATA factor-regulated genetic network that controls cellular differentiation. Exp Hematol 2015. [DOI: 10.1016/j.exphem.2015.06.270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
21	Hematopoietic Signaling Mechanism Revealed from a Stem/Progenitor Cell Cistrome. Mol Cell 2015;59:62-74. [PMID: 26073540 DOI: 10.1016/j.molcel.2015.05.020] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 01/28/2015] [Revised: 04/27/2015] [Accepted: 05/07/2015] [Indexed: 11/17/2022]
Abstract: Thousands of cis-elements in genomes are predicted to have vital functions. Although conservation, activity in surrogate assays, polymorphisms, and disease mutations provide functional clues, deletion from endogenous loci constitutes the gold-standard test. A GATA-2-binding, Gata2 intronic cis-element (+9.5) required for hematopoietic stem cell genesis in mice is mutated in a human immunodeficiency syndrome. Because +9.5 is the only cis-element known to mediate stem cell genesis, we devised a strategy to identify functionally comparable enhancers ("+9.5-like") genome-wide. Gene editing revealed +9.5-like activity to mediate GATA-2 occupancy, chromatin opening, and transcriptional activation. A +9.5-like element resided in Samd14, which encodes a protein of unknown function. Samd14 increased hematopoietic progenitor levels/activity and promoted signaling by a pathway vital for hematopoietic stem/progenitor cell regulation (stem cell factor/c-Kit), and c-Kit rescued Samd14 loss-of-function phenotypes. Thus, the hematopoietic stem/progenitor cell cistrome revealed a mediator of a signaling pathway that has broad importance for stem/progenitor cell biology.
MESH Headings: Amino Acid Sequence; Animals; Cell Differentiation/genetics; Cell Line; GATA2 Transcription Factor/genetics; Hematopoietic Stem Cells/metabolism; Mice; Molecular Sequence Data; Proteins/genetics; Proteins/metabolism; Proto-Oncogene Proteins c-kit/genetics; RNA Interference; RNA, Small Interfering; Signal Transduction; Transcription, Genetic/genetics; Transcriptional Activation/genetics
Grants: U54 AI117924 NIAID NIH HHS; DK68634 NIDDK NIH HHS; R01 HG003747 NHGRI NIH HHS; R01 DK050107 NIDDK NIH HHS; P30 CA014520 NCI NIH HHS; P30CA014520 NCI NIH HHS; DK50107 NIDDK NIH HHS; R01 DK068634 NIDDK NIH HHS; U01 HG007019 NHGRI NIH HHS; T32 HL007899 NHLBI NIH HHS; HG0070019 NHGRI NIH HHS; R37 DK050107 NIDDK NIH HHS; R56 DK068634 NIDDK NIH HHS
22	EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments. Bioinformatics 2015;31:2614-22. [PMID: 25847007 PMCID: PMC4528625 DOI: 10.1093/bioinformatics/btv193] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 10/14/2014] [Accepted: 03/30/2015] [Indexed: 01/08/2023]
Abstract: Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data. Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression. Availability and implementation: An R package containing examples and sample datasets is available at Bioconductor. Contact: kendzior@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
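To illustrate what a gene-specific "expression path" across ordered conditions can look like, here is a sketch that decodes the most likely sequence of Down / EE (equally expressed) / Up states with the Viterbi algorithm. It is a plain hidden Markov model with Gaussian emissions on log fold changes between consecutive conditions, not EBSeq-HMM's auto-regressive empirical Bayes model (which is distributed as an R/Bioconductor package); the state means, standard deviation, and stay probability are illustrative assumptions.

```python
import math

STATES = ("Down", "EE", "Up")  # EE = equally expressed


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def most_likely_path(log_fold_changes, means=(-1.0, 0.0, 1.0), sd=0.5,
                     stay_prob=0.8):
    """Viterbi decoding of one gene's path across ordered conditions.

    log_fold_changes: log2(expr[t+1] / expr[t]) for consecutive conditions.
    Each hidden state emits a Gaussian log fold change (means follow STATES);
    transitions favor staying in the same state. Parameters are illustrative.
    """
    if not log_fold_changes:
        return []
    n = len(STATES)
    switch_prob = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch_prob) for j in range(n)]
                 for i in range(n)]
    # Uniform initial state distribution.
    scores = [math.log(1.0 / n) + gaussian_logpdf(log_fold_changes[0], means[s], sd)
              for s in range(n)]
    backptrs = []
    for x in log_fold_changes[1:]:
        new_scores, ptrs = [], []
        for j in range(n):
            # Best previous state for ending in state j at this step.
            best_i = max(range(n), key=lambda i: scores[i] + log_trans[i][j])
            ptrs.append(best_i)
            new_scores.append(scores[best_i] + log_trans[best_i][j]
                              + gaussian_logpdf(x, means[j], sd))
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the highest-scoring final state.
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


print(most_likely_path([1.0, 1.0, 0.0, 0.0, -1.0, -1.0]))
# ['Up', 'Up', 'EE', 'EE', 'Down', 'Down']
```

The point of the ordered model is visible in the transition matrix: because neighboring conditions tend to share a state, evidence is pooled along the series instead of treating each condition as exchangeable.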
23	Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 2014. [PMID: 25608678 DOI: 10.1101/006338] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/08/2023]
Abstract: De novo RNA-Seq assembly facilitates the study of transcriptomes for species without sequenced genomes, but it is challenging to select the most accurate assembly in this context. To address this challenge, we developed a model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown. We show that RSEM-EVAL correctly reflects assembly accuracy, as measured by REF-EVAL, a refined set of ground-truth-based scores that we also developed. Guided by RSEM-EVAL, we assembled the transcriptome of the regenerating axolotl limb; this assembly compares favorably to a previous assembly. A software package implementing our methods, DETONATE, is freely available at http://deweylab.biostat.wisc.edu/detonate.
MESH Headings: Algorithms; Animals; Computational Biology/methods; Computer Simulation; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Software
Grants: R01 HG005232 NHGRI NIH HHS; T15 LM007359 NLM NIH HHS; R01HG005232 NHGRI NIH HHS; T15LM007359 NLM NIH HHS
24	Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 2014;15:553. [PMID: 25608678 PMCID: PMC4298084 DOI: 10.1186/s13059-014-0553-5] [Citation(s) in RCA: 196] [Impact Index Per Article: 19.6] [Received: 07/06/2014] [Accepted: 10/30/2014] [Indexed: 01/16/2023]
Abstract: De novo RNA-Seq assembly facilitates the study of transcriptomes for species without sequenced genomes, but it is challenging to select the most accurate assembly in this context. To address this challenge, we developed a model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown. We show that RSEM-EVAL correctly reflects assembly accuracy, as measured by REF-EVAL, a refined set of ground-truth-based scores that we also developed. Guided by RSEM-EVAL, we assembled the transcriptome of the regenerating axolotl limb; this assembly compares favorably to a previous assembly. A software package implementing our methods, DETONATE, is freely available at http://deweylab.biostat.wisc.edu/detonate.
25	Gata2 cis-element is required for hematopoietic stem cell generation in the mammalian embryo. J Exp Med 2013;210:2833-42. [PMID: 24297994 PMCID: PMC3865483 DOI: 10.1084/jem.20130733] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Indexed: 01/10/2023]
Abstract: Cis-element requirement for the emergence of HSCs in the AGM and for hemogenic endothelium to generate HSC-containing c-Kit⁺ cell clusters. The generation of hematopoietic stem cells (HSCs) from hemogenic endothelium within the aorta, gonad, mesonephros (AGM) region of the mammalian embryo is crucial for development of the adult hematopoietic system. We described a deletion of a Gata2 cis-element (+9.5) that depletes fetal liver HSCs, is lethal at E13–14 of embryogenesis, and is mutated in an immunodeficiency that progresses to myelodysplasia/leukemia. Here, we demonstrate that the +9.5 element enhances Gata2 expression and is required to generate long-term repopulating HSCs in the AGM. Deletion of the +9.5 element abrogated the capacity of hemogenic endothelium to generate HSC-containing clusters in the aorta. Genomic analyses indicated that the +9.5 element regulated a rich ensemble of genes that control hemogenic endothelium and HSCs, as well as genes not implicated in hematopoiesis. These results reveal a mechanism that controls stem cell emergence from hemogenic endothelium to establish the adult hematopoietic system.
26	Bicaudal-C spatially controls translation of vertebrate maternal mRNAs. RNA 2013;19:1575-82. [PMID: 24062572 PMCID: PMC3851724 DOI: 10.1261/rna.041665.113] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 05/13/2023]
Abstract: The Xenopus Cripto-1 protein is confined to the cells of the animal hemisphere during early embryogenesis where it regulates the formation of anterior structures. Cripto-1 protein accumulates only in animal cells because cripto-1 mRNA in cells of the vegetal hemisphere is translationally repressed. Here, we show that the RNA binding protein, Bicaudal-C (Bic-C), functioned directly in this vegetal cell-specific repression. While Bic-C protein is normally confined to vegetal cells, ectopic expression of Bic-C in animal cells repressed a cripto-1 mRNA reporter and associated with endogenous cripto-1 mRNA. Repression by Bic-C required its N-terminal domain, comprised of multiple KH motifs, for specific binding to relevant control elements within the cripto-1 mRNA and a functionally separable C-terminal translation repression domain. Bic-C-mediated repression required the 5' CAP and translation initiation factors, but not a poly(A) tail or the conserved SAM domain within Bic-C. Bic-C-directed immunoprecipitation followed by deep sequencing of associated mRNAs identified multiple Bic-C-regulated mRNA targets, including cripto-1 mRNA, providing new insights and tools for understanding the role of Bic-C in vertebrate development.
Key Words: Bicaudal-C; Xenopus; maternal mRNAs; translation
MESH Headings: 3' Untranslated Regions; Animals; Base Sequence; GPI-Linked Proteins/biosynthesis; GPI-Linked Proteins/genetics; GPI-Linked Proteins/metabolism; Intercellular Signaling Peptides and Proteins/biosynthesis; Intercellular Signaling Peptides and Proteins/genetics; Intercellular Signaling Peptides and Proteins/metabolism; Membrane Proteins; Protein Biosynthesis; Protein Structure, Tertiary; RNA, Messenger, Stored/genetics; RNA, Messenger, Stored/metabolism; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Sequence Analysis, RNA; Xenopus Proteins/biosynthesis; Xenopus Proteins/chemistry; Xenopus Proteins/genetics; Xenopus Proteins/metabolism; Xenopus laevis/genetics; Xenopus laevis/metabolism
Grants: T32 GM007215 NIGMS NIH HHS; HG005232 NHGRI NIH HHS; R37 GM031892 NIGMS NIH HHS; GM31892 NIGMS NIH HHS; T32GM07215 NIGMS NIH HHS; R01 HG005232 NHGRI NIH HHS; R01 GM031892 NIGMS NIH HHS; GM50942 NIGMS NIH HHS; R01 GM050942 NIGMS NIH HHS
27	De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 2013;8:1494-1512. [PMID: 23845962 DOI: 10.1038/nprot.2013.084.de] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 05/23/2023]
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
MESH Headings: Base Sequence; Gene Expression Profiling/methods; RNA/chemistry; Schizosaccharomyces/genetics; Schizosaccharomyces pombe Proteins/chemistry; Schizosaccharomyces pombe Proteins/genetics; Sequence Analysis, RNA/methods; Software; Transcriptome
Grants: HHSN272200900018C NIAID NIH HHS; Howard Hughes Medical Institute; 1R01HG005232-01A1 NHGRI NIH HHS; P50 HG006193 NHGRI NIH HHS; 5P50HG006193-02 NHGRI NIH HHS; R01 HG005232 NHGRI NIH HHS
28	De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 2013;8:1494-512. [PMID: 23845962 PMCID: PMC3875132 DOI: 10.1038/nprot.2013.084] [Citation(s) in RCA: 5295] [Impact Index Per Article: 481.4] [Indexed: 01/11/2023]
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
MESH Headings: Base Sequence; Gene Expression Profiling/methods; RNA/chemistry; Schizosaccharomyces/genetics; Schizosaccharomyces pombe Proteins/chemistry; Schizosaccharomyces pombe Proteins/genetics; Sequence Analysis, RNA/methods; Software; Transcriptome
Grants: HHSN272200900018C NIAID NIH HHS; Howard Hughes Medical Institute; 1R01HG005232-01A1 NHGRI NIH HHS; P50 HG006193 NHGRI NIH HHS; 5P50HG006193-02 NHGRI NIH HHS; R01 HG005232 NHGRI NIH HHS
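The protocol above lists RSEM for transcript abundance estimation. A core difficulty there is that a read can be compatible with several assembled transcripts; a common way to resolve this is expectation-maximization. The sketch below shows only that EM idea under simplifying assumptions (equal transcript lengths, no fragment-length or quality modeling, reads given as sets of compatible transcript indices); it is not RSEM's actual model or interface.

```python
def em_abundances(read_compat, n_transcripts, n_iter=200):
    """Estimate relative transcript abundances from multi-mapping reads.

    read_compat: one list per read of the transcript indices it is compatible
        with. Transcript lengths are ignored (assumed equal) for simplicity.
    """
    if not read_compat:
        raise ValueError("need at least one read")
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting point
    n_reads = len(read_compat)
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read among its compatible transcripts in
        # proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new abundances are the normalized expected read counts.
        theta = [c / n_reads for c in counts]
    return theta


# Two reads unique to transcript 0, one unique to transcript 1, three shared.
reads = [[0], [0], [1], [0, 1], [0, 1], [0, 1]]
print([round(x, 3) for x in em_abundances(reads, 2)])  # [0.667, 0.333]
```

The shared reads end up divided in the same 2:1 ratio suggested by the uniquely assigned reads, rather than being discarded or split evenly.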
29	De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 2013. [PMID: 23845962 DOI: 10.1038/nprot.2013.084.] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
30	Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs. Bioinformatics 2013;29:2300-10. [PMID: 23846746 PMCID: PMC3753571 DOI: 10.1093/bioinformatics/btt396] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/04/2022]
Abstract: Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
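One way to read the probabilistic splice graph idea is that a gene's transcripts are paths through a graph of exons, and a distribution over transcripts is given by probabilities on the graph's edges rather than one free parameter per transcript. The sketch below assumes a first-order Markov factorization (a path's probability is the product of its edge probabilities) on an acyclic graph; node names and the example graph are hypothetical, and this is not the psginfer software's model or interface.

```python
def path_probabilities(edges, source, sink):
    """Enumerate source-to-sink paths in an acyclic splice graph.

    edges: dict mapping node -> dict of successor -> transition probability;
        outgoing probabilities of each non-sink node should sum to 1.
    A path's probability is the product of its edge probabilities.
    """
    results = {}

    def walk(node, path, prob):
        if node == sink:
            results[tuple(path)] = prob
            return
        for nxt, p in edges.get(node, {}).items():
            walk(nxt, path + [nxt], prob * p)

    walk(source, [source], 1.0)
    return results


# Exon-skipping example: exon e2 is included with probability 0.7.
graph = {
    "start": {"e1": 1.0},
    "e1": {"e2": 0.7, "e3": 0.3},
    "e2": {"e3": 1.0},
    "e3": {"end": 1.0},
}
for path, prob in path_probabilities(graph, "start", "end").items():
    print(" -> ".join(path), round(prob, 3))
# start -> e1 -> e2 -> e3 -> end 0.7
# start -> e1 -> e3 -> end 0.3
```

Under this factorization the parameter count grows with the number of edges, while the number of paths can grow exponentially with the number of alternative exons, which is the kind of representation issue the abstract refers to.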
31	Comparative RNA-seq analysis in the unsequenced axolotl: the oncogene burst highlights early gene expression in the blastema. PLoS Comput Biol 2013;9:e1002936. [PMID: 23505351 PMCID: PMC3591270 DOI: 10.1371/journal.pcbi.1002936] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 06/20/2012] [Accepted: 01/08/2013] [Indexed: 01/09/2023]
Abstract: The salamander has the remarkable ability to regenerate its limb after amputation. Cells at the site of amputation form a blastema and then proliferate and differentiate to regrow the limb. To better understand this process, we performed deep RNA sequencing of the blastema over a time course in the axolotl, a species whose genome has not been sequenced. Using a novel comparative approach to analyzing RNA-seq data, we characterized the transcriptional dynamics of the regenerating axolotl limb with respect to the human gene set. This approach involved de novo assembly of axolotl transcripts, RNA-seq transcript quantification without a reference genome, and transformation of abundances from axolotl contigs to human genes. We found a prominent burst in oncogene expression during the first day and blastemal/limb bud genes peaking at 7 to 14 days. In addition, we found that limb patterning genes, SALL genes, and genes involved in angiogenesis, wound healing, defense/immunity, and bone development are enriched during blastema formation and development. Finally, we identified a category of genes with no prior literature support for limb regeneration that are candidates for further evaluation based on their expression pattern during the regenerative process.
MESH Headings: Ambystoma mexicanum/genetics; Ambystoma mexicanum/physiology; Amputation, Surgical; Animals; Cluster Analysis; Extremities/injuries; Extremities/physiology; Gene Expression Profiling/methods; Gene Expression Regulation; Oncogenes; Regeneration/genetics; Regeneration/physiology; Sequence Analysis, RNA/methods; Up-Regulation; Wound Healing/genetics; Wound Healing/physiology
Grants: R01 HG005232 NHGRI NIH HHS
32	Rbm20 regulates titin alternative splicing as a splicing repressor. Nucleic Acids Res 2013;41:2659-72. [PMID: 23307558 PMCID: PMC3575840 DOI: 10.1093/nar/gks1362] [Citation(s) in RCA: 94] [Impact Index Per Article: 8.5] [Indexed: 11/14/2022]
Abstract: Titin, a sarcomeric protein expressed primarily in striated muscles, is responsible for maintaining the structure and biomechanical properties of muscle cells. Cardiac titin undergoes developmental size reduction from 3.7 megadaltons in neonates to primarily 2.97 megadaltons in the adult. This size reduction results from gradually increased exon skipping between exons 50 and 219 of titin mRNA. Our previous study reported that Rbm20 is the splicing factor responsible for this process. In this work, we investigated its molecular mechanism. We demonstrate that Rbm20 mediates exon skipping by binding to titin pre-mRNA to repress the splicing of some regions; the exons/introns in these Rbm20-repressed regions are ultimately skipped. Rbm20 was also found to mediate intron retention and exon shuffling. The two Rbm20 speckles found in nuclei from muscle tissues were identified as aggregates of Rbm20 protein on the partially processed titin pre-mRNAs. Cooperative repression and alternative 3' splice site selection were found to be used by Rbm20 to skip different subsets of titin exons, and the splicing pathway selected depended on the ratio of Rbm20 to other splicing factors that vary with tissue type and developmental age.
33	Genomic variation in natural populations of Drosophila melanogaster. Genetics 2012;192:533-98. [PMID: 22673804 PMCID: PMC3454882 DOI: 10.1534/genetics.112.142018] [Citation(s) in RCA: 242] [Impact Index Per Article: 20.2] [Received: 12/22/2011] [Accepted: 05/24/2012] [Indexed: 02/07/2023]
Abstract: This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5'- and 3'-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species.
MESH Headings: Africa; Animals; Centromere/genetics; Chromatin/genetics; Chromosome Mapping; Drosophila melanogaster/genetics; Drosophila melanogaster/physiology; Genetic Linkage; Genetic Variation; Genetics, Population; Genome; Linkage Disequilibrium; Selection, Genetic; Species Specificity; Telomere/genetics; Untranslated Regions/genetics; X Chromosome/genetics
Grants: R00 GM080099 NIGMS NIH HHS; R01 HG002942 NHGRI NIH HHS; HG02942 NHGRI NIH HHS; R01 GM094402 NIGMS NIH HHS; R00-GM080099 NIGMS NIH HHS; R01-GM094402 NIGMS NIH HHS
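The linkage-disequilibrium decay described above is measured with pairwise statistics between loci. A minimal sketch of one common such statistic, r², computed from phased two-locus haplotypes; the 0/1 allele coding and the function name `ld_r2` are assumptions made for illustration, not the study's pipeline.

```python
def ld_r2(haplotypes):
    """r^2 linkage disequilibrium between two biallelic loci.

    haplotypes: list of phased (a, b) pairs, each allele coded 0 or 1
    (a hypothetical coding chosen for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # joint frequency
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # fully associated alleles → 1.0
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # independent alleles → 0.0
```

Plotting r² against the physical distance between locus pairs is one way such a decay curve can be summarized.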
34	Whole-genome alignment. Methods Mol Biol 2012;855:237-57. [PMID: 22407711 DOI: 10.1007/978-1-61779-582-4_8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 02/04/2023]
Abstract: Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
35	RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011;12:323. [PMID: 21816040 DOI: 10.1007/978-1-4939-0512-63] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/10/2011] [Accepted: 08/04/2011] [Indexed: 05/28/2023]
MESH Headings: Animals; Computer Simulation; Gene Expression Profiling/methods; Humans; Mice; Protein Isoforms/genetics; RNA/genetics; Sequence Analysis, RNA/methods; Software
Grants: R01 HG005232 NHGRI NIH HHS; 1R01HG005232-01A1 NHGRI NIH HHS
36	RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011;12:323. [PMID: 21816040 PMCID: PMC3163565 DOI: 10.1186/1471-2105-12-323] [Citation(s) in RCA: 12282] [Impact Index Per Article: 944.8] [Received: 05/10/2011] [Accepted: 08/04/2011] [Indexed: 02/07/2023]
Abstract: Background RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
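RSEM (RNA-Seq by Expectation-Maximization) shares ambiguously mapping reads among transcripts with an EM algorithm instead of discarding them. A toy sketch of that idea follows; it is a simplification, not RSEM's model: it assumes equal transcript lengths and ignores fragment-length, positional and quality terms, and the function name `em_abundances` and its input format are hypothetical.

```python
def em_abundances(read_compat, n_transcripts, iterations=200):
    """Toy EM estimate of transcript abundance fractions.

    read_compat: list of sets; read_compat[r] holds the indices of the
    transcripts that read r aligns to. Assumes equal transcript lengths and
    ignores fragment-length, positional and quality terms (simplifications
    made for this sketch).
    """
    aligned = [c for c in read_compat if c]  # drop reads that align nowhere
    if not aligned:
        raise ValueError("no aligned reads")
    theta = [1.0 / n_transcripts] * n_transcripts  # start from uniform fractions
    for _ in range(iterations):
        counts = [0.0] * n_transcripts
        # E-step: share each read among its compatible transcripts in
        # proportion to the current abundance estimates.
        for compat in aligned:
            total = sum(theta[t] for t in compat)
            if total == 0.0:
                continue
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new fractions are the expected read counts, normalized.
        theta = [c / len(aligned) for c in counts]
    return theta


# 6 reads unique to transcript 0, 2 unique to transcript 1, 4 shared by both.
reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 4
print([round(x, 3) for x in em_abundances(reads, 2)])  # → [0.75, 0.25]
```

At convergence the four shared reads are split 3:1 between the two transcripts (expected counts 9 and 3), following the ratio suggested by the uniquely aligned reads.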
37	RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011. [PMID: 21816040 DOI: 10.1186/1471-2105-12-323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/13/2022]
38	RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011. [PMID: 21816040 DOI: 10.1186/1471-2105-12-323] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/10/2022]
39	Positional orthology: putting genomic evolutionary relationships into context. Brief Bioinform 2011;12:401-12. [PMID: 21705766 PMCID: PMC3178058 DOI: 10.1093/bib/bbr040] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Indexed: 12/31/2022]
Abstract: Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.
40	BUCKy: gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics 2010;26:2910-1. [PMID: 20861028 DOI: 10.1093/bioinformatics/btq539] [Citation(s) in RCA: 332] [Impact Index Per Article: 23.7] [Indexed: 11/13/2022]
Abstract: MOTIVATION BUCKy is a C++ program that implements Bayesian concordance analysis. The method uses a non-parametric clustering of genes with compatible trees, and reconstructs the primary concordance tree from clades supported by the largest proportions of genes. A population tree with branch lengths in coalescent units is estimated from quartet concordance factors. AVAILABILITY BUCKy is open source and distributed under the GNU General Public License at www.stat.wisc.edu/~ane/bucky/.
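BUCKy's Bayesian clustering is beyond a short example, but the quantity it summarizes, the concordance factor of a clade (the proportion of genes whose tree contains that clade), and the greedy assembly of a primary concordance tree from the best-supported mutually compatible clades can be sketched naively. This sketch assumes each gene tree is known exactly and given as a set of clades (frozensets of taxon names), a hypothetical input format; BUCKy itself works from posterior samples of gene trees.

```python
from collections import Counter


def concordance_factors(gene_trees):
    """Fraction of gene trees containing each clade (naive point estimate)."""
    counts = Counter(clade for tree in gene_trees for clade in tree)
    return {clade: n / len(gene_trees) for clade, n in counts.items()}


def compatible(c1, c2):
    """Two clades can coexist in one tree if they are disjoint or nested."""
    return c1.isdisjoint(c2) or c1 <= c2 or c2 <= c1


def primary_concordance_tree(cfs):
    """Greedily keep clades in decreasing CF order while they stay compatible."""
    chosen = []
    # Sort by CF (descending), breaking ties by taxon names for determinism.
    for clade, _ in sorted(cfs.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        if all(compatible(clade, kept) for kept in chosen):
            chosen.append(clade)
    return chosen


# Four rooted gene trees on taxa A-D (D as outgroup), listing non-trivial clades.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("BC"), frozenset("ABC")},
]
cfs = concordance_factors(trees)
print(sorted(("".join(sorted(c)), cf) for c, cf in cfs.items()))
# → [('AB', 0.5), ('ABC', 1.0), ('AC', 0.25), ('BC', 0.25)]
print(["".join(sorted(c)) for c in primary_concordance_tree(cfs)])  # → ['ABC', 'AB']
```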
41	RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 2009;26:493-500. [PMID: 20022975 PMCID: PMC2820677 DOI: 10.1093/bioinformatics/btp692] [Citation(s) in RCA: 790] [Impact Index Per Article: 52.7] [Indexed: 11/21/2022]
Abstract: Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically. Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed. Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
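The abstract attributes part of the accuracy gain to estimating gene expression as the sum of isoform expression levels. A minimal sketch of that aggregation step only, assuming isoform-level estimates are already available; the isoform names, values and the `gene_level` function are hypothetical.

```python
from collections import defaultdict


def gene_level(isoform_estimates, isoform_to_gene):
    """Sum isoform-level expression estimates into gene-level estimates."""
    genes = defaultdict(float)
    for isoform, value in isoform_estimates.items():
        genes[isoform_to_gene[isoform]] += value
    return dict(genes)


# Hypothetical values: two isoforms of geneA and one isoform of geneB.
isoforms = {"geneA.1": 0.30, "geneA.2": 0.20, "geneB.1": 0.50}
mapping = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
print(gene_level(isoforms, mapping))
```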
42	Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genet 2009;5:e1000729. [PMID: 19936022 PMCID: PMC2770633 DOI: 10.1371/journal.pgen.1000729] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Received: 04/08/2009] [Accepted: 10/19/2009] [Indexed: 11/18/2022]
Abstract: Population genetic theory predicts discordance in the true phylogeny of different genomic regions when studying recently diverged species. Despite this expectation, genome-wide discordance in young species groups has rarely been statistically quantified. The house mouse subspecies group provides a model system for examining phylogenetic discordance. House mouse subspecies are recently derived, suggesting that even if there has been a simple tree-like population history, gene trees could disagree with the population history due to incomplete lineage sorting. Subspecies of house mice also hybridize in nature, raising the possibility that recent introgression might lead to additional phylogenetic discordance. Single-locus approaches have revealed support for conflicting topologies, resulting in a subspecies tree often summarized as a polytomy. To analyze phylogenetic histories on a genomic scale, we applied a recently developed method, Bayesian concordance analysis, to dense SNP data from three closely related subspecies of house mice: Mus musculus musculus, M. m. castaneus, and M. m. domesticus. We documented substantial variation in phylogenetic history across the genome. Although each of the three possible topologies was strongly supported by a large number of loci, there was statistical evidence for a primary phylogenetic history in which M. m. musculus and M. m. castaneus are sister subspecies. These results underscore the importance of measuring phylogenetic discordance in other recently diverged groups using methods such as Bayesian concordance analysis, which are designed for this purpose.
MESH Headings: Animals; Base Sequence; Bias; Chromosomes, Mammalian/genetics; Computer Simulation; Genetic Loci/genetics; Genome/genetics; Mice/genetics; Phylogeny; Polymorphism, Single Nucleotide/genetics; Species Specificity; X Chromosome/genetics
Grants: T15 LM007359 NLM NIH HHS; 2T15LM007359 NLM NIH HHS
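Why all three topologies can each be supported by many loci follows from a standard coalescent result: for three lineages whose species tree has an internal branch of length T coalescent units, and with no gene flow, a gene tree matches the species tree with probability 1 − (2/3)e^(−T), and each of the two discordant topologies has probability (1/3)e^(−T). The sketch below evaluates these probabilities; it models incomplete lineage sorting only (not the introgression the abstract also raises), and the function name is hypothetical.

```python
import math


def ils_topology_probabilities(t):
    """Expected rooted gene-tree topology frequencies for three taxa
    under incomplete lineage sorting alone (no gene flow).

    t: internal branch length of the species tree in coalescent units.
    Returns (concordant, discordant_1, discordant_2).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0  # probability of each minor topology
    return (1.0 - 2.0 * discordant, discordant, discordant)


# Short internal branches (recent divergence) give nearly equal frequencies.
for t in (0.1, 0.5, 2.0):
    print(t, [round(p, 3) for p in ils_topology_probabilities(t)])
```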
43	Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 2007;17:760-74. [PMID: 17567995 PMCID: PMC1891336 DOI: 10.1101/gr.6034307] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.0] [Indexed: 01/10/2023]
Abstract: A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
MESH Headings: Animals; Evolution, Molecular; Genome, Human; Human Genome Project; Humans; Mammals/genetics; Open Reading Frames; Phylogeny; Sequence Alignment
Grants: P41 HG002371 NHGRI NIH HHS; R01 GM076705 NIGMS NIH HHS; Intramural NIH HHS; R43 HG002632 NHGRI NIH HHS; U01 HG003150 NHGRI NIH HHS; R01 HG002238 NHGRI NIH HHS
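A statement such as "40% of the constrained sequences do not overlap any of the functional elements" is an interval-overlap summary. A minimal sketch of that kind of calculation, assuming half-open (start, end) coordinates on a single sequence and counting whole constrained regions rather than bases (the study's exact accounting is not described here); the coordinates are invented and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share any position."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_without_overlap(constrained, elements):
    """Fraction of constrained regions that overlap no functional element."""
    if not constrained:
        raise ValueError("no constrained regions given")
    # Quadratic scan: fine for illustration, not for genome-scale inputs.
    isolated = [c for c in constrained if not any(overlaps(c, e) for e in elements)]
    return len(isolated) / len(constrained)


# Invented coordinates, for illustration only.
constrained = [(0, 10), (20, 30), (40, 50), (60, 70)]
elements = [(5, 8), (25, 45), (100, 120)]
print(fraction_without_overlap(constrained, elements))  # → 0.25
```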
44	Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007;447:799-816. [PMID: 17571346 PMCID: PMC2212820 DOI: 10.1038/nature05874] [Citation(s) in RCA: 3782] [Impact Index Per Article: 222.5] [Indexed: 01/17/2023]
Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
MESH Headings: Chromatin/genetics; Chromatin/metabolism; Chromatin Immunoprecipitation; Conserved Sequence/genetics; DNA Replication; Evolution, Molecular; Exons/genetics; Genetic Variation/genetics; Genome, Human/genetics; Genomics; Heterozygote; Histones/metabolism; Humans; Pilot Projects; Protein Binding; RNA, Messenger/genetics; RNA, Untranslated/genetics; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism; Transcription Initiation Site; Transcription, Genetic/genetics
Grants: R01 HG003541 NHGRI NIH HHS; U01 HG003162-03 NHGRI NIH HHS; U01 HG002523-01 NHGRI NIH HHS; U54 HG003079-01 NHGRI NIH HHS; R01 HG003521-01 NHGRI NIH HHS; R01 HG003143 NHGRI NIH HHS; P41 HG002371 NHGRI NIH HHS; U01 HG003161-03 NHGRI NIH HHS; U01 HG003156 NHGRI NIH HHS; R01 HG003541-03 NHGRI NIH HHS; 077198 Wellcome Trust; U01 HG003157 NHGRI NIH HHS; R01 HG003110 NHGRI NIH HHS; U01 HG003161 NHGRI NIH HHS; U54 HG003067-01 NHGRI NIH HHS; P41 HG002371-03S1 NHGRI NIH HHS; U01 HG003157-03 NHGRI NIH HHS; U01 HG003147 NHGRI NIH HHS; U01 HG003168-02 NHGRI NIH HHS; U54 HG003067 NHGRI NIH HHS; R01 HG003110-03 NHGRI NIH HHS; U01 HG003156-03 NHGRI NIH HHS; R01 HG003143-04 NHGRI NIH HHS; U01 HG003150-03 NHGRI NIH HHS; U01 HG003147-02 NHGRI NIH HHS; R01 HG003532-01 NHGRI NIH HHS; R01 HG003521 NHGRI NIH HHS; U54 HG003273 NHGRI NIH HHS; R01 HG003532 NHGRI NIH HHS; R01 HG002238-15 NHGRI NIH HHS; U01 HG003162 NHGRI NIH HHS; K22 HG003169 NHGRI NIH HHS; K22 HG003169-01A1 NHGRI NIH HHS; F32 CA108313 NCI NIH HHS; U54 HG003079 NHGRI NIH HHS; U54 HG003273-01 NHGRI NIH HHS; U01 HG003151 NHGRI NIH HHS; Wellcome Trust; 062023 Wellcome Trust; U01 HG003151-03 NHGRI NIH HHS; U01 HG002523 NHGRI NIH HHS; R01 HG003129-03 NHGRI NIH HHS; U01 HG003150 NHGRI NIH HHS; R01 HG002238 NHGRI NIH HHS
45	Aligning multiple whole genomes with Mercator and MAVID. Methods Mol Biol 2007;395:221-36. [PMID: 17993677 DOI: 10.1007/978-1-59745-514-5_14] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 12/12/2022]
Abstract: The availability of an increasing number of whole genome sequences presents us with the need for tools to quickly put them into a nucleotide-level multiple alignment. Mercator and MAVID are two programs that can be combined to accomplish this task. Given multiple whole genomes as input, Mercator is first used to construct an orthology map, which is then used to guide nucleotide-level multiple alignments produced by MAVID. These programs are both fast and freely available, allowing researchers to perform genome alignments on a single laptop. This tutorial will guide the researcher through the steps required for whole-genome alignment with Mercator and MAVID.
MESH Headings: Databases, Genetic; Genome; Internet; Sequence Alignment
46	Evolution at the nucleotide level: the problem of multiple whole-genome alignment. Hum Mol Genet 2006;15 Spec No 1:R51-6. [PMID: 16651369 DOI: 10.1093/hmg/ddl056] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract: With the genome sequences of numerous species at hand, we have the opportunity to discover how evolution has acted at each and every nucleotide in our genome. To this end, we must identify sets of nucleotides that have descended from a common ancestral nucleotide. The problem of identifying evolutionarily related nucleotides is that of sequence alignment. When the sequences under consideration are entire genomes, we have the problem of multiple whole-genome alignment. In this paper, we first state a series of definitions for homology and its subrelations between single nucleotides. Within this framework, we review the current methods available for the alignment of multiple large genomes. We then describe a subset of tools that make biological inferences from multiple whole-genome alignments.
MESH Headings: Animals; Computational Biology; Evolution, Molecular; Genomics/methods; Humans; Models, Biological; Nucleotides/analysis; Nucleotides/metabolism; Sequence Alignment/methods; Sequence Homology
Grants: HG003150 NHGRI NIH HHS; R01-HG2362-3 NHGRI NIH HHS
47	Parametric alignment of Drosophila genomes. PLoS Comput Biol 2006;2:e73. [PMID: 16789815 PMCID: PMC1480539 DOI: 10.1371/journal.pcbi.0020073] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 12/07/2005] [Accepted: 05/10/2006] [Indexed: 12/29/2022]
Abstract: The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.
MESH Headings: Algorithms; Animals; Base Sequence; Chromosome Mapping/methods; Conserved Sequence; Drosophila/genetics; Genome, Insect/genetics; Molecular Sequence Data; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Sequence Homology, Nucleic Acid
Grants: R01 HG002362 NHGRI NIH HHS; U01 HG003150 NHGRI NIH HHS; HG003150 NHGRI NIH HHS; R01-HG2362-3 NHGRI NIH HHS
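The Needleman-Wunsch recursion the abstract starts from fills a dynamic-programming table of best prefix-alignment scores. The sketch below uses a simple match/mismatch/linear-gap score rather than a pair HMM, with arbitrary parameter values; it is not the paper's polytope-based parametric method, but re-running it with two gap penalties shows how the optimal alignment can change with the parameters, which is the issue parametric alignment addresses.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment by Needleman-Wunsch with a linear gap penalty.

    Returns (score, aligned_x, aligned_y) for one optimal alignment.
    """
    def sub(a, b):
        return match if a == b else mismatch

    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in y
                score[i][j - 1] + gap,  # gap in x
            )
    # Trace back from the bottom-right cell, preferring diagonal moves.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


# A cheap gap favors shifting the sequences; a costly gap favors mismatches.
print(needleman_wunsch("AC", "CA", gap=-1))  # → (-1, '-AC', 'CA-')
print(needleman_wunsch("AC", "CA", gap=-2))  # → (-2, 'AC', 'CA')
```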
48	A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 2006;16:460-71. [PMID: 16458514 DOI: 10.1016/j.cub.2006.01.050] [Citation(s) in RCA: 346] [Impact Index Per Article: 19.2] [Received: 12/19/2005] [Revised: 01/19/2006] [Accepted: 01/24/2006] [Indexed: 12/19/2022]
Abstract: BACKGROUND Metazoan miRNAs regulate protein-coding genes by binding the 3' UTR of cognate mRNAs. Identifying targets for the 115 known C. elegans miRNAs is essential for understanding their function. RESULTS By using a new version of PicTar and sequence alignments of three nematodes, we predict that miRNAs regulate at least 10% of C. elegans genes through conserved interactions. We have developed a new experimental pipeline to assay 3' UTR-mediated posttranscriptional gene regulation via an endogenous reporter expression system amenable to high-throughput cloning, demonstrating the utility of this system using one of the most intensely studied miRNAs, let-7. Our expression analyses uncover several new potential let-7 targets and suggest a new let-7 activity in head muscle and neurons. To explore genome-wide trends in miRNA function, we analyzed functional categories of predicted target genes, finding that the target gene sets of one-third of C. elegans miRNAs are enriched for specific functional annotations. We have also integrated miRNA target predictions with other functional genomic data from C. elegans. CONCLUSIONS At least 10% of C. elegans genes are predicted miRNA targets, and a number of nematode miRNAs seem to regulate biological processes by targeting functionally related genes. We have also developed and successfully utilized an in vivo system for testing miRNA target predictions in likely endogenous expression domains. The thousands of genome-wide miRNA target predictions for nematodes, humans, and flies are available from the PicTar website and are linked to an accessible graphical network-browsing tool allowing exploration of miRNA target predictions in the context of various functional genomic data resources.
MESH Headings: Animals; Base Sequence; Caenorhabditis elegans/anatomy & histology; Caenorhabditis elegans/genetics; Caenorhabditis elegans/metabolism; Chromosome Mapping/methods; Computational Biology/methods; Conserved Sequence; Gene Expression Profiling/methods; Gene Expression Regulation; Genes, Reporter; Genome, Helminth; Genomics/methods; MicroRNAs/physiology; Molecular Sequence Data; Sequence Alignment
Grants: R01-HD046236 NICHD NIH HHS; R01-HG02362 NHGRI NIH HHS; R21-HD049435 NICHD NIH HHS
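miRNA target prediction commonly starts from Watson-Crick complementarity between the miRNA "seed" (roughly nucleotides 2-8 from the 5' end) and sites in 3' UTRs. PicTar additionally scores candidate sites probabilistically and uses conservation across species, all of which this sketch omits. It only finds perfect matches to a chosen seed window, ignores G:U wobble pairs, and its function names and UTR string are hypothetical; the let-7 string is the mature let-7 sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=2, end=8):
    """0-based positions in `utr` that pair perfectly with miRNA nucleotides
    start..end (1-based, inclusive)."""
    site = reverse_complement(mirna[start - 1:end])
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


LET7 = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7
utr = "AAACUACCUCAAAGGGCUACCUCUU"  # hypothetical 3' UTR fragment
print(reverse_complement(LET7[1:8]))  # → CUACCUC
print(seed_sites(LET7, utr))  # → [3, 16]
```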