1
Ahn JY, Kim S, Rok Kim C, Lee JH, Kim JM, Klompstra TM, Ha Choi Y, Jeon Y, Na Y, Kim JS, Okada Y, Lee H, Kim IS, Kim JK, Koo BK, Baek SH. Dual function of PHF16 in reinstating homeostasis of murine intestinal epithelium after crypt regeneration. Dev Cell 2024; 59:3089-3105.e7. [PMID: 39232563 DOI: 10.1016/j.devcel.2024.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 10/24/2023] [Accepted: 08/08/2024] [Indexed: 09/06/2024]
Abstract
Intestinal stem cells (ISCs) are highly vulnerable to damage, being in a constant state of proliferation. Reserve stem cells repair the intestinal epithelium following damage-induced ablation of ISCs. Here, we report that the epigenetic regulator plant homeodomain (PHD) finger protein 16 (PHF16) restores homeostasis of the intestinal epithelium after initial damage-induced repair. In Phf16-/Y mice, revival stem cells (revSCs) showed defects in exiting the regenerative state, and intestinal crypt regeneration failed even though revSCs were still induced in response to tissue damage, as observed by single-cell RNA sequencing (scRNA-seq). Analysis of Phf16-/Y intestinal organoids by RNA sequencing (RNA-seq) and ATAC sequencing identified that PHF16 restores homeostasis of the intestinal epithelium by inducing retinoic acid receptor (RAR)/retinoid X receptor (RXR) target genes through HBO1-mediated histone H3K14 acetylation, while at the same time counteracting YAP/TAZ activity by ubiquitination of CDC73. Together, our findings demonstrate the importance of timely suppression of regenerative activity by PHF16 for the restoration of gut homeostasis after acute tissue injury.
Affiliation(s)
- Jun-Yeong Ahn
  - Creative Research Initiatives Center for Epigenetic Code and Diseases, Seoul National University, Seoul 08826, South Korea; School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Somi Kim
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, South Korea
- Chang Rok Kim
  - Creative Research Initiatives Center for Epigenetic Code and Diseases, Seoul National University, Seoul 08826, South Korea; School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Ji-Hyun Lee
  - Center for Genome Engineering, Institute for Basic Science, 55, Expo-ro, Yuseong-gu, Daejeon 34126, South Korea
- Jong Min Kim
  - Creative Research Initiatives Center for Epigenetic Code and Diseases, Seoul National University, Seoul 08826, South Korea; School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Thomas M Klompstra
  - Center for Genome Engineering, Institute for Basic Science, 55, Expo-ro, Yuseong-gu, Daejeon 34126, South Korea
- Yoon Ha Choi
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, South Korea
- Yoon Jeon
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang 10408, South Korea
- Yongwoo Na
  - School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Jong-Seo Kim
  - School of Biological Sciences, Seoul National University, Seoul 08826, South Korea; Center for RNA Research, Institute for Basic Science, School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Yuki Okada
  - Laboratory of Pathology and Development, Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 113-0032, Japan
- Ho Lee
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang 10408, South Korea
- Ik Soo Kim
  - Department of Microbiology, Gachon University College of Medicine, Incheon 21999, South Korea.
- Jong Kyoung Kim
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, South Korea; Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 03722, South Korea.
- Bon-Kyoung Koo
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, South Korea; Center for Genome Engineering, Institute for Basic Science, 55, Expo-ro, Yuseong-gu, Daejeon 34126, South Korea.
- Sung Hee Baek
  - Creative Research Initiatives Center for Epigenetic Code and Diseases, Seoul National University, Seoul 08826, South Korea; School of Biological Sciences, Seoul National University, Seoul 08826, South Korea.
2
Stanley S, Silva-Costa C, Gomes-Silva J, Melo-Cristino J, Malley R, Ramirez M. CC180 clade dynamics do not universally explain Streptococcus pneumoniae serotype 3 persistence post-vaccine: a global comparative population genomics study. medRxiv 2024:2024.08.29.24312665. [PMID: 39252931 PMCID: PMC11383505 DOI: 10.1101/2024.08.29.24312665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Background
Clonal complex 180 (CC180) is currently the major clone of serotype 3 Streptococcus pneumoniae (Spn). The 13-valent pneumococcal conjugate vaccine (PCV13) does not have significant efficacy against serotype 3 despite polysaccharide inclusion in the vaccine. It was hypothesized that PCV13 may effectively control Clade I of CC180 but that Clades III and IV are resistant, provoking a population shift that enables serotype 3 persistence. This has been observed in the United States, England, and Wales but not Spain. We tested this hypothesis further utilizing a dataset from Portugal.
Methods
We whole-genome sequenced (WGS) 501 serotype 3 strains from Portugal isolated from patients with pneumococcal infections between 1999 and 2020. The draft genomes underwent phylogenetic analyses, pangenome profiling, and a genome-wide association study (GWAS). We also completed antibiotic susceptibility testing and compiled over 2,600 serotype 3 multilocus sequence type 180 (MLST180) WGSs to perform global comparative genomics.
Findings
CC180 Clades I, II, III, IV, and VI distributions were similar when comparing non-invasive pneumonia isolates and invasive disease isolates (Fisher's exact test, P=0.29), and adult and pediatric cases (Fisher's exact test, P=0.074). The serotype 3 CCs shifted post-PCV13 (Fisher's exact test, P<0.0001) and Clade I became dominant. Clade I is largely antibiotic-sensitive and carries the phiOXC141 prophage but the pangenome is heterogeneous. Strains from Portugal and Spain, where Clade I remains dominant post-PCV13, have larger pangenomes and are associated with the presence of two genes encoding hypothetical proteins.
Interpretation
Clade I became dominant in Portugal post-PCV13, despite the burden of the prophage and antibiotic sensitivity. The accessory genome content may mitigate these fitness costs. Regional differences in Clade I prevalence and pangenome heterogeneity suggest that clade dynamics is not a generalizable approach to understanding serotype 3 vaccine escape.
Funding
National Institute of Child Health and Human Development, Pfizer, and Merck Sharp & Dohme.
3
Pahlevan Kakhki M, Giordano A, Starvaggi Cucuzza C, Venkata S Badam T, Samudyata S, Lemée MV, Stridh P, Gkogka A, Shchetynsky K, Harroud A, Gyllenberg A, Liu Y, Boddul S, James T, Sorosina M, Filippi M, Esposito F, Wermeling F, Gustafsson M, Casaccia P, Hillert J, Olsson T, Kockum I, Sellgren CM, Golzio C, Kular L, Jagodic M. A genetic-epigenetic interplay at 1q21.1 locus underlies CHD1L-mediated vulnerability to primary progressive multiple sclerosis. Nat Commun 2024; 15:6419. [PMID: 39079955 PMCID: PMC11289459 DOI: 10.1038/s41467-024-50794-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 07/21/2024] [Indexed: 08/02/2024] Open
Abstract
Multiple sclerosis (MS) is a heterogeneous inflammatory and neurodegenerative disease with an unpredictable course towards progressive disability. Treating progressive MS is challenging due to limited insights into the underlying mechanisms. We examined the molecular changes associated with primary progressive MS (PPMS) using cross-tissue (blood and post-mortem brain) and multilayered (genetic, epigenetic, transcriptomic) data from independent cohorts. In PPMS, we found hypermethylation of the 1q21.1 locus, controlled by PPMS-specific genetic variations and influencing the expression of proximal genes (CHD1L, PRKAB2) in the brain. Evidence from reporter assay and CRISPR/dCas9 experiments supports a causal link between methylation and expression, and correlation network analysis further implicates these genes in PPMS brain processes. Knock-down of CHD1L in human iPSC-derived neurons and knock-out of chd1l in zebrafish led to developmental and functional deficits of neurons. Thus, several lines of evidence suggest a distinct genetic-epigenetic-transcriptional interplay in the 1q21.1 locus potentially contributing to PPMS pathogenesis.
Affiliation(s)
- Majid Pahlevan Kakhki
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Antonino Giordano
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
  - Neurology and Neurorehabilitation Units, IRCCS San Raffaele Hospital, Milan, Italy
  - Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Università Vita-Salute San Raffaele, Milan, Italy
- Chiara Starvaggi Cucuzza
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
  - Center for Neurology, Academic Specialist Center, Stockholm, Sweden
- Tejaswi Venkata S Badam
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
  - Department of Bioinformatics, Institute for Physics, Chemistry and Biology (IFM), Linköping University, Linköping, Sweden
- Samudyata Samudyata
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Marianne Victoria Lemée
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France
  - Centre National de la Recherche Scientifique, UMR7104, Illkirch, France
  - Institut National de la Santé et de la Recherche Médicale, U1258, Illkirch, France
  - Université de Strasbourg, Strasbourg, France
- Pernilla Stridh
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Asimenia Gkogka
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Klementy Shchetynsky
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Adil Harroud
  - The Neuro (Montreal Neurological Institute-Hospital), Montréal, QC, Canada
  - Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
  - Department of Human Genetics, McGill University, Montréal, QC, Canada
- Alexandra Gyllenberg
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Yun Liu
  - MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, Shanghai, China
- Sanjaykumar Boddul
  - Department of Medicine, Solna, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Tojo James
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Melissa Sorosina
  - Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Massimo Filippi
  - Neurology and Neurorehabilitation Units, IRCCS San Raffaele Hospital, Milan, Italy
  - Università Vita-Salute San Raffaele, Milan, Italy
  - Neurophysiology Unit, IRCCS San Raffaele Hospital, Milan, Italy
  - Neuroimaging Research Unit, Division of Neuroscience, San Raffaele Scientific Institute, Milan, Italy
- Federica Esposito
  - Neurology and Neurorehabilitation Units, IRCCS San Raffaele Hospital, Milan, Italy
  - Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Fredrik Wermeling
  - Department of Medicine, Solna, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Mika Gustafsson
  - Department of Bioinformatics, Institute for Physics, Chemistry and Biology (IFM), Linköping University, Linköping, Sweden
- Patrizia Casaccia
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, USA
- Jan Hillert
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Tomas Olsson
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Ingrid Kockum
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Carl M Sellgren
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
  - Center for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
  - Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden
- Christelle Golzio
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France
  - Centre National de la Recherche Scientifique, UMR7104, Illkirch, France
  - Institut National de la Santé et de la Recherche Médicale, U1258, Illkirch, France
  - Université de Strasbourg, Strasbourg, France
- Lara Kular
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden.
- Maja Jagodic
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden.
4
Ryšavý P, Železný F. Reference-free phylogeny from sequencing data. BioData Min 2023; 16:13. [PMID: 36973746 PMCID: PMC10045052 DOI: 10.1186/s13040-023-00329-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Motivation
Clustering of genetic sequences is one of the key parts of bioinformatics analyses. The resulting phylogenetic trees are beneficial for solving many research questions, including tracing the history of species, studying migration in the past, or tracing the source of a virus outbreak. At the same time, biologists increasingly provide data in the raw form of reads or only as contig-level assemblies. Therefore, tools that are able to process those data without supervision need to be developed.
Results
In this paper, we present a tool for reference-free phylogeny capable of handling data where no mature-level assembly is available. The tool allows distance calculation for raw reads, contigs, and a combination of the two. The tool provides an estimation of the Levenshtein distance between the sequences, which in turn estimates the number of mutations between the organisms. Compared to the previous research, the novelty of the method lies in a newly proposed combination of the read and contig measures, a new method for read-contig mapping, and an efficient embedding of contigs.
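For orientation, the quantity the tool estimates can be illustrated on fully known sequences. The sketch below is the textbook dynamic program for the exact Levenshtein distance, not the authors' estimator for reads and contigs; the function name `levenshtein` and the example strings are chosen here purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance between two strings (classic two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop to save memory
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # One deleted base separates these two toy sequences.
    print(levenshtein("ACGT", "AGT"))
```

The exact computation needs both sequences in full and takes time proportional to the product of their lengths, which is why an estimate from reads or contigs is attractive when no finished assembly exists.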
5
Napoli FR, Daly CM, Neal S, McCulloch KJ, Zaloga AR, Liu A, Koenig KM. Cephalopod retinal development shows vertebrate-like mechanisms of neurogenesis. Curr Biol 2022; 32:5045-5056.e3. [PMID: 36356573 PMCID: PMC9729453 DOI: 10.1016/j.cub.2022.10.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/20/2022] [Revised: 09/30/2022] [Accepted: 10/14/2022] [Indexed: 11/10/2022]
Abstract
Coleoid cephalopods, including squid, cuttlefish, and octopus, have large and complex nervous systems and high-acuity, camera-type eyes. These traits are comparable only to features that are independently evolved in the vertebrate lineage. The size of animal nervous systems and the diversity of their constituent cell types are a result of the tight regulation of cellular proliferation and differentiation in development. Changes in the process of development during evolution that result in a diversity of neural cell types and variable nervous system size are not well understood. Here, we have pioneered live-imaging techniques and performed functional interrogation to show that the squid Doryteuthis pealeii utilizes mechanisms during retinal neurogenesis that are hallmarks of vertebrate processes. We find that retinal progenitor cells in the squid undergo nuclear migration until they exit the cell cycle. We identify retinal organization corresponding to progenitor, post-mitotic, and differentiated cells. Finally, we find that Notch signaling may regulate both retinal cell cycle and cell fate. Given the convergent evolution of elaborate visual systems in cephalopods and vertebrates, these results reveal common mechanisms that underlie the growth of highly proliferative neurogenic primordia. This work highlights mechanisms that may alter ontogenetic allometry and contribute to the evolution of complexity and growth in animal nervous systems.
Affiliation(s)
- Francesca R Napoli
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Christina M Daly
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Stephanie Neal
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Kyle J McCulloch
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Alexandra R Zaloga
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Alicia Liu
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Kristen M Koenig
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
6
Chakraborty S, Hossain A, Cao T, Gnanagobal H, Segovia C, Hill S, Monk J, Porter J, Boyce D, Hall JR, Bindea G, Kumar S, Santander J. Multi-Organ Transcriptome Response of Lumpfish (Cyclopterus lumpus) to Aeromonas salmonicida Subspecies salmonicida Systemic Infection. Microorganisms 2022; 10:2113. [PMID: 36363710 PMCID: PMC9692985 DOI: 10.3390/microorganisms10112113] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 10/21/2022] [Indexed: 09/10/2023] Open
Abstract
Lumpfish is utilized as a cleaner fish to biocontrol sea lice infestations in Atlantic salmon farms. Aeromonas salmonicida, a Gram-negative facultative intracellular pathogen, is the causative agent of furunculosis in several fish species, including lumpfish. In this study, lumpfish were intraperitoneally injected with different doses of A. salmonicida to calculate the LD50. Samples of blood, head-kidney, spleen, and liver were collected at different time points to determine the infection kinetics. We determined that the A. salmonicida LD50 is 10^2 CFU per dose. We found that the lumpfish head-kidney is the primary target organ of A. salmonicida. Triplicate biological samples were collected from head-kidney, spleen, and liver pre-infection and at 3 and 10 days post-infection for RNA-sequencing. The reference genome-guided transcriptome assembly resulted in 6246 differentially expressed genes. The de novo assembly resulted in 403,204 transcripts, which added 1307 novel genes not identified by the reference genome-guided transcriptome. Differential gene expression and gene ontology enrichment analyses suggested that A. salmonicida induces lethal infection in lumpfish by uncontrolled and detrimental blood coagulation, complement activation, inflammation, DNA damage, suppression of the adaptive immune system, and prevention of cytoskeleton formation.
Affiliation(s)
- Setu Chakraborty
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Ahmed Hossain
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Trung Cao
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Hajarooba Gnanagobal
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Cristopher Segovia
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Stephen Hill
  - Cold-Ocean Deep-Sea Research Facility, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Jennifer Monk
  - Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Jillian Porter
  - Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Danny Boyce
  - Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Jennifer R. Hall
  - Aquatic Research Cluster, CREAIT Network, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Gabriela Bindea
  - INSERM, Laboratory of Integrative Cancer Immunology, 75006 Paris, France
  - Equipe Labellisée Ligue Contre Le Cancer, 75013 Paris, France
  - Centre de Recherche des Cordeliers, Sorbonne Université, Université de Paris, 75006 Paris, France
- Surendra Kumar
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
  - Ocean Frontier Institute, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Javier Santander
  - Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
7
Yang T, Henao R. TAMC: A deep-learning approach to predict motif-centric transcriptional factor binding activity based on ATAC-seq profile. PLoS Comput Biol 2022; 18:e1009921. [PMID: 36094959 PMCID: PMC9499209 DOI: 10.1371/journal.pcbi.1009921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 09/22/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022] Open
Abstract
Determining transcriptional factor binding sites (TFBSs) is critical for understanding the molecular mechanisms regulating gene expression in different biological conditions. Biological assays designed to directly map TFBSs require large sample sizes and intensive resources. As an alternative, the ATAC-seq assay is simple to conduct and provides genomic cleavage profiles that contain rich information for imputing TFBSs indirectly. Previous footprint-based tools are inherently limited by the accuracy of their bias correction algorithms and the efficiency of their feature extraction models. Here we introduce TAMC (Transcriptional factor binding prediction from ATAC-seq profile at Motif-predicted binding sites using Convolutional neural networks), a deep-learning approach for predicting motif-centric TF binding activity from paired-end ATAC-seq data. TAMC does not require bias correction during signal processing. By leveraging a one-dimensional convolutional neural network (1D-CNN) model, TAMC makes predictions based on both footprint and non-footprint features at binding sites for each TF and outperforms existing footprinting tools in TFBS prediction, particularly for ATAC-seq data with limited sequencing depth.
Affiliation(s)
- Tianqi Yang
  - Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, North Carolina, United States of America
  - Department of Cell Biology, Duke University School of Medicine, Durham, North Carolina, United States of America
  - * E-mail: (TY); (RH)
- Ricardo Henao
  - Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
  - Department of Biostatistics and Informatics, Duke University, Durham, North Carolina, United States of America
  - * E-mail: (TY); (RH)
8
Yang T, Ou J, Yildirim E. Xist exerts gene-specific silencing during XCI maintenance and impacts lineage-specific cell differentiation and proliferation during hematopoiesis. Nat Commun 2022; 13:4464. [PMID: 35915095 PMCID: PMC9343370 DOI: 10.1038/s41467-022-32273-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/12/2021] [Accepted: 07/21/2022] [Indexed: 11/12/2022] Open
Abstract
X chromosome inactivation (XCI) is a dosage compensation phenomenon that occurs in females. Initiation of XCI depends on Xist RNA, which triggers silencing of one of the two X chromosomes, except for XCI escape genes that continue to be biallelically expressed. In the soma, XCI is stably maintained with continuous Xist expression. How Xist impacts XCI maintenance remains an open question. Here we conditionally delete Xist in the hematopoietic system of mice and report differentiation and cell cycle defects in female hematopoietic stem and progenitor cells (HSPCs). By utilizing female HSPCs and mouse embryonic fibroblasts, we find that X-linked genes show variable tolerance to Xist loss. Specifically, XCI escape genes exhibit preferential transcriptional upregulation, which associates with low H3K27me3 occupancy and high chromatin accessibility that accommodates preexisting binding of transcription factors such as Yin Yang 1 (YY1) at the basal state. We conclude that Xist is necessary for gene-specific silencing during XCI maintenance and impacts lineage-specific cell differentiation and proliferation during hematopoiesis.
Here the authors investigate the functional relevance of the X-chromosome inactivation (XCI) regulator Xist in hematopoiesis. They find that Xist loss leads to changes in the ratio of hematopoietic progenitor cells and results in chromatin accessibility and transcriptional upregulation on the inactive X chromosome, including XCI escape genes known to be associated with cell cycle and immune response.
Affiliation(s)
- Tianqi Yang
  - Department of Cell Biology, Duke University Medical Center, Durham, NC, 27710, USA
  - Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC, 27710, USA
  - Duke Regeneration Center, Duke University, Durham, NC, 27710, USA
  - Duke Cancer Institute, Duke University Medical Center, Durham, NC, 27710, USA
- Jianhong Ou
  - Department of Cell Biology, Duke University Medical Center, Durham, NC, 27710, USA
  - Duke Regeneration Center, Duke University, Durham, NC, 27710, USA
- Eda Yildirim
  - Department of Cell Biology, Duke University Medical Center, Durham, NC, 27710, USA
  - Duke Regeneration Center, Duke University, Durham, NC, 27710, USA
  - Duke Cancer Institute, Duke University Medical Center, Durham, NC, 27710, USA
9
Barber AM, Helms A, Thompson R, Whitlock BK, Steffen DJ, Petersen JL. Whole-genome sequencing to investigate a possible genetic basis of perosomus elumbis in a calf resulting from a consanguineous mating. Transl Anim Sci 2021. [DOI: 10.1093/tas/txab171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Alexa M Barber
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68503, USA
- Alyssa Helms
  - Department of Large Animal Clinical Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Riley Thompson
  - College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Brian K Whitlock
  - Department of Large Animal Clinical Sciences, University of Tennessee, Knoxville, TN 37996, USA
- David J Steffen
  - School of Veterinary Medicine and Biomedical Science, University of Nebraska-Lincoln, Lincoln, NE 68503, USA
- Jessica L Petersen
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68503, USA
10
Wen G, Li M, Li F, Yang Z, Zhou T, Gu W. AQUARIUM: accurate quantification of circular isoforms using model-based strategy. Bioinformatics 2021; 37:4879-4881. [PMID: 34115093 DOI: 10.1093/bioinformatics/btab435] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2021] [Revised: 05/11/2021] [Accepted: 06/08/2021] [Indexed: 01/22/2023] Open
Abstract
Summary
Currently, most computational methods estimate the expression of circular RNAs (circRNAs) using the number of sequencing reads that support back-splicing junctions (BSJ) in RNA-seq data, which may introduce biased estimation of circRNA expression due to the uneven distribution of sequencing reads. To overcome this, we previously developed a model-based strategy for circRNA quantification, enabling consideration of sequencing reads from the entire transcript. Yet, the lack of exact transcript structure of circRNAs may limit its accuracy. Here, we proposed a substantially improved circRNA quantification tool, AQUARIUM, by introducing the full-length RNA structure of circular isoforms. We assessed its performance in circRNA quantification using both biological and simulated rRNA-depleted RNA-seq datasets, and demonstrated its superior performance at both BSJ and isoform level.
Availability and implementation
AQUARIUM is freely available at https://github.com/wanjun-group-seu/AQUARIUM.
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Guoxia Wen
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
| | - Musheng Li
- Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, Nevada, 89557, USA
| | - Fuyu Li
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
| | - Zengyan Yang
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
| | - Tong Zhou
- Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, Nevada, 89557, USA
| | - Wanjun Gu
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China; Collaborative Innovation Center of Jiangsu Province of Cancer Prevention and Treatment of Chinese Medicine, Nanjing, Jiangsu, 210023, China; School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, 210023, China
| |
|
11
|
Davies P, Jones M, Liu J, Hebenstreit D. Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision. Brief Bioinform 2021; 22:6265204. [PMID: 33959753 PMCID: PMC8574610 DOI: 10.1093/bib/bbab148] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/08/2021] [Revised: 03/10/2021] [Accepted: 03/26/2021] [Indexed: 12/29/2022]
Abstract
RNA-seq, including single-cell RNA-seq (scRNA-seq), is plagued by insufficient sensitivity and lack of precision. As a result, the full potential of (sc)RNA-seq is limited. Major factors in this respect are the presence of global bias in most datasets, which affects detection and quantitation of RNA in a length-dependent fashion. In particular, scRNA-seq is affected by technical noise and a high rate of dropouts, where the vast majority of original transcripts is not converted into sequencing reads. We discuss the origins and implications of these biases, bioinformatics approaches to correct for them, and how biases can be exploited to infer characteristics of the sample preparation process, which in turn can be used to improve library preparation.
Affiliation(s)
- Philip Davies
- Daniel Hebenstreit's Research Group University of Warwick, CV4 7AL Coventry, UK
| | - Matt Jones
- Daniel Hebenstreit's Research Group University of Warwick, CV4 7AL Coventry, UK
| | - Juntai Liu
- Physics Department, University of Warwick, CV4 7AL Coventry, UK
| | | |
|
12
|
Wang Q, Liu Z, Yan B, Chou WC, Ettwiller L, Ma Q, Liu B. A novel computational framework for genome-scale alternative transcription units prediction. Brief Bioinform 2021; 22:6265223. [PMID: 33957668 DOI: 10.1093/bib/bbab162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Revised: 03/18/2021] [Accepted: 04/07/2021] [Indexed: 11/12/2022]
Abstract
Alternative transcription units (ATUs) are dynamically encoded under different conditions and display overlapping patterns (sharing one or more genes) under a specific condition in bacterial genomes. Genome-scale identification of ATUs is essential for studying the emergence of human diseases caused by bacterial organisms. However, it is unrealistic to identify all ATUs using experimental techniques because of the complexity and dynamic nature of ATUs. Here, we present the first-of-its-kind computational framework, named SeqATU, for genome-scale ATU prediction based on next-generation RNA-Seq data. The framework utilizes a convex quadratic programming model to seek an optimum expression combination of all of the to-be-identified ATUs. The predicted ATUs in Escherichia coli reached a precision of 0.77/0.74 and a recall of 0.75/0.76 in the two RNA-Sequencing datasets compared with the benchmarked ATUs from third-generation RNA-Seq data. In addition, the proportion of 5'- or 3'-end genes of the predicted ATUs, having documented transcription factor binding sites and transcription termination sites, was three times greater than that of no 5'- or 3'-end genes. We further evaluated the predicted ATUs by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional enrichment analyses. The results suggested that gene pairs frequently encoded in the same ATUs are more functionally related than those that can belong to two distinct ATUs. Overall, these results demonstrated the high reliability of predicted ATUs. We expect that the new insights derived by SeqATU will not only improve the understanding of the transcription mechanism of bacteria but also guide the reconstruction of a genome-scale transcriptional regulatory network.
Affiliation(s)
- Qi Wang
- School of Mathematics, Shandong University, Jinan 250200, China
| | - Zhaoqian Liu
- School of Mathematics, Shandong University, Jinan 250200, China; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Bo Yan
- New England Biolabs Inc., Ipswich, MA 01938, USA
| | - Wen-Chi Chou
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | | | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan 250200, China
| |
|
13
|
Liu G, Wang J, Hou X. Transcriptome-Wide N6-Methyladenosine (m6A) Methylome Profiling of Heat Stress in Pak-choi (Brassica rapa ssp. chinensis). PLANTS 2020; 9:plants9091080. [PMID: 32842619 PMCID: PMC7570095 DOI: 10.3390/plants9091080] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 06/20/2020] [Revised: 08/08/2020] [Accepted: 08/20/2020] [Indexed: 11/17/2022]
Abstract
Background: In higher eukaryotes, N6-methyladenosine (m6A) is the most common internal form of messenger RNA modification. By mapping the m6A methyl genome in multiple species, the potential regulatory function of reversible m6A methylation on mRNA is revealed. Recent studies have shown that RNA m6A modification influences mRNA transcription, location, translation, stability, splicing, and nuclear export. However, there are not enough data on the m6A transcriptome-wide map and its potential biological role in the heat stress of Pak-choi (Brassica rapa ssp. chinensis). Methods: In this work, MeRIP-seq was used to obtain the first transcriptome-wide profiling of RNA m6A modification in Pak-choi. Meanwhile, the transcriptome data were obtained by analyzing the input samples’ sequencing data. Results: Our research indicated that with three replicates, there were 11,252 common m6A peaks and 9729 common m6A-containing genes identified in the normal (CK) and heat stress (T43) groups. It was found that m6A peaks were highly enriched in the 3′ untranslated region in both CK and T43 groups. About 80% of the genes have one m6A site. The consensus sequence of m6A peaks was also enriched, which showed as AAACCV (V: U/A/G). In addition, association analysis found that there is a certain relationship between the degree of m6A methylation and the transcription level, indicating that m6A plays a certain regulatory role in gene expression. Conclusion: This comprehensive map in the study may provide a solid basis for determining the potential function of RNA m6A modification in Pak-choi under normal (CK) and heat stress (T43) conditions.
Affiliation(s)
- Gaofeng Liu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement/Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, Ministry of Agriculture/Engineering Research Center of Germplasm Enhancement and Utilization of Horticultural Crops, Ministry of Education, Nanjing Agricultural University, Nanjing 210095, China; (G.L.); (J.W.)
- Institute of Urban Agriculture, Chinese Academy of Agricultural Sciences, Chengdu 610213, China
| | - Jin Wang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement/Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, Ministry of Agriculture/Engineering Research Center of Germplasm Enhancement and Utilization of Horticultural Crops, Ministry of Education, Nanjing Agricultural University, Nanjing 210095, China; (G.L.); (J.W.)
| | - Xilin Hou
- State Key Laboratory of Crop Genetics and Germplasm Enhancement/Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, Ministry of Agriculture/Engineering Research Center of Germplasm Enhancement and Utilization of Horticultural Crops, Ministry of Education, Nanjing Agricultural University, Nanjing 210095, China; (G.L.); (J.W.)
- Correspondence: ; Tel.: +86-025-8439-5917
| |
|
14
|
Ma Y, Yan G, Han X, Zhang J, Xiong J, Miao W. Sexual cell cycle initiation is regulated by CDK19 and CYC9 in Tetrahymena thermophila. J Cell Sci 2020; 133:jcs235721. [PMID: 32041901 DOI: 10.1242/jcs.235721] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/01/2019] [Accepted: 01/27/2020] [Indexed: 01/31/2023]
Abstract
To investigate the mechanisms underlying initiation of the sexual cell cycle in eukaryotes, we have focused on cyclins and cyclin-dependent kinases (CDKs) in the well-studied model ciliate, Tetrahymena thermophila. We identified two genes, CDK19 and CYC9, which are highly co-expressed with the mating-associated factors MTA, MTB and HAP2. Both CDK19 and CYC9 were found to be essential for mating in T. thermophila. Subcellular localization experiments suggested that these proteins are located at the oral area, including the conjugation junction area, and that CDK19 or CYC9 knockout prevents mating. We found that CDK19 and CYC9 form a complex, and also identified several additional subunits, which may have regulatory or constitutive functions. RNA sequencing analyses and cytological experiments showed that mating is abnormal in both ΔCDK19 and ΔCYC9, mainly at the entry to the co-stimulation stage. These results indicate that the CDK19-CYC9 complex initiates the sexual cell cycle in T. thermophila.
Affiliation(s)
- Yang Ma
- State Key Laboratory of Freshwater Ecology and Biotechnology, Wuhan 430072, China
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guanxiong Yan
- State Key Laboratory of Freshwater Ecology and Biotechnology, Wuhan 430072, China
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiaojie Han
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
| | - Jing Zhang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Jie Xiong
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Wei Miao
- State Key Laboratory of Freshwater Ecology and Biotechnology, Wuhan 430072, China
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- CAS Center for Excellence in Animal Evolution and Genetics, Kunming 650223, China
| |
|
15
|
Han X, Yan G, Ma Y, Miao W, Wang G. Sequencing and characterization of the macronuclear rDNA minichromosome of the protozoan Tetrahymena pyriformis. Int J Biol Macromol 2020; 147:576-581. [PMID: 31931068 DOI: 10.1016/j.ijbiomac.2020.01.063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2019] [Revised: 12/06/2019] [Accepted: 01/06/2020] [Indexed: 10/25/2022]
Abstract
Tetrahymena ribosomal DNA (rDNA) is an ideal system for studying eukaryotic DNA replication and gene transcription. In this study, we developed a new method to isolate rDNA from Tetrahymena cells and used it to sequence and annotate the complete 19,670 bp macronuclear rDNA minichromosome of Tetrahymena pyriformis, a species that lacks the germ-line micronucleus and is unable to undergo sexual reproduction. The key features of T. pyriformis and Tetrahymena thermophila rDNA sequences were then compared. Our results showed (i) the short inverted repeats (M repeats) essential for formation of rDNA minichromosome palindromic structure during sexual reproduction in Tetrahymena are highly conserved in T. pyriformis; (ii) in contrast to T. thermophila, which has two tandem domains that coordinately regulate rDNA replication, T. pyriformis has only a single domain; (iii) the 35S pre-rRNA precursor has 80.25% similarity between the two species; and (iv) the G + C content is higher in the transcribed region than the non-transcribed region in both species, but the GC-skew is more stable in T. pyriformis. The new isolation method and annotated information for the T. pyriformis rDNA minichromosome will provide a useful resource for studying DNA replication and chromosome copy number control in Tetrahymena.
Affiliation(s)
- Xiaojie Han
- College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China; Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guanxiong Yan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Ma
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wei Miao
- College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China; Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; CAS Center for Excellence in Animal Evolution and Genetics, Kunming 650223, China
| | - Guangying Wang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
| |
|
16
|
Lemer S, Bieler R, Giribet G. Resolving the relationships of clams and cockles: dense transcriptome sampling drastically improves the bivalve tree of life. Proc Biol Sci 2020; 286:20182684. [PMID: 30963927 PMCID: PMC6408618 DOI: 10.1098/rspb.2018.2684] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/13/2022]
Abstract
Bivalvia has been the subject of extensive recent phylogenetic work to attempt resolving either the backbone of the bivalve tree using transcriptomic data, or the tips using morpho-anatomical data and up to five genetic markers. Yet the first approach lacked decisive taxon sampling and the second failed to resolve many interfamilial relationships, especially within the diverse clade Imparidentia. Here we combine dense taxon sampling with 108 deep-sequenced Illumina-based transcriptomes to provide resolution in nodes that required additional study. We designed specific data matrices to address the poorly resolved relationships within Imparidentia. Our results support the overall backbone of the bivalve tree, the monophyly of Bivalvia and all its main nodes, although the monophyly of Protobranchia remains less clear. Likewise, the inter-relationships of the six main bivalve clades were fully supported. Within Imparidentia, resolution increases when analysing Imparidentia-specific matrices. Lucinidae, Thyasiridae and Gastrochaenida represent three early branches. Gastrochaenida is sister group to all remaining imparidentians, which divide into six orders. Neoheterodontei is always fully supported, and consists of Sphaeriida, Myida and Venerida, with the latter now also containing Mactroidea, Ungulinoidea and Chamidae, a family particularly difficult to place in earlier work. Overall, our study, by using densely sampled transcriptomes, provides the best-resolved bivalve phylogeny to date.
Affiliation(s)
- Sarah Lemer
- 1 University of Guam Marine Laboratory, 303 University Drive, UOG Station, Mangilao, GU 96923, USA; 2 Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Rüdiger Bieler
- 3 Integrative Research Center, Field Museum of Natural History, 1400 South Lake Shore Drive, Chicago, IL 60605, USA
| | - Gonzalo Giribet
- 2 Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| |
|
17
|
Grozhik AV, Jaffrey SR. Distinguishing RNA modifications from noise in epitranscriptome maps. Nat Chem Biol 2019; 14:215-225. [PMID: 29443978 DOI: 10.1038/nchembio.2546] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 01/13/2017] [Accepted: 12/04/2017] [Indexed: 12/26/2022]
Abstract
Messenger RNA (mRNA) and long noncoding RNA (lncRNA) can be subjected to a variety of post-transcriptional modifications that markedly influence their fate and function. This concept of 'epitranscriptomic' modifications and the understanding of their function has been driven by new technologies for transcriptome-wide mapping of modified nucleotides using next-generation sequencing. Mapping technologies have successfully documented the location and prevalence of several modified nucleotides in the transcriptome. However, some mapping methods have led to proposals of pervasive novel RNA modifications that have subsequently been shown to be exceptionally rare. These controversies have resulted in confusion about the identity of the modified nucleotides comprising the epitranscriptome in mRNA and lncRNA. Here we discuss the different transcriptome-wide technologies for mapping modified nucleotides. We describe why these methods can have poor accuracy and specificity. Finally, we describe emerging strategies that minimize false positives and other pitfalls associated with mapping and measuring epitranscriptomic modifications.
Affiliation(s)
- Anya V Grozhik
- Department of Pharmacology, Weill Cornell Medicine, Cornell University, New York, New York, USA
| | - Samie R Jaffrey
- Department of Pharmacology, Weill Cornell Medicine, Cornell University, New York, New York, USA
| |
|
18
|
Abstract
Identification of differentially expressed genes has been a high-priority task of downstream analyses to further advances in biomedical research. Investigators have been faced with an array of issues in dealing with more complicated experiments and metadata, including batch effects, normalization, temporal dynamics (temporally differential expression), and isoform diversity (isoform-level quantification and differential splicing events). To date, there are no standard approaches to precisely and efficiently analyze these moderate or large-scale experimental designs, especially with combined metadata. In this report, we propose comprehensive analytical pipelines to precisely characterize temporal dynamics in differential expression of genes and other genomic features, i.e., the variability of transcripts, isoforms and exons, by controlling batch effects and other nuisance factors that could have significant confounding effects on the main effects of interest in comparative models and may result in misleading interpretations.
|
19
|
Li Y, Liu H, Giffen KP, Chen L, Beisel KW, He DZZ. Transcriptomes of cochlear inner and outer hair cells from adult mice. Sci Data 2018; 5:180199. [PMID: 30277483 PMCID: PMC6167952 DOI: 10.1038/sdata.2018.199] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 04/19/2018] [Accepted: 08/02/2018] [Indexed: 01/09/2023]
Abstract
Inner hair cells (IHCs) and outer hair cells (OHCs) are the two anatomically and functionally distinct types of mechanosensitive receptor cells in the mammalian cochlea. The molecular mechanisms defining their morphological and functional specializations are largely unclear. As a first step to uncover the underlying mechanisms, we examined the transcriptomes of IHCs and OHCs isolated from adult CBA/J mouse cochleae. One thousand IHCs and OHCs were separately collected using the suction pipette technique. RNA sequencing of IHCs and OHCs was performed and their transcriptomes were analyzed. The results were validated by comparing some IHC and OHC preferentially expressed genes between the present study and published microarray-based data as well as by real-time qPCR. Antibody-based immunocytochemistry was used to validate preferential expression of SLC7A14 and DNM3 in IHCs and OHCs. These data are expected to serve as a highly valuable resource for unraveling the molecular mechanisms underlying different biological properties of IHCs and OHCs as well as to provide a road map for future characterization of genes expressed in IHCs and OHCs.
Affiliation(s)
- Yi Li
- Department of Otorhinolaryngology, Beijing Tongren Hospital, Beijing Capital Medical University, Beijing 100730, China
- Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, Nebraska 68170, USA
| | - Huizhan Liu
- Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, Nebraska 68170, USA
| | - Kimberlee P. Giffen
- Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, Nebraska 68170, USA
| | - Lei Chen
- Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, Nebraska 68170, USA
- Chongqing Academy of Animal Science, Chongqing 402460, China
| | - Kirk W. Beisel
- Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, Nebraska 68170, USA
| | - David Z. Z. He
- Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, Nebraska 68170, USA
| |
|
20
|
Shi X, Wang X, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. SparseIso: a novel Bayesian approach to identify alternatively spliced isoforms from RNA-seq data. Bioinformatics 2018; 34:56-63. [PMID: 28968634 DOI: 10.1093/bioinformatics/btx557] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/05/2017] [Accepted: 09/02/2017] [Indexed: 01/01/2023]
Abstract
Motivation: Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems. Results: We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data. A spike-and-slab prior is incorporated into the Bayesian model to enforce the sparsity for isoform identification, effectively alleviating the problem of overfitting. A Gibbs sampling procedure is further developed to simultaneously identify and quantify transcripts from RNA-seq data. With the sampling approach, SparseIso estimates the joint distribution of all candidate transcripts, resulting in a significantly improved performance in detecting lowly expressed transcripts and multiple expressed isoforms of genes. Both simulation study and real data analysis have demonstrated that the proposed SparseIso method significantly outperforms existing methods for improved transcript assembly and isoform identification. Availability and implementation: The SparseIso package is available at http://github.com/henryxushi/SparseIso. Contact: xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xu Shi
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| | - Xiao Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| | - Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
| | - Leena Hilakivi-Clarke
- Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
| | - Robert Clarke
- Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
| | - Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| |
|
21
|
Segatto ALA, Diesel JF, Loreto ELS, da Rocha JBT. De novo transcriptome assembly of the lobster cockroach Nauphoeta cinerea (Blaberidae). Genet Mol Biol 2018; 41:713-721. [PMID: 30043835 PMCID: PMC6136372 DOI: 10.1590/1678-4685-gmb-2017-0264] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/27/2017] [Accepted: 01/03/2018] [Indexed: 12/17/2022]
Abstract
The use of Drosophila as a scientific model is well established, but the use of cockroaches as experimental organisms has been increasing, mainly in toxicology research. Nauphoeta cinerea is one of the species that has been studied, and among its advantages is its easy laboratory maintenance. However, a limited amount of genetic data about N. cinerea is available, impeding gene identification and expression analyses, genetic manipulation, and a deeper understanding of its functional biology. Here we describe the N. cinerea fat body and head transcriptome, in order to provide a database of genetic sequences to better understand the metabolic role of these tissues, and describe detoxification and stress response genes. After removing low-quality sequences, we obtained 62,121 transcripts, of which more than 50% had a length of 604 bp. The assembled sequences were annotated according to their Gene Ontology (GO) terms. We identified 367 genes related to stress and detoxification; among these, the most frequent were p450 genes. The results presented here are the first large-scale sequencing of N. cinerea and will facilitate the genetic understanding of the species' biochemical processes in future work.
Affiliation(s)
- Ana Lúcia Anversa Segatto
- Departamento de Bioquímica e Biologia Molecular, CCNE, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil
| | - José Francisco Diesel
- Departamento de Bioquímica e Biologia Molecular, CCNE, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil
| | - Elgion Lucio Silva Loreto
- Departamento de Bioquímica e Biologia Molecular, CCNE, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil
| | | |
|
22
|
Kallal RJ, Fernández R, Giribet G, Hormiga G. A phylotranscriptomic backbone of the orb-weaving spider family Araneidae (Arachnida, Araneae) supported by multiple methodological approaches. Mol Phylogenet Evol 2018; 126:129-140. [PMID: 29635025 DOI: 10.1016/j.ympev.2018.04.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/08/2018] [Revised: 03/05/2018] [Accepted: 04/06/2018] [Indexed: 01/01/2023]
Abstract
The orb-weaving spider family Araneidae is extremely diverse (>3100 spp.) and its members can be charismatic terrestrial arthropods, many of them recognizable by their iconic orbicular snare web, such as the common garden spiders. Despite considerable effort to better understand their backbone relationships based on multiple sources of data (morphological, behavioral and molecular), pervasive low support remains in recent studies. In addition, no overarching phylogeny of araneids is available to date, hampering further comparative work. In this study, we analyze the transcriptomes of 33 taxa, including 19 araneids - 12 of them new to this study - representing most of the core family lineages, to examine the relationships within the family using genomic-scale datasets resulting from various methodological treatments, namely ortholog selection and gene occupancy as a measure of matrix completion. Six matrices were constructed to assess these effects by varying orthology inference method and gene occupancy threshold. Orthology methods used are the benchmarking tool BUSCO and the tree-based method UPhO; three gene occupancy thresholds (45%, 65%, 85%) were used to assess the effect of missing data. Gene tree and species tree-based methods (including multi-species coalescent and concatenation approaches, as well as maximum likelihood and Bayesian inference) were used totalling 17 analytical treatments. The monophyly of Araneidae and the placement of core araneid lineages were supported, together with some previously unsound backbone divergences; these include high support for Zygiellinae as the earliest diverging subfamily (followed by Nephilinae), the placement of Gasteracanthinae as sister group to Cyclosa and close relatives, and close relationships between the Araneus + Neoscona clade and Cyrtophorinae + Argiopinae clade. Incongruences were relegated to short branches in the clade comprising Cyclosa and its close relatives. 
We found congruence between most of the completed analyses, with minimal topological effects from occupancy/missing data and orthology assessment. The resulting number of genes by certain combinations of orthology and occupancy thresholds being analyzed had the greatest effect on the resulting trees, with anomalous outcomes recovered from analysis of lower numbers of genes.
Affiliation(s)
- Robert J Kallal
- Department of Biological Sciences, The George Washington University, 2029 G St. NW, Washington, DC 20052, USA.
| | - Rosa Fernández
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138, USA; Bioinformatics and Genomics Unit, Center for Genomic Regulation, Carrer del Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138, USA
| | - Gustavo Hormiga
- Department of Biological Sciences, The George Washington University, 2029 G St. NW, Washington, DC 20052, USA
| |
Collapse
|
23
|
Zhang J, Yan G, Tian M, Ma Y, Xiong J, Miao W. A DP-like transcription factor protein interacts with E2fl1 to regulate meiosis in Tetrahymena thermophila. Cell Cycle 2018; 17:634-642. [PMID: 29417875 DOI: 10.1080/15384101.2018.1431595] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023] Open
Abstract
Evolutionarily conserved E2F family transcription factors regulate the cell cycle by controlling gene expression in a wide range of eukaryotes. We previously demonstrated that the meiosis-specific transcription factor E2fl1 has an important role in meiosis in the model ciliate Tetrahymena thermophila. Here, we report that expression of another E2F family transcription factor gene, DPL2, correlates highly with that of E2FL1. Similar to e2fl1Δ cells, dpl2Δ cells undergo meiotic arrest prior to anaphase I, with the five chromosomes adopting an abnormal tandem arrangement. Immunofluorescence staining and immunoprecipitation experiments demonstrate that Dpl2 and E2fl1 form a complex during meiosis. We previously identified several meiotic regulatory proteins in T. thermophila: Cyc2 and Tcdk3 may cooperate to initiate meiosis, and Cyc17 is essential for initiating meiotic anaphase. We investigate the relationship of these regulators with Dpl2 and E2fl1, and then construct a meiotic regulatory network by measuring changes in meiotic gene expression in knockout cells. We conclude that the E2fl1/Dpl2 complex plays a central role in meiosis in T. thermophila.
Affiliation(s)
- Jing Zhang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Guanxiong Yan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Miao Tian
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China
- Yang Ma
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Jie Xiong
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China
- Wei Miao
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China
|
24
|
Guðbrandsson J, Franzdóttir SR, Kristjánsson BK, Ahi EP, Maier VH, Kapralova KH, Snorrason SS, Jónsson ZO, Pálsson A. Differential gene expression during early development in recently evolved and sympatric Arctic charr morphs. PeerJ 2018; 6:e4345. [PMID: 29441236 PMCID: PMC5807978 DOI: 10.7717/peerj.4345] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/02/2017] [Accepted: 01/19/2018] [Indexed: 02/06/2023] Open
Abstract
Phenotypic differences between closely related taxa or populations can arise through genetic variation or be environmentally induced, leading to altered transcription of genes during development. Comparative developmental studies of closely related species or variable populations within species can help to elucidate the molecular mechanisms related to evolutionary divergence and speciation. Studies of Arctic charr (Salvelinus alpinus) and related salmonids have revealed considerable phenotypic variation among populations, and in Arctic charr many cases of extensive variation within lakes (resource polymorphism) have been recorded. One example is the four Arctic charr morphs in the ∼10,000-year-old Lake Thingvallavatn, which differ in numerous morphological and life history traits. We set out to investigate the molecular and developmental roots of this polymorphism by studying gene expression in embryos of three of the morphs reared in a common garden set-up. We performed RNA-sequencing and de novo transcriptome assembly, and compared gene expression among morphs during an important timeframe in early development, i.e., preceding the formation of key trophic structures. As expected, developmental time was the predominant explanatory variable. As the data were affected by some form of RNA degradation even though all samples passed quality control testing, an estimate of 3'-bias was the second most common explanatory variable. Importantly, morph, both as an independent variable and in interaction with developmental time, affected the expression of numerous transcripts. Transcripts with a morph effect separated the three morphs at the expression level, with the two benthic morphs being more similar. However, Gene Ontology analyses did not reveal clear functional enrichment of transcripts between groups. Verification via qPCR confirmed differential expression of several genes between the morphs, including regulatory genes such as AT-Rich Interaction Domain 4A (arid4a) and translin (tsn). The data are consistent with a scenario where genetic divergence has contributed to differential expression of multiple genes and systems during early development of these sympatric Arctic charr morphs.
Affiliation(s)
- Jóhannes Guðbrandsson
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Freshwater Division, Marine and Freshwater Research Institute, Reykjavík, Iceland
- Sigríður Rut Franzdóttir
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Biomedical Center, University of Iceland, Reykjavík, Iceland
- Ehsan Pashay Ahi
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Karl-Franzens-Universität, Graz, Austria
- Valerie Helene Maier
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Biomedical Center, University of Iceland, Reykjavík, Iceland
- Zophonías Oddur Jónsson
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Biomedical Center, University of Iceland, Reykjavík, Iceland
- Arnar Pálsson
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Biomedical Center, University of Iceland, Reykjavík, Iceland
|
25
|
Jin ZB, Li Z, Liu Z, Jiang Y, Cai XB, Wu J. Identification of de novo germline mutations and causal genes for sporadic diseases using trio-based whole-exome/genome sequencing. Biol Rev Camb Philos Soc 2017; 93:1014-1031. [PMID: 29154454 DOI: 10.1111/brv.12383] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/16/2017] [Revised: 09/28/2017] [Accepted: 10/10/2017] [Indexed: 12/14/2022]
Abstract
Whole-genome or whole-exome sequencing (WGS/WES) of the affected proband together with normal parents (trio) is commonly adopted to identify de novo germline mutations (DNMs) underlying sporadic cases of various genetic disorders. However, our current knowledge of the occurrence and functional effects of DNMs remains limited and accurately identifying the disease-causing DNM from a group of irrelevant DNMs is complicated. Herein, we provide a general-purpose discussion of important issues related to pathogenic gene identification based on trio-based WGS/WES data. Specifically, the relevance of DNMs to human sporadic diseases, current knowledge of DNM biogenesis mechanisms, and common strategies or software tools used for DNM detection are reviewed, followed by a discussion of pathogenic gene prioritization. In addition, several key factors that may affect DNM identification accuracy and causal gene prioritization are reviewed. Based on recent major advances, this review both sheds light on how trio-based WGS/WES technologies can play a significant role in the identification of DNMs and causal genes for sporadic diseases, and also discusses existing challenges.
Affiliation(s)
- Zi-Bing Jin
- Division of Ophthalmic Genetics, The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China; State Key Laboratory of Ophthalmology Optometry and Vision Science, Wenzhou Medical University, Wenzhou, 325027, China
- Zhongshan Li
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Zhenwei Liu
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Yi Jiang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Xue-Bi Cai
- Division of Ophthalmic Genetics, The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China; State Key Laboratory of Ophthalmology Optometry and Vision Science, Wenzhou Medical University, Wenzhou, 325027, China
- Jinyu Wu
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
|
26
|
Szkop KJ, Nobeli I. Untranslated Parts of Genes Interpreted: Making Heads or Tails of High-Throughput Transcriptomic Data via Computational Methods: Computational methods to discover and quantify isoforms with alternative untranslated regions. Bioessays 2017; 39. [PMID: 29052251 DOI: 10.1002/bies.201700090] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/28/2017] [Revised: 09/12/2017] [Indexed: 01/07/2023]
Abstract
In this review we highlight the importance of defining the untranslated parts of transcripts, and present a number of computational approaches for the discovery and quantification of alternative transcription start and poly-adenylation events in high-throughput transcriptomic data. The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by the position at which transcription starts and ends at a genomic locus. Although the extent of alternative transcription starts and alternative poly-adenylation sites has been revealed by sequencing methods focused on the ends of transcripts, the application of these methods is not yet widely adopted by the community. We suggest that computational methods applied to standard high-throughput technologies are a useful, albeit less accurate, alternative to the expertise-demanding 5' and 3' sequencing and they are the only option for analysing legacy transcriptomic data. We review these methods here, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.
Affiliation(s)
- Krzysztof J Szkop
- Institute of Structural and Molecular Biology, Department of Biological Sciences Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Irene Nobeli
- Institute of Structural and Molecular Biology, Department of Biological Sciences Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
|
27
|
Tuerk A, Wiktorin G, Güler S. Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates. PLoS Comput Biol 2017; 13:e1005515. [PMID: 28505151 PMCID: PMC5448817 DOI: 10.1371/journal.pcbi.1005515] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/25/2016] [Revised: 05/30/2017] [Accepted: 04/12/2017] [Indexed: 12/03/2022] Open
Abstract
Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix2 (rd. “mixquare”), a transcript quantification method which uses a mixture of probability distributions to model and thereby neutralize the effects of positional fragment bias. The parameters of Mix2 are trained by Expectation Maximization, resulting in simultaneous transcript abundance and bias estimates. We compare Mix2 to Cufflinks, RSEM, eXpress and PennSeq, state-of-the-art quantification methods implementing some form of bias correction. On four synthetic biases we show that the accuracy of Mix2 overall exceeds the accuracy of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix2 on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix2 achieves improved correlation to qPCR measurements with a relative increase in R² between 4% and 50%. Mix2 also yields repeatable concentration estimates across technical replicates with a relative increase in R² between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression with a relative increase in true positives between 74% and 378% for 5% false positives. In addition, Mix2 reveals 5 dominant biases in MAQC data deviating from the common assumption of a uniform fragment distribution. On SEQC data, Mix2 yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix2, 40% of transcripts by Cufflinks and RSEM and 30% by eXpress. Titration order consistency is correct for 47% of transcripts for Mix2, 41% for Cufflinks and RSEM and 34% for eXpress. We further observe improved repeatability across laboratory sites with a relative increase in R² between 8% and 44% and reduced standard deviation.
RNA-Seq is a powerful tool for detecting and quantifying genes and gene isoforms. However, accurate quantification in genomic loci with multiple isoforms has proven difficult. This is due to the fact that the transcript generating an RNA-Seq fragment cannot be identified if multiple transcripts share the fragment sequence. Due to this ambiguity, transcript concentration is usually determined in a statistical framework by calculating the probability that a transcript generates an RNA-Seq fragment. Accurate estimation of this probability requires an accurate model of the transcript specific distributions of RNA-Seq fragments. However, fragment distributions in statistical models of RNA-Seq data are usually over-simplified. This article introduces the Mix2 (rd. “mixquare”) model which uses mixtures of probability distributions to model the transcript specific positional fragment distributions. Mix2 learns the mixture weights and approximates therefore the fragment bias in RNA-Seq data. We compare Mix2 on artificial and real RNA-Seq data to four state-of-the-art quantification methods. Our experiments show that Mix2 yields more accurate and repeatable quantification estimates and that it leads to more accurate detection of differential expression. We further show that the biases detected by Mix2 contradict the common assumption of a uniform fragment distribution.
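The idea of learning mixture weights by Expectation Maximization can be sketched in a few lines. This is a generic EM for a mixture with fixed component densities, not the Mix2 model itself: the function `em_mixture_weights`, the three component densities and the toy fragment positions are hypothetical choices made only for illustration.

```python
def em_mixture_weights(positions, components, n_iter=50):
    """Estimate mixture weights for fixed component densities by EM.

    `positions` are relative fragment positions in [0, 1];
    `components` are density functions over that interval.
    """
    k = len(components)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: accumulate each component's responsibility per fragment.
        totals = [0.0] * k
        for x in positions:
            joint = [w * f(x) for w, f in zip(weights, components)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # no component explains this position; skip it
            for j in range(k):
                totals[j] += joint[j] / norm
        # M-step: the new weights are the normalised responsibilities.
        total = sum(totals)
        if total == 0.0:
            break
        weights = [t / total for t in totals]
    return weights


# Hypothetical densities over relative transcript position x in [0, 1].
components = [
    lambda x: 1.0,               # uniform coverage
    lambda x: 2.0 * x,           # density rising toward x = 1
    lambda x: 2.0 * (1.0 - x),   # density rising toward x = 0
]

# Toy fragment positions concentrated near one end of the transcript.
positions = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6]
weights = em_mixture_weights(positions, components)
```

A weight vector far from `[1, 0, 0]` is the kind of evidence against a uniform fragment distribution that the abstract describes, although the real method estimates transcript abundances and bias jointly.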
|
28
|
Tao X, Chen J, Jiang Y, Wei Y, Chen Y, Xu H, Zhu L, Tang G, Li M, Jiang A, Shuai S, Bai L, Liu H, Ma J, Jin L, Wen A, Wang Q, Zhu G, Xie M, Wu J, He T, Huang C, Gao X, Li X. Transcriptome-wide N6-methyladenosine methylome profiling of porcine muscle and adipose tissues reveals a potential mechanism for transcriptional regulation and differential methylation pattern. BMC Genomics 2017; 18:336. [PMID: 28454518 PMCID: PMC5410061 DOI: 10.1186/s12864-017-3719-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 09/26/2016] [Accepted: 04/25/2017] [Indexed: 01/10/2023] Open
Abstract
Background: N6-methyladenosine (m6A) is the most prevalent internal modification in messenger RNA in higher eukaryotes, and potential regulatory functions of reversible m6A methylation on mRNA have been revealed by mapping of m6A methylomes in several species. m6A modification in active gene regulation manifests itself as altered methylation profiles in a tissue-specific manner or in response to changes in the cellular or species living environment. However, to date, there have been no data on a porcine transcriptome-wide m6A map or its potential biological roles in adipose deposition and muscle growth.
Methods: In this work, we used methylated RNA immunoprecipitation with next-generation sequencing (MeRIP-Seq) to acquire the first porcine transcriptome-wide m6A map. Transcriptomes of muscle and adipose tissues from three different pig breeds, the wild boar, Landrace, and Rongchang pig, were used to generate these maps.
Results: We found 5,872 and 2,826 m6A peaks in the porcine muscle and adipose tissue transcriptomes, respectively. m6A peaks were mainly enriched in stop codons, 3′-untranslated regions, and coding regions. Gene ontology analysis revealed that common m6A peaks in nuclear genes are associated with transcription factors, suggestive of a relationship between m6A mRNA methylation and nuclear genome transcription. Some genes showed tissue- and breed-differential methylation, and have novel biological functions. We also found a relationship between the extent of m6A methylation and the transcript level, suggesting a regulatory role for m6A in gene expression.
Conclusion: This comprehensive map provides a solid basis for the determination of potential functional roles for RNA m6A modification in adipose deposition and muscle growth.
Affiliation(s)
- Xuelian Tao
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Jianning Chen
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Yanzhi Jiang
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Yingying Wei
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Yan Chen
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Huaming Xu
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Li Zhu
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Guoqing Tang
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Mingzhou Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Anan Jiang
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Surong Shuai
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Lin Bai
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Haifeng Liu
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Jideng Ma
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Long Jin
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Anxiang Wen
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Qin Wang
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Guangxiang Zhu
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Meng Xie
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Jiayun Wu
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Tao He
- Department of Zoology, College of Life Science, Sichuan Agricultural University, No. 46, Xinkang Road, Ya'an City, 625014, Sichuan Province, China
- Chunyu Huang
- Genergy Biological Technology (Shanghai) Company of Limited Liability, Shanghai, 200233, China
- Xiang Gao
- Genergy Biological Technology (Shanghai) Company of Limited Liability, Shanghai, 200233, China
- Xuewei Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
|
29
|
Johnson M, Purdom E. Clustering of mRNA-Seq data based on alternative splicing patterns. Biostatistics 2017; 18:295-307. [PMID: 27780810 PMCID: PMC6415726 DOI: 10.1093/biostatistics/kxw044] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/16/2016] [Revised: 05/16/2016] [Accepted: 08/12/2016] [Indexed: 01/18/2023] Open
Abstract
Sequencing of messenger RNA (mRNA) can provide estimates of the levels of individual isoforms within the cell. It remains to adapt many standard statistical methods commonly used for analyzing gene expression levels to take advantage of this additional information. One novel question is whether we can find clusters of samples that are distinguished not by their gene expression but by their isoform usage. We propose a novel approach for clustering mRNA-Seq data that identifies such clusters. We show via simulation that our methods are more sensitive to finding clusters based on isoform usage than standard clustering techniques. We demonstrate its performance by finding a technical artifact that resulted in different batches having different isoform usage patterns, and illustrate its usage on several The Cancer Genome Atlas datasets.
|
30
|
Prakash C, Haeseler AV. An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile. J Comput Biol 2017; 24:200-212. [PMID: 27661099 PMCID: PMC5346924 DOI: 10.1089/cmb.2016.0096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
Affiliation(s)
- Celine Prakash
- Max F. Perutz Laboratories (MFPL), Center for Integrative Bioinformatics Vienna (CIBIV), University of Vienna, Medical University of Vienna, Vienna, Austria
- Arndt Von Haeseler
- Max F. Perutz Laboratories (MFPL), Center for Integrative Bioinformatics Vienna (CIBIV), University of Vienna, Medical University of Vienna, Vienna, Austria
- Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
|
31
|
Zhang J, Tian M, Yan GX, Shodhan A, Miao W. E2fl1 is a meiosis-specific transcription factor in the protist Tetrahymena thermophila. Cell Cycle 2016; 16:123-135. [PMID: 27892792 DOI: 10.1080/15384101.2016.1259779] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022] Open
Abstract
Members of the E2F family of transcription factors have been reported to regulate the expression of genes involved in cell cycle control, DNA replication, and DNA repair in multicellular eukaryotes. Here, E2FL1, a meiosis-specific E2F transcription factor gene, was identified in the model ciliate Tetrahymena thermophila. Loss of this gene resulted in meiotic arrest prior to anaphase I. Cytological experiments revealed that meiotic homologous pairing was not affected in the absence of E2FL1, but the paired homologous chromosomes did not separate and assumed a peculiar tandem arrangement. This is the first time that an E2F family member has been shown to regulate meiotic events. Moreover, BrdU incorporation showed that DSB processing during meiosis was abnormal upon deletion of E2FL1. Transcriptome sequencing analysis revealed that E2FL1 knockout decreased the expression of genes involved in DNA replication and DNA repair in T. thermophila, suggesting that the function of E2F is highly conserved in eukaryotes. In addition, E2FL1 deletion inhibited the expression of genes related to homologous chromosome segregation in T. thermophila. This result may explain the meiotic arrest phenotype at anaphase I. Finally, by searching for E2F DNA-binding motifs in the entire T. thermophila genome, we identified 714 genes containing at least one E2F DNA-binding motif; of these, 235 were downregulated and represent putative E2FL1 target genes.
Affiliation(s)
- Jing Zhang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Miao Tian
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; Department of Chromosome Biology and Max F. Perutz Laboratories, Center for Molecular Biology, University of Vienna, Vienna, Austria
- Guan-Xiong Yan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Anura Shodhan
- Department of Chromosome Biology and Max F. Perutz Laboratories, Center for Molecular Biology, University of Vienna, Vienna, Austria
- Wei Miao
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China
|
32
|
Archer N, Walsh MD, Shahrezaei V, Hebenstreit D. Modeling Enzyme Processivity Reveals that RNA-Seq Libraries Are Biased in Characteristic and Correctable Ways. Cell Syst 2016; 3:467-479.e12. [PMID: 27840077 PMCID: PMC5167349 DOI: 10.1016/j.cels.2016.10.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/11/2016] [Revised: 07/28/2016] [Accepted: 10/13/2016] [Indexed: 12/22/2022]
Abstract
Experimental procedures for preparing RNA-seq and single-cell (sc) RNA-seq libraries are based on assumptions regarding their underlying enzymatic reactions. Here, we show that the fairness of these assumptions varies within libraries: coverage by sequencing reads along and between transcripts exhibits characteristic, protocol-dependent biases. To understand the mechanistic basis of this bias, we present an integrated modeling framework that infers the relationship between enzyme reactions during library preparation and the characteristic coverage patterns observed for different protocols. Analysis of new and existing (sc)RNA-seq data from six different library preparation protocols reveals that polymerase processivity is the mechanistic origin of coverage biases. We apply our framework to demonstrate that lowering incubation temperature increases processivity, yield, and (sc)RNA-seq sensitivity in all protocols. We also provide correction factors based on our model for increasing accuracy of transcript quantification in existing samples prepared at standard temperatures. In total, our findings improve our ability to accurately reflect in vivo transcript abundances in (sc)RNA-seq libraries.
- Characterization of global RNA-seq biases specific to library preparation protocols
- Mathematical framework to reverse engineer enzyme reactions that cause bias
- Insights from reverse engineering allow optimization of RNA-seq protocols
- Lowered incubation temperatures during library preparation improve sensitivity
Affiliation(s)
- Nathan Archer
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Mark D Walsh
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Vahid Shahrezaei
- Department of Mathematics, Imperial College, London SW7 2AZ, UK
|
33
|
Fernández R, Edgecombe GD, Giribet G. Exploring Phylogenetic Relationships within Myriapoda and the Effects of Matrix Composition and Occupancy on Phylogenomic Reconstruction. Syst Biol 2016; 65:871-89. [PMID: 27162151 PMCID: PMC4997009 DOI: 10.1093/sysbio/syw041] [Citation(s) in RCA: 74] [Impact Index Per Article: 8.2] [Received: 12/06/2015] [Accepted: 04/28/2016] [Indexed: 11/14/2022]
Abstract
Myriapods, including the diverse and familiar centipedes and millipedes, are one of the dominant terrestrial arthropod groups. Although molecular evidence has shown that Myriapoda is monophyletic, its internal phylogeny remains contentious and understudied, especially when compared to those of Chelicerata and Hexapoda. Until now, efforts have focused on taxon sampling (e.g., by including a handful of genes from many species) or on maximizing matrix size (e.g., by including hundreds or thousands of genes in just a few species), but a phylogeny maximizing sampling at both levels remains elusive. In this study, we analyzed 40 Illumina transcriptomes representing 3 of the 4 myriapod classes (Diplopoda, Chilopoda, and Symphyla); 25 transcriptomes were newly sequenced to maximize representation at the ordinal level in Diplopoda and at the family level in Chilopoda. Ten supermatrices were constructed to explore the effect of several potential phylogenetic biases (e.g., rate of evolution, heterotachy) at 3 levels of gene occupancy per taxon (50%, 75%, and 90%). Analyses based on maximum likelihood and Bayesian mixture models retrieved monophyly of each myriapod class, and resulted in 2 alternative phylogenetic positions for Symphyla, as sister group to Diplopoda + Chilopoda, or closer to Diplopoda, the latter hypothesis having been traditionally supported by morphology. Within centipedes, all orders were well supported, but 2 deep nodes remained in conflict in the different analyses despite dense taxon sampling at the family level. Relationships among centipede orders in all analyses conducted with the most complete matrix (90% occupancy) are at odds not only with the sparser but more gene-rich supermatrices (75% and 50% supermatrices) and with the matrices optimizing phylogenetic informativeness or most conserved genes, but also with previous hypotheses based on morphology, development, or other molecular data sets. 
Our results indicate that a high percentage of ribosomal proteins in the most complete matrices, in conjunction with distance from the root, can act in concert to compromise the estimated relationships within the ingroup. We discuss the implications of these findings in the context of the ever more prevalent quest for completeness in phylogenomic studies.
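The occupancy thresholds described in the abstract above (50%, 75%, and 90%) can be illustrated with a small sketch. It assumes that "occupancy" means the fraction of taxa in which a gene is present; the gene names, taxon labels, and the helper `genes_meeting_occupancy` are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List, Set


def genes_meeting_occupancy(
    presence: Dict[str, Set[str]], n_taxa: int, threshold: float
) -> List[str]:
    """Return the genes found in at least `threshold` of the taxa, sorted by name."""
    return sorted(
        gene for gene, taxa in presence.items() if len(taxa) / n_taxa >= threshold
    )


# Hypothetical presence table: gene -> set of taxa in which it was recovered.
presence = {
    "geneA": {"t1", "t2", "t3", "t4"},
    "geneB": {"t1", "t2", "t3"},
    "geneC": {"t1", "t2"},
    "geneD": {"t1"},
}

# One gene list per occupancy level; stricter levels keep fewer genes.
matrices = {
    thr: genes_meeting_occupancy(presence, n_taxa=4, threshold=thr)
    for thr in (0.5, 0.75, 0.9)
}
```

A stricter threshold gives a more complete but smaller matrix, which is the trade-off between completeness and gene richness that the abstract discusses.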
Affiliation(s)
- Rosa Fernández
- Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Gregory D Edgecombe
- Department of Earth Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Gonzalo Giribet
- Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA

34
Yan GX, Zhang J, Shodhan A, Tian M, Miao W. Cdk3, a conjugation-specific cyclin-dependent kinase, is essential for the initiation of meiosis in Tetrahymena thermophila. Cell Cycle 2016; 15:2506-14. [PMID: 27420775 DOI: 10.1080/15384101.2016.1207838] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/21/2023]
Abstract
Meiosis is an important process in sexual reproduction. Meiosis initiation has been found to be highly diverse among species. In yeast, it has been established that cyclin-dependent kinases (Cdks) and cyclins are essential components in the meiosis initiation pathway. In this study, we identified 4 Cdks in the model ciliate, Tetrahymena thermophila, and we found one of them, Cdk3, which is specifically expressed during early conjugation, to be essential for meiosis initiation. Cdk3 deletion led to arrest at the pair formation stage of conjugation. We then confirmed that Cdk3 acts upstream of double-strand break (DSB) formation. Moreover, we detected that Cdk3 is necessary for the expression of many genes involved in early meiotic events. Through proteomic quantification of phosphorylation, co-expression analysis and RNA-Seq analyses, we identified a conjugation-specific cyclin, Cyc2, which most likely partners with Cdk3 to initiate meiosis.
Affiliation(s)
- Guan-Xiong Yan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Jing Zhang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Anura Shodhan
- Department of Chromosome Biology and Max F. Perutz Laboratories, Center for Molecular Biology, University of Vienna, Vienna, Austria
- Miao Tian
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; Department of Chromosome Biology and Max F. Perutz Laboratories, Center for Molecular Biology, University of Vienna, Vienna, Austria
- Wei Miao
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China

35
Schuierer S, Roma G. The exon quantification pipeline (EQP): a comprehensive approach to the quantification of gene, exon and junction expression from RNA-seq data. Nucleic Acids Res 2016; 44:e132. [PMID: 27302131 PMCID: PMC5027495 DOI: 10.1093/nar/gkw538] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/15/2015] [Accepted: 06/04/2016] [Indexed: 01/24/2023]
Abstract
The quantification of transcriptomic features is the basis of the analysis of RNA-seq data. We present an integrated alignment workflow and a simple counting-based approach to derive estimates for gene, exon and exon–exon junction expression. In contrast to previous counting-based approaches, EQP takes into account only reads whose alignment pattern agrees with the splicing pattern of the features of interest. This leads to improved gene expression estimates as well as to the generation of exon counts that allow disambiguating reads between overlapping exons. Unlike other methods that quantify skipped introns, EQP offers a novel way to compute junction counts based on the agreement of the read alignments with the exons on both sides of the junction, thus providing a uniformly derived set of counts. We evaluated the performance of EQP on both simulated and real Illumina RNA-seq data and compared it with other quantification tools. Our results suggest that EQP provides superior gene expression estimates and we illustrate the advantages of EQP's exon and junction counts. The provision of uniformly derived high-quality counts makes EQP an ideal quantification tool for differential expression and differential splicing studies. EQP is freely available for download at https://github.com/Novartis/EQP-cluster.
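The idea of counting only reads whose alignment pattern agrees with the splicing pattern of a feature can be sketched as follows. This is a simplified illustration and not EQP's actual algorithm: reads are modeled as lists of half-open aligned blocks, and the compatibility rule, coordinates, and function names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlaps_exon(blocks: List[Interval], exon: Interval) -> bool:
    """Naive rule: the read counts if any aligned block overlaps the exon."""
    ex_start, ex_end = exon
    return any(s < ex_end and e > ex_start for s, e in blocks)


def agrees_with_exon(blocks: List[Interval], exon: Interval) -> bool:
    """Stricter rule: overlapping blocks must stay inside the exon, and any
    splice junction of the read that touches the exon must sit on its boundary."""
    ex_start, ex_end = exon
    last = len(blocks) - 1
    hit = False
    for i, (s, e) in enumerate(blocks):
        if e <= ex_start or s >= ex_end:
            continue  # this block does not touch the exon
        hit = True
        if s < ex_start or e > ex_end:
            return False  # block spills past the exon boundary
        if i > 0 and s != ex_start:
            return False  # spliced-in block must begin at the exon start
        if i < last and e != ex_end:
            return False  # spliced-out block must end at the exon end
    return hit


# Hypothetical exons: exon1 and exon2 overlap and differ only in their 3' end.
exons: Dict[str, Interval] = {
    "exon1": (100, 200),
    "exon2": (100, 180),
    "exon3": (300, 400),
}
reads: List[List[Interval]] = [
    [(150, 200), (300, 350)],  # junction leaves at 200: supports exon1, not exon2
    [(160, 180), (300, 340)],  # junction leaves at 180: supports exon2, not exon1
    [(120, 170)],              # unspliced, inside both exon1 and exon2
    [(190, 260)],              # unspliced, runs into the intron
]

strict_counts = {n: sum(agrees_with_exon(r, ex) for r in reads) for n, ex in exons.items()}
naive_counts = {n: sum(overlaps_exon(r, ex) for r in reads) for n, ex in exons.items()}
```

In this toy data the naive rule credits exon1 with all four reads, while the stricter rule drops the read that ends at position 180 and the read that runs into the intron.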
Affiliation(s)
- Sven Schuierer
- Novartis Institutes for Biomedical Research, CH-4056 Basel, Switzerland
- Guglielmo Roma
- Novartis Institutes for Biomedical Research, CH-4056 Basel, Switzerland

36
Yan GX, Dang H, Tian M, Zhang J, Shodhan A, Ning YZ, Xiong J, Miao W. Cyc17, a meiosis-specific cyclin, is essential for anaphase initiation and chromosome segregation in Tetrahymena thermophila. Cell Cycle 2016; 15:1855-64. [PMID: 27192402 DOI: 10.1080/15384101.2016.1188238] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
Although the role of cyclins in controlling nuclear division is well established, their function in ciliate meiosis remains unknown. In ciliates, the cyclin family has undergone massive expansion which suggests that diverse cell cycle systems exist, and this warrants further investigation. A screen for cyclins in the model ciliate Tetrahymena thermophila showed that there are 34 cyclins in this organism. Only 1 cyclin, Cyc17, contains the complete cyclin core and is specifically expressed during meiosis. Deletion of CYC17 led to meiotic arrest at the diakinesis-like metaphase I stage. Expression of genes involved in DNA metabolism and chromosome organization (chromatin remodeling and basic chromosomal structure) was repressed in cyc17 knockout matings. Further investigation suggested that Cyc17 is involved in regulating spindle pole attachment, and is thus essential for chromosome segregation at meiosis. These findings suggest a simple model in which chromosome segregation is influenced by Cyc17.
Affiliation(s)
- Guan-Xiong Yan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Huai Dang
- College of Life Sciences, Northwest Normal University, Lanzhou, People's Republic of China
- Miao Tian
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; Department of Chromosome Biology and Max F. Perutz Laboratories, Center for Molecular Biology, University of Vienna, Vienna, Austria
- Jing Zhang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Anura Shodhan
- Department of Chromosome Biology and Max F. Perutz Laboratories, Center for Molecular Biology, University of Vienna, Vienna, Austria
- Ying-Zhi Ning
- College of Life Sciences, Northwest Normal University, Lanzhou, People's Republic of China
- Jie Xiong
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China
- Wei Miao
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, People's Republic of China

37
The A, C, G, and T of Genome Assembly. Biomed Res Int 2016; 2016:6329217. [PMID: 27247941 PMCID: PMC4877455 DOI: 10.1155/2016/6329217] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/28/2015] [Accepted: 12/22/2015] [Indexed: 11/18/2022]
Abstract
Genome assembly in its two decades of history has produced significant research, in terms of both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.

38
PBSeq: Modeling base-level bias to estimate gene and isoform expression for RNA-seq data. Int J Mach Learn Cyb 2016. [DOI: 10.1007/s13042-016-0497-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]

39
Zhang J, Wei Z. An empirical Bayes change-point model for identifying 3' and 5' alternative splicing by next-generation RNA sequencing. Bioinformatics 2016; 32:1823-31. [PMID: 26873932 DOI: 10.1093/bioinformatics/btw060] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 09/28/2015] [Accepted: 01/19/2016] [Indexed: 01/08/2023]
Abstract
MOTIVATION: Next-generation RNA sequencing (RNA-seq) has been widely used to investigate alternative isoform regulations. Among them, alternative 3′ splice site (SS) and 5′ SS account for more than 30% of all alternative splicing (AS) events in higher eukaryotes. Recent studies have revealed that they play important roles in building complex organisms and have a critical impact on biological functions which could cause disease. Quite a few analytical methods have been developed to facilitate alternative 3′ SS and 5′ SS studies using RNA-seq data. However, these methods have various limitations and their performances may be further improved. RESULTS: We propose an empirical Bayes change-point model to identify alternative 3′ SS and 5′ SS. Compared with previous methods, our approach has several unique merits. First of all, our model does not rely on annotation information. Instead, it provides for the first time a systematic framework to integrate various information when available, in particular the useful junction read information, in order to obtain better performance. Second, we utilize an empirical Bayes model to efficiently pool information across genes to improve detection efficiency. Third, we provide a flexible testing framework in which the user can choose to address different levels of questions, namely, whether alternative 3′ SS or 5′ SS happens, and/or where it happens. Simulation studies and real data application have demonstrated that our method is powerful and accurate. AVAILABILITY AND IMPLEMENTATION: The software is implemented in Java and can be freely downloaded from http://ebchangepoint.sourceforge.net/. CONTACT: zhiwei@njit.edu
Affiliation(s)
- Jie Zhang
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA

40
Hu X, Wu Y, Lu ZJ, Yip KY. Analysis of sequencing data for probing RNA secondary structures and protein–RNA binding in studying posttranscriptional regulations. Brief Bioinform 2015; 17:1032-1043. [DOI: 10.1093/bib/bbv106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/10/2015] [Revised: 10/11/2015] [Indexed: 11/12/2022]

41
Liu X, Shi X, Chen C, Zhang L. Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate. BMC Bioinformatics 2015; 16:332. [PMID: 26475308 PMCID: PMC4609108 DOI: 10.1186/s12859-015-0750-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/24/2014] [Accepted: 09/24/2015] [Indexed: 12/05/2022]
Abstract
Background: The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. Results: We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in the downstream analysis. Conclusions: The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well in both simulations and real datasets, and has competitive performance compared to popular methods. The method has been implemented as a freely available software which can be found at https://github.com/PUGEA/NLDMseq.
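For context, the "direct RPKM calculation" mentioned in the abstract can be written in a few lines. The sketch below uses the standard definition (reads per kilobase of feature per million mapped reads); the function name and the example numbers are illustrative only. It implicitly treats reads as uniformly distributed along the feature, which is the assumption the abstract says is violated in practice.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Illustrative numbers: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
example_value = rpkm(500, 2_000, 10_000_000)
```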
Affiliation(s)
- Xuejun Liu
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Rd., Nanjing, 211106, China
- Xinxin Shi
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Rd., Nanjing, 211106, China
- Chunlin Chen
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Rd., Nanjing, 211106, China
- Li Zhang
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Rd., Nanjing, 211106, China

42
Liu X, Zhang L, Chen S. Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data. PLoS One 2015; 10:e0140032. [PMID: 26448625 PMCID: PMC4598124 DOI: 10.1371/journal.pone.0140032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2015] [Accepted: 09/21/2015] [Indexed: 11/29/2022]
Abstract
RNA-seq technology has become an important tool for quantifying the gene and transcript expression in transcriptome study. The two major difficulties for the gene and transcript expression quantification are the read mapping ambiguity and the overdispersion of the read distribution along reference sequence. Many approaches have been proposed to deal with these difficulties. A number of existing methods use Poisson distribution to model the read counts and this easily splits the counts into the contributions from multiple transcripts. Meanwhile, various solutions were put forward to account for the overdispersion in the Poisson models. By checking the similarities among the variation patterns of read counts for individual genes, we found that the count variation is exon-specific and has the conserved pattern across the samples for each individual gene. We introduce Gamma-distributed latent variables to model the read sequencing preference for each exon. These variables are embedded to the rate parameter of a Poisson model to account for the overdispersion of read distribution. The model is tractable since the Gamma priors can be integrated out in the maximum likelihood estimation. We evaluate the proposed approach, PGseq, using four real datasets and one simulated dataset, and compare its performance with other popular methods. Results show that PGseq presents competitive performance compared to other alternatives in terms of accuracy in the gene and transcript expression calculation and in the downstream differential expression analysis. Especially, we show the advantage of our method in the analysis of low expression.
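The remark that the Gamma priors "can be integrated out" reflects a standard result: a Poisson count whose rate is scaled by a Gamma-distributed factor has a negative binomial marginal. The sketch below checks this numerically. The parameterization (Gamma shape `a`, rate `b`, baseline rate `mu`) is an assumption chosen for illustration and is not necessarily the one PGseq uses.

```python
import math


def nb_pmf(y: int, mu: float, a: float, b: float) -> float:
    """Closed-form marginal of y ~ Poisson(w * mu) with w ~ Gamma(shape=a, rate=b)."""
    log_p = (
        math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
        + a * math.log(b / (b + mu))
        + y * math.log(mu / (b + mu))
    )
    return math.exp(log_p)


def marginal_by_quadrature(
    y: int, mu: float, a: float, b: float, upper: float = 20.0, steps: int = 20_000
) -> float:
    """Integrate Poisson(y | w * mu) * Gamma(w | a, b) over w with the midpoint rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h
        log_gamma_pdf = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(w) - b * w
        log_poisson = y * math.log(w * mu) - w * mu - math.lgamma(y + 1)
        total += math.exp(log_gamma_pdf + log_poisson)
    return total * h


# Illustrative values: the two computations should agree closely.
closed_form = nb_pmf(7, mu=10.0, a=2.0, b=2.0)
numeric = marginal_by_quadrature(7, mu=10.0, a=2.0, b=2.0)
```

Because the marginal has a closed form, maximum likelihood estimation can work with it directly, without sampling or integrating over the latent exon-level factors.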
Affiliation(s)
- Xuejun Liu
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Li Zhang
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Songcan Chen
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

43
Feng H, Zhang X, Zhang C. mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data. Nat Commun 2015; 6:7816. [PMID: 26234653 PMCID: PMC4523900 DOI: 10.1038/ncomms8816] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 05/07/2015] [Accepted: 06/15/2015] [Indexed: 02/05/2023]
Abstract
The volume of RNA-Seq data sets in public repositories has been expanding exponentially, providing unprecedented opportunities to study gene expression regulation. Because degraded RNA samples, such as those collected from post-mortem tissues, can result in distinct expression profiles with potential biases, a particularly important step in mining these data is quality control. Here we develop a method named mRIN to directly assess mRNA integrity from RNA-Seq data at the sample and individual gene level. We systematically analyse large-scale RNA-Seq data sets of the human brain transcriptome generated by different consortia. Our analysis demonstrates that 3′ bias resulting from partial RNA fragmentation in post-mortem tissues has a marked impact on global expression profiles, and that mRIN effectively identifies samples with different levels of mRNA degradation. Unexpectedly, this process has a reproducible and gene-specific component, and transcripts with different stabilities are associated with distinct functions and structural features reminiscent of mRNA decay in living cells.
With the rapid increase in the volume of publicly available RNA-seq data, quality control is an increasingly important consideration. Here Feng et al. develop mRIN, a method to directly assess mRNA integrity, and show that RNA degradation in post-mortem samples has a strong impact on global expression profiles.
Affiliation(s)
- Huijuan Feng
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China; Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, New York 10032, USA
- Xuegong Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
- Chaolin Zhang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, New York 10032, USA

44
Guo J, Cheng G, Gou XY, Xing F, Li S, Han YC, Wang L, Song JM, Shu CC, Chen SW, Chen LL. Comprehensive transcriptome and improved genome annotation of Bacillus licheniformis WX-02. FEBS Lett 2015; 589:2372-81. [DOI: 10.1016/j.febslet.2015.07.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/13/2015] [Revised: 07/11/2015] [Accepted: 07/20/2015] [Indexed: 01/10/2023]

45
Oh S. How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis?: A Review. Communications for Statistical Applications and Methods 2015. [DOI: 10.5351/csam.2015.22.2.181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/20/2022]
Affiliation(s)
- Sunghee Oh
- Department of Veterinary Medicine, Jeju National University, Korea

46
Tasnim M, Ma S, Yang EW, Jiang T, Li W. Accurate inference of isoforms from multiple sample RNA-Seq data. BMC Genomics 2015; 16 Suppl 2:S15. [PMID: 25708199 PMCID: PMC4331715 DOI: 10.1186/1471-2164-16-s2-s15] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
BACKGROUND: RNA-Seq based transcriptome assembly has become a fundamental technique for studying expressed mRNAs (i.e., transcripts or isoforms) in a cell using high-throughput sequencing technologies, and is serving as a basis to analyze the structural and quantitative differences of expressed isoforms between samples. However, the current transcriptome assembly algorithms are not specifically designed to handle large amounts of errors that are inherent in real RNA-Seq datasets, especially those involving multiple samples, making downstream differential analysis applications difficult. On the other hand, multiple sample RNA-Seq datasets may provide more information than single sample datasets that can be utilized to improve the performance of transcriptome assembly and abundance estimation, but such information remains overlooked by the existing assembly tools. RESULTS: We formulate a computational framework of transcriptome assembly that is capable of handling noisy RNA-Seq reads and multiple sample RNA-Seq datasets efficiently. We show that finding an optimal solution under this framework is an NP-hard problem. Instead, we develop an efficient heuristic algorithm, called Iterative Shortest Path (ISP), based on linear programming (LP) and integer linear programming (ILP). Our preliminary experimental results on both simulated and real datasets and comparison with the existing assembly tools demonstrate that (i) the ISP algorithm is able to assemble transcriptomes with a greatly increased precision while keeping the same level of sensitivity, especially when many samples are involved, and (ii) its assembly results help improve downstream differential analysis. The source code of ISP is freely available at http://alumni.cs.ucr.edu/~liw/isp.html.
Affiliation(s)
- Masruba Tasnim
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, 92507, USA
- Shining Ma
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, 92507, USA
- MOE Key Lab of Bioinformatics and Bioinformatics Division, TNLIST / Department of Automation, Tsinghua University, Beijing, 100084, China
- Ei-Wen Yang
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, 92507, USA
- Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, 92507, USA
- MOE Key Lab of Bioinformatics and Bioinformatics Division, TNLIST / Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Wei Li
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, 92507, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA, 02215, USA

47
Sun H, Yang S, Tun L, Li Y. IAOseq: inferring abundance of overlapping genes using RNA-seq data. BMC Bioinformatics 2015; 16 Suppl 1:S3. [PMID: 25707673 PMCID: PMC4331702 DOI: 10.1186/1471-2105-16-s1-s3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of the overlapping transcription assays is the lack of high throughput expression data. RESULTS: We developed a new tool (IAOseq) that is based on reads distributions along the transcribed regions to identify the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq showed better performance in the estimation accuracy of overlapping transcription levels. For the same strand overlapping transcription, currently existing high-throughput methods are rarely available to distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods gave an average of 1.6 fold overestimation of the expression levels of same strand overlapping genes. CONCLUSIONS: This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could be used to help us understand the complex regulatory mechanism mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

48
Zhang J, Kuo CCJ, Chen L. WemIQ: an accurate and robust isoform quantification method for RNA-seq data. Bioinformatics 2014; 31:878-85. [PMID: 25406327 DOI: 10.1093/bioinformatics/btu757] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The deconvolution of isoform expression from RNA-seq remains challenging because of non-uniform read sampling and subtle differences among isoforms. RESULTS: We present a weighted-log-likelihood expectation maximization method on isoform quantification (WemIQ). WemIQ integrates an effective bias removal with a weighted expectation maximization (EM) algorithm to distribute reads among isoforms efficiently. The weight represents the oversampling or undersampling of sequence reads and is estimated through a generalized Poisson model without any presumption on the bias sources and formats. WemIQ significantly improves the quantification of isoform and gene expression as well as the derived exon inclusion rates. It provides robust expression estimates across different laboratories and protocols, which is valuable for the integrative analysis of RNA-seq. For the recent single-cell RNA-seq data, WemIQ also provides the opportunity to distinguish bias heterogeneity from true biological heterogeneity and uncovers smaller cell-to-cell expression variability.
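A weighted EM that distributes reads among isoforms can be sketched as below. This is a generic illustration of the technique, not WemIQ's implementation: isoform lengths are ignored, the per-read weights are supplied by the caller instead of being estimated from a generalized Poisson model, and all names are hypothetical.

```python
from typing import List, Optional


def em_isoform_fractions(
    compat: List[List[int]],
    weights: Optional[List[float]] = None,
    n_iter: int = 200,
) -> List[float]:
    """Estimate isoform read fractions by EM.

    compat[r] lists the isoform indices that read r is compatible with;
    weights[r] scales the contribution of read r (1.0 means unweighted).
    """
    n_iso = 1 + max(i for c in compat for i in c)
    if weights is None:
        weights = [1.0] * len(compat)
    theta = [1.0 / n_iso] * n_iso  # start from equal fractions
    for _ in range(n_iter):
        expected = [0.0] * n_iso
        # E-step: split each read among its compatible isoforms in proportion to theta.
        for c, w in zip(compat, weights):
            denom = sum(theta[i] for i in c)
            for i in c:
                expected[i] += w * theta[i] / denom
        # M-step: renormalise the expected (weighted) read counts.
        total = sum(expected)
        theta = [x / total for x in expected]
    return theta


# Toy data: 6 reads unique to isoform 0, 2 unique to isoform 1, 4 shared by both.
compat = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta = em_isoform_fractions(compat)
```

With unit weights the shared reads end up split 3:1, matching the ratio of the uniquely assigned reads. Passing weights below 1.0 for reads from oversampled positions would reduce their influence, which is the role the abstract assigns to the weight.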
Affiliation(s)
- Jing Zhang
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA and Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- C-C Jay Kuo
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA and Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Liang Chen
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA and Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
|
49
|
Lemer S, Kawauchi GY, Andrade SCS, González VL, Boyle MJ, Giribet G. Re-evaluating the phylogeny of Sipuncula through transcriptomics. Mol Phylogenet Evol 2014; 83:174-83. [PMID: 25450098 DOI: 10.1016/j.ympev.2014.10.019] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 07/21/2014] [Revised: 10/17/2014] [Accepted: 10/23/2014] [Indexed: 01/28/2023]
Abstract
Sipunculans (also known as peanut worms) are an ancient group of exclusively marine worms with a global distribution and a fossil record that dates back to the Early Cambrian. The systematics of sipunculans, now considered a distinct subclade of Annelida, has been studied for decades using morphological and molecular characters, and has reached the limits of Sanger-based approaches. Here, we reevaluate their family-level phylogeny by comparative transcriptomic analysis of eight species representing all known families within Sipuncula. Two data matrices with alternative gene occupancy levels (large matrix with 675 genes and 62% missing data; reduced matrix with 141 genes and 23% missing data) were analysed using concatenation and gene-tree methods, yielding congruent results and resolving each internal node with maximum support. We thus corroborate prior phylogenetic work based on molecular data, resolve outstanding issues with respect to the familial relationships of Aspidosiphonidae, Antillesomatidae and Phascolosomatidae, and highlight the next area of focus for sipunculan systematics.
Affiliation(s)
- Sarah Lemer
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA.
- Gisele Y Kawauchi
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA; CEBIMar, Universidade de São Paulo, Praia do Cabelo Gordo, São Sebastião, São Paulo, Brazil
- Sónia C S Andrade
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA; Departamento de Zootecnia, ESALQ-USP, Piracicaba, São Paulo, Brazil
- Vanessa L González
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA; Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
- Michael J Boyle
- Smithsonian Tropical Research Institute (STRI), Naos Marine Laboratories, Panama 0843/03092, Panama
- Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
|
50
|
Gu J, Wang X, Hilakivi-Clarke L, Clarke R, Xuan J. BADGE: a novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data. BMC Bioinformatics 2014; 15 Suppl 9:S6. [PMID: 25252852 PMCID: PMC4168709 DOI: 10.1186/1471-2105-15-s9-s6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023] [Open Access]
Abstract
Background Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated by biological and technical variations in RNA-Seq data. Results We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimation of gene-level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, in which sequencing counts are synthesized from parameters learned from real datasets, have demonstrated the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, a performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls has shown that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application to a breast cancer dataset has further illustrated that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases. Conclusions We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation and real data applications have validated the desirable performance of the proposed method. The software package is available at http://www.cbil.ece.vt.edu/software.htm.
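As a small aside on the Poisson-Lognormal component named in this abstract, the sketch below computes the mean and variance of a generic Poisson-lognormal mixture, in which a count is Poisson with a rate whose logarithm is normally distributed. It is not taken from the BADGE software: the function name `poisson_lognormal_moments` and the parameters `mu` and `sigma` are assumptions for illustration, and the code only evaluates the standard closed-form moments of that mixture.

```python
import math


def poisson_lognormal_moments(mu, sigma):
    """Mean and variance of Y ~ Poisson(exp(Z)) with Z ~ Normal(mu, sigma^2).

    Standard results for this mixture (not specific to any one tool):
        E[Y]   = exp(mu + sigma^2 / 2)
        Var[Y] = E[Y] + E[Y]^2 * (exp(sigma^2) - 1)
    """
    mean = math.exp(mu + 0.5 * sigma ** 2)
    # expm1 keeps precision when sigma is small.
    variance = mean + mean ** 2 * math.expm1(sigma ** 2)
    return mean, variance


# sigma = 0 collapses to a plain Poisson, where the variance equals the mean.
plain_mean, plain_var = poisson_lognormal_moments(0.0, 0.0)
# Any sigma > 0 adds extra variance on top of the Poisson term.
mixed_mean, mixed_var = poisson_lognormal_moments(0.0, 1.0)
```

The second term of the variance grows with the square of the mean, so under this kind of model the extra variability is most visible for highly expressed genes.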
|