101
Tsagiopoulou M, Maniou MC, Pechlivanis N, Togkousidis A, Kotrová M, Hutzenlaub T, Kappas I, Chatzidimitriou A, Psomopoulos F. UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction. Front Genet 2021; 12:660366. [PMID: 34122513 PMCID: PMC8193862 DOI: 10.3389/fgene.2021.660366] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/29/2021] [Accepted: 04/08/2021] [Indexed: 11/17/2022] Open
Abstract
A recent refinement in high-throughput sequencing involves the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, in the library preparation steps. A UMI adds a unique identity to different DNA/RNA input molecules through polymerase chain reaction (PCR) amplification, thus reducing bias of this step. Here, we propose an alignment-free framework, called UMIc, that serves as a preprocessing step for FASTQ files, deduplicating and correcting reads by building a consensus sequence from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool using different scenarios of UMI-tagged library data, with a view to wide applicability. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
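The idea of building a consensus sequence per UMI from base frequency and Phred quality can be illustrated with a minimal Python sketch. This is not UMIc's implementation (UMIc is written in R, and its exact scoring and distance rules are not given in the abstract). The function names, the summed-Phred voting rule, the equal-read-length requirement, and the toy reads are all assumptions made for illustration, and the merging of similar UMIs by distance is left out.

```python
from collections import defaultdict


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def consensus_per_umi(reads):
    """Build one consensus sequence per UMI (hypothetical helper).

    `reads` is an iterable of (umi, sequence, quality_string) tuples. Reads
    sharing a UMI are assumed to have equal length, a simplification of this
    sketch. At each position the base with the largest summed Phred score
    wins, so both how often a base occurs and how confidently it was called
    contribute to the vote.
    """
    groups = defaultdict(list)
    for umi, seq, qual in reads:
        groups[umi].append((seq, phred_scores(qual)))

    consensus = {}
    for umi, members in groups.items():
        length = len(members[0][0])
        bases = []
        for pos in range(length):
            weight = defaultdict(int)
            for seq, scores in members:
                weight[seq[pos]] += scores[pos]
            # Iterating over sorted keys makes ties resolve alphabetically.
            bases.append(max(sorted(weight), key=lambda b: weight[b]))
        consensus[umi] = "".join(bases)
    return consensus


# Toy input: three reads share the UMI "AACG"; one carries a low-quality mismatch.
reads = [
    ("AACG", "ACGT", "IIII"),
    ("AACG", "ACGT", "IIII"),
    ("AACG", "ACCT", "II#I"),
    ("TTGC", "GGGA", "IIII"),
]
result = consensus_per_umi(reads)
print(result)
```

In this toy input the mismatching `C` has a Phred score of 2 against a combined score of 80 for `G`, so the consensus for the first UMI keeps `G` at that position.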
Affiliation(s)
- Maria Tsagiopoulou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Maria Christina Maniou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Nikolaos Pechlivanis
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Anastasis Togkousidis
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Michaela Kotrová
- Unit for Hematological Diagnostics, Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Kiel, Germany
- Tobias Hutzenlaub
- Laboratory for MEMS Applications, IMTEK-Department of Microsystems Engineering, University of Freiburg, Freiburg, Germany
- Hahn-Schickard, Freiburg, Germany
- Ilias Kappas
- Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Fotis Psomopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
102
Gilis J, Vitting-Seerup K, Van den Berge K, Clement L. satuRn: Scalable analysis of differential transcript usage for bulk and single-cell RNA-sequencing applications. F1000Res 2021; 10:374. [PMID: 36762203 PMCID: PMC9892655 DOI: 10.12688/f1000research.51749.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/26/2022] [Indexed: 11/20/2022] Open
Abstract
Alternative splicing produces multiple functional transcripts from a single gene. Dysregulation of splicing is known to be associated with disease and is a hallmark of cancer. Existing tools for differential transcript usage (DTU) analysis either lack performance, cannot account for complex experimental designs, or do not scale to massive single-cell transcriptome sequencing (scRNA-seq) datasets. We introduce satuRn, a fast and flexible quasi-binomial generalized linear modelling framework that is on par with the best performing DTU methods from the bulk RNA-seq realm, while providing good false discovery rate control, addressing complex experimental designs, and scaling to scRNA-seq applications.
Affiliation(s)
- Jeroen Gilis
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Kristoffer Vitting-Seerup
- Department of Biology, Kobenhavns Universitet, Copenhagen, 2200, Denmark
- Biotech Research and Innovation Centre (BRIC), Kobenhavns Universitet, Copenhagen, 2200, Denmark
- Danish Cancer Society Research Center, Copenhagen, 2100, Denmark
- Department of Health Technology, Danish Technical University, Kongens Lyngby, 2800, Denmark
- Koen Van den Berge
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Department of Statistics, University of California, Berkeley, Berkeley, California, USA
- Lieven Clement
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
103
Wilson GW, Derouet M, Darling GE, Yeung JC. scSNV: accurate dscRNA-seq SNV co-expression analysis using duplicate tag collapsing. Genome Biol 2021; 22:144. [PMID: 33962667 PMCID: PMC8103760 DOI: 10.1186/s13059-021-02364-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/08/2020] [Accepted: 04/23/2021] [Indexed: 12/21/2022] Open
Abstract
Identifying single nucleotide variants has become common practice for droplet-based single-cell RNA-seq experiments; however, presently, a pipeline does not exist to maximize variant calling accuracy. Furthermore, molecular duplicates generated in these experiments have not been utilized to optimally detect variant co-expression. Herein, we introduce scSNV, designed from the ground up to "collapse" molecular duplicates and accurately identify variants and their co-expression. We demonstrate that scSNV is fast, has a reduced false-positive variant call rate, and enables the co-detection of genetic variants and A>G RNA edits across twenty-two samples.
Affiliation(s)
- Gavin W Wilson
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
- Mathieu Derouet
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
- Gail E Darling
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
- Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto, M5G 2C4, Canada
- Jonathan C Yeung
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
- Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto, M5G 2C4, Canada
- Toronto General Hospital, 200 Elizabeth St, 9N-983, Toronto, ON, M5G 2C4, Canada
104
Cell-level metadata are indispensable for documenting single-cell sequencing datasets. PLoS Biol 2021; 19:e3001077. [PMID: 33945522 PMCID: PMC8121533 DOI: 10.1371/journal.pbio.3001077] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Revised: 05/14/2021] [Indexed: 11/19/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) provides an unprecedented view of cellular diversity of biological systems. However, across the thousands of publications and datasets generated using this technology, we estimate that only a minority (<25%) of studies provide cell-level metadata information containing identified cell types and related findings of the published dataset. Metadata omission hinders reproduction, exploration, validation, and knowledge transfer and is a common problem across journals, data repositories, and publication dates. We encourage investigators, reviewers, journals, and data repositories to improve their standards and ensure proper documentation of these valuable datasets. Most Gene Expression Omnibus (GEO) depositions of single-cell mRNA sequencing data do not include cell-level metadata generated by typical analysis pipelines; this Essay maintains that this omission greatly hinders reproduction, exploration, validation, and knowledge transfer and is a common problem across journals, data repositories, and publication dates.
105
Gillen AE, Goering R, Taliaferro JM. Quantifying alternative polyadenylation in RNAseq data with LABRAT. Methods Enzymol 2021; 655:245-263. [PMID: 34183124 DOI: 10.1016/bs.mie.2021.03.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Alternative polyadenylation (APA) generates transcript isoforms that differ in their 3' UTR content and may therefore be subject to different regulatory fates. Although the existence of APA has been known for decades, quantification of APA isoforms from high-throughput RNA sequencing data has been difficult. To facilitate the study of APA in large datasets, we developed an APA quantification technique called LABRAT (Lightweight Alignment-Based Reckoning of Alternative Three-prime ends). LABRAT leverages modern transcriptome quantification approaches to determine the relative abundances of APA isoforms. In this manuscript we describe how LABRAT produces its calculations, provide a step-by-step protocol for its use, and demonstrate its ability to quantify APA in single-cell RNAseq data.
Affiliation(s)
- Austin E Gillen
- Division of Hematology, University of Colorado School of Medicine, Aurora, CO, United States
- Raeann Goering
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States; RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- J Matthew Taliaferro
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States; RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States.
106
Zheng H, Rao AM, Dermadi D, Toh J, Murphy Jones L, Donato M, Liu Y, Su Y, Dai CL, Kornilov SA, Karagiannis M, Marantos T, Hasin-Brumshtein Y, He YD, Giamarellos-Bourboulis EJ, Heath JR, Khatri P. Multi-cohort analysis of host immune response identifies conserved protective and detrimental modules associated with severity across viruses. Immunity 2021; 54:753-768.e5. [PMID: 33765435 PMCID: PMC7988739 DOI: 10.1016/j.immuni.2021.03.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 09/23/2020] [Revised: 12/03/2020] [Accepted: 03/01/2021] [Indexed: 02/08/2023]
Abstract
Viral infections induce a conserved host response distinct from bacterial infections. We hypothesized that the conserved response is associated with disease severity and is distinct between patients with different outcomes. To test this, we integrated 4,780 blood transcriptome profiles from patients aged 0 to 90 years infected with one of 16 viruses, including SARS-CoV-2, Ebola, chikungunya, and influenza, across 34 cohorts from 18 countries, and single-cell RNA sequencing profiles of 702,970 immune cells from 289 samples across three cohorts. Severe viral infection was associated with increased hematopoiesis, myelopoiesis, and myeloid-derived suppressor cells. We identified protective and detrimental gene modules that defined distinct trajectories associated with mild versus severe outcomes. The interferon response was decoupled from the protective host response in patients with severe outcomes. These findings were consistent, irrespective of age and virus, and provide insights to accelerate the development of diagnostics and host-directed therapies to improve global pandemic preparedness.
Affiliation(s)
- Hong Zheng
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
- Aditya M Rao
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Immunology program, Stanford University, CA 94305, USA
- Denis Dermadi
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
- Jiaying Toh
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Immunology program, Stanford University, CA 94305, USA
- Lara Murphy Jones
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA; Division of Critical Care Medicine, Department of Pediatrics, School of Medicine, Stanford University, CA 94305, USA
- Michele Donato
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
- Yiran Liu
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Cancer Biology program, Stanford University, CA 94305, USA
- Yapeng Su
- Institute for Systems Biology, Seattle, WA, USA
- Cheng L Dai
- Institute for Systems Biology, Seattle, WA, USA
- Minas Karagiannis
- 4th Department of Internal Medicine, National and Kapodistrian University of Athens, Medical School, 124 62 Athens, Greece
- Theodoros Marantos
- 4th Department of Internal Medicine, National and Kapodistrian University of Athens, Medical School, 124 62 Athens, Greece
- James R Heath
- Institute for Systems Biology, Seattle, WA, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195
- Purvesh Khatri
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA.
107
Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat Biotechnol 2021; 39:813-818. [PMID: 33795888 DOI: 10.1038/s41587-021-00870-2] [Citation(s) in RCA: 177] [Impact Index Per Article: 59.0] [Received: 08/07/2019] [Accepted: 02/09/2021] [Indexed: 11/08/2022]
Abstract
We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.
108
Prokop JW, Bupp CP, Frisch A, Bilinovich SM, Campbell DB, Vogt D, Schultz CR, Uhl KL, VanSickle E, Rajasekaran S, Bachmann AS. Emerging Role of ODC1 in Neurodevelopmental Disorders and Brain Development. Genes (Basel) 2021; 12:genes12040470. [PMID: 33806076 PMCID: PMC8064465 DOI: 10.3390/genes12040470] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/21/2021] [Revised: 03/15/2021] [Accepted: 03/22/2021] [Indexed: 01/18/2023] Open
Abstract
Ornithine decarboxylase 1 (ODC1 gene) has been linked through gain-of-function variants to a rare disease featuring developmental delay, alopecia, macrocephaly, and structural brain anomalies. ODC1 has been linked to additional diseases like cancer, with growing evidence for neurological contributions to schizophrenia, mood disorders, anxiety, epilepsy, learning, and suicidal behavior. The evidence of ODC1 connection to neural disorders highlights the need for a systematic analysis of ODC1 genotype-to-phenotype associations. An analysis of variants from ClinVar, Geno2MP, TOPMed, gnomAD, and COSMIC revealed an intellectual disability and seizure connected loss-of-function variant, ODC G84R (rs138359527, NC_000002.12:g.10444500C > T). The missense variant is found in ~1% of South Asian individuals and results in 2.5-fold decrease in enzyme function. Expression quantitative trait loci (eQTLs) reveal multiple functionally annotated, non-coding variants regulating ODC1 that associate with psychiatric/neurological phenotypes. Further dissection of RNA-Seq during fetal brain development and within cerebral organoids showed an association of ODC1 expression with cell proliferation of neural progenitor cells, suggesting gain-of-function variants with neural over-proliferation and loss-of-function variants with neural depletion. The linkage from the expression data of ODC1 in early neural progenitor proliferation to phenotypes of neurodevelopmental delay and to the connection of polyamine metabolites in brain function establish ODC1 as a bona fide neurodevelopmental disorder gene.
Affiliation(s)
- Jeremy W. Prokop
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA
- Center for Research in Autism, Intellectual, and Other Neurodevelopmental Disabilities, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: J.W.P.; A.S.B.
- Caleb P. Bupp
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Spectrum Health Medical Genetics, Grand Rapids, MI 49503, USA
- Austin Frisch
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Stephanie M. Bilinovich
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Daniel B. Campbell
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Center for Research in Autism, Intellectual, and Other Neurodevelopmental Disabilities, Michigan State University, East Lansing, MI 48824, USA
- Neuroscience Program, Michigan State University, East Lansing, MI 48824, USA
- Daniel Vogt
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Center for Research in Autism, Intellectual, and Other Neurodevelopmental Disabilities, Michigan State University, East Lansing, MI 48824, USA
- Neuroscience Program, Michigan State University, East Lansing, MI 48824, USA
- Chad R. Schultz
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Katie L. Uhl
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Surender Rajasekaran
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Pediatric Intensive Care Unit, Helen DeVos Children’s Hospital, Grand Rapids, MI 49503, USA
- Office of Research, Spectrum Health, Grand Rapids, MI 49503, USA
- André S. Bachmann
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Correspondence: J.W.P.; A.S.B.
109
Bilinovich SM, Uhl KL, Lewis K, Soehnlen X, Williams M, Vogt D, Prokop JW, Campbell DB. Integrated RNA Sequencing Reveals Epigenetic Impacts of Diesel Particulate Matter Exposure in Human Cerebral Organoids. Dev Neurosci 2021; 42:195-207. [PMID: 33657557 DOI: 10.1159/000513536] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/08/2020] [Accepted: 12/02/2020] [Indexed: 12/25/2022] Open
Abstract
Autism spectrum disorder (ASD) manifests early in childhood. While genetic variants increase risk for ASD, a growing body of literature has established that in utero chemical exposures also contribute to ASD risk. These chemicals include air-based pollutants like diesel particulate matter (DPM). A combination of single-cell and direct transcriptomics of DPM-exposed human-induced pluripotent stem cell-derived cerebral organoids revealed toxicogenomic effects of DPM exposure during fetal brain development. Direct transcriptomics, sequencing RNA bases via Nanopore, revealed that cerebral organoids contain extensive RNA modifications, with DPM altering cytosine methylation in oxidative mitochondrial transcripts expressed in outer radial glia cells. Single-cell transcriptomics further confirmed an oxidative phosphorylation change in cell groups such as outer radial glia upon DPM exposure. This approach highlights how DPM exposure perturbs normal mitochondrial function and cellular respiration during early brain development, which may contribute to developmental disorders like ASD by altering neurodevelopment.
Affiliation(s)
- Stephanie M Bilinovich
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Katie L Uhl
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Kristy Lewis
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Xavier Soehnlen
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Michael Williams
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA
- Neuroscience Program, Michigan State University, East Lansing, Michigan, USA
- Daniel Vogt
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA
- Neuroscience Program, Michigan State University, East Lansing, Michigan, USA
- Jeremy W Prokop
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, Michigan, USA
- Daniel B Campbell
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
- Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA
- Neuroscience Program, Michigan State University, East Lansing, Michigan, USA
110
Cribbs AP, Filippakopoulos P, Philpott M, Wells G, Penn H, Oerum H, Valge-Archer V, Feldmann M, Oppermann U. Dissecting the Role of BET Bromodomain Proteins BRD2 and BRD4 in Human NK Cell Function. Front Immunol 2021; 12:626255. [PMID: 33717143 PMCID: PMC7953504 DOI: 10.3389/fimmu.2021.626255] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/05/2020] [Accepted: 01/13/2021] [Indexed: 12/19/2022] Open
Abstract
Natural killer (NK) cells are innate lymphocytes that play a pivotal role in the immune surveillance and elimination of transformed or virally infected cells. Using a chemo-genetic approach, we identify BET bromodomain-containing proteins BRD2 and BRD4 as central regulators of NK cell functions, including direct cytokine secretion, NK cell contact-dependent inflammatory cytokine secretion from monocytes, and NK cell cytolytic functions. We show that both BRD2 and BRD4 control inflammatory cytokine production in NK cells isolated from healthy volunteers and from rheumatoid arthritis patients. In contrast, knockdown of BRD4 but not of BRD2 impairs NK cell cytolytic responses, suggesting BRD4 as a critical regulator of NK cell-mediated tumor cell elimination. This is supported by pharmacological targeting, where the first-generation pan-BET bromodomain inhibitor JQ1(+) displays anti-inflammatory effects and inhibits tumor cell eradication, while the novel bivalent BET bromodomain inhibitor AZD5153, which shows differential activity towards BET family members, does not. Given the important role of both cytokine-mediated inflammatory microenvironment and cytolytic NK cell activities in immune-oncology therapies, our findings present a compelling argument for further clinical investigation.
Affiliation(s)
- Adam P Cribbs
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
- Martin Philpott
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
- Graham Wells
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
- Henry Penn
- Arthritis Centre, Northwick Park Hospital, Harrow, United Kingdom
- Henrik Oerum
- Roche Innovation Center Copenhagen A/S, Hørsholm, Denmark
- Viia Valge-Archer
- Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Marc Feldmann
- Kennedy Institute of Rheumatology Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, Oxford, United Kingdom
- Udo Oppermann
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
- Freiburg Institute of Advanced Studies, Freiburg, Germany
- Oxford Centre for Translational Myeloma Research, Oxford, United Kingdom
111
Mukherjee K, Xue L, Planutis A, Gnanapragasam MN, Chess A, Bieker JJ. EKLF/KLF1 expression defines a unique macrophage subset during mouse erythropoiesis. eLife 2021; 10:61070. [PMID: 33570494 PMCID: PMC7932694 DOI: 10.7554/elife.61070] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/14/2020] [Accepted: 02/10/2021] [Indexed: 12/17/2022] Open
Abstract
Erythroblastic islands are a specialized niche that contain a central macrophage surrounded by erythroid cells at various stages of maturation. However, identifying the precise genetic and transcriptional control mechanisms in the island macrophage remains difficult due to macrophage heterogeneity. Using unbiased global sequencing and directed genetic approaches focused on early mammalian development, we find that fetal liver macrophages exhibit a unique expression signature that differentiates them from erythroid and adult macrophage cells. The importance of erythroid Krüppel-like factor (EKLF)/KLF1 in this identity is shown by expression analyses in EKLF-/- and in EKLF-marked macrophage cells. Single-cell sequence analysis simplifies heterogeneity and identifies clusters of genes important for EKLF-dependent macrophage function and novel cell surface biomarkers. Remarkably, this singular set of macrophage island cells appears transiently during embryogenesis. Together, these studies provide a detailed perspective on the importance of EKLF in the establishment of the dynamic gene expression network within erythroblastic islands in the developing embryo and provide the means for their efficient isolation.
Affiliation(s)
- Kaustav Mukherjee
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Black Family Stem Cell Institute, New York, NY, United States
- Li Xue
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Antanas Planutis
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Merlin Nithya Gnanapragasam
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Andrew Chess
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- James J Bieker
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Black Family Stem Cell Institute, New York, NY, United States
- Tisch Cancer Institute, New York, NY, United States
- Mindich Child Health and Development Institute, Mount Sinai School of Medicine, New York, NY, United States
112
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks, where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion of existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
| | - Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
| | | | - Haiyan Huang
- Department of Statistics, University of California, Berkeley
| |
|
113
|
Van Buren S, Sarkar H, Srivastava A, Rashid NU, Patro R, Love MI. Compression of quantification uncertainty for scRNA-seq counts. Bioinformatics 2021; 37:1699-1707. [PMID: 33471073 PMCID: PMC8289386 DOI: 10.1093/bioinformatics/btab001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2020] [Revised: 11/16/2020] [Accepted: 01/04/2021] [Indexed: 11/13/2022] Open
Abstract
Motivation: Quantification estimates of gene expression from single-cell RNA-seq (scRNA-seq) data have inherent uncertainty due to reads that map to multiple genes. Many existing scRNA-seq quantification pipelines ignore multi-mapping reads and therefore underestimate expected read counts for many genes. alevin accounts for multi-mapping reads and allows for the generation of ‘inferential replicates’, which reflect quantification uncertainty. Previous methods have shown improved performance when incorporating these replicates into statistical analyses, but storage and use of these replicates increases computation time and memory requirements. Results: We demonstrate that storing only the mean and variance from a set of inferential replicates (‘compression’) is sufficient to capture gene-level quantification uncertainty, while reducing disk storage to as low as 9% of original storage, and memory usage when loading data to as low as 6%. Using these values, we generate ‘pseudo-inferential’ replicates from a negative binomial distribution and propose a general procedure for incorporating these replicates into a proposed statistical testing framework. When applying this procedure to trajectory-based differential expression analyses, we show false positives are reduced by more than a third for genes with high levels of quantification uncertainty. We additionally extend the Swish method to incorporate pseudo-inferential replicates and demonstrate improvements in computation time and memory usage without any loss in performance. Lastly, we show that discarding multi-mapping reads can result in significant underestimation of counts for functionally important genes in a real dataset. Availability and implementation: makeInfReps and splitSwish are implemented in the R/Bioconductor fishpond package available at https://bioconductor.org/packages/fishpond. Analyses and simulated datasets can be found in the paper’s GitHub repo at https://github.com/skvanburen/scUncertaintyPaperCode. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Scott Van Buren
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
| | - Hirak Sarkar
- Department of Computer Science, University of Maryland College Park, MD 20742, USA.,Center for Bioinformatics and Computational Biology, University of Maryland College Park, MD 20742, USA
| | - Avi Srivastava
- New York Genome Center, New York, NY 10013, USA.,Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Naim U Rashid
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.,Lineberger Comprehensive Cancer Center University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland College Park, MD 20742, USA.,Center for Bioinformatics and Computational Biology, University of Maryland College Park, MD 20742, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.,Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| |
|
114
|
Acosta J, Ssozi D, van Galen P. Single-Cell RNA Sequencing to Disentangle the Blood System. Arterioscler Thromb Vasc Biol 2021; 41:1012-1018. [PMID: 33441024 PMCID: PMC7901535 DOI: 10.1161/atvbaha.120.314654] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
The blood system is often represented as a tree-like structure with stem cells that give rise to mature blood cell types through a series of demarcated steps. Although this representation has served as a model of hierarchical tissue organization for decades, single-cell technologies are shedding new light on the abundance of cell type intermediates and the molecular mechanisms that ensure balanced replenishment of differentiated cells. In this Brief Review, we exemplify new insights into blood cell differentiation generated by single-cell RNA sequencing, summarize considerations for the application of this technology, and highlight innovations that are leading the way to understand hematopoiesis at the resolution of single cells. Graphic Abstract: A graphic abstract is available for this article.
Affiliation(s)
- Jean Acosta
- Division of Hematology, Brigham and Women's Hospital, Boston, MA. Department of Medicine, Harvard Medical School, Boston, MA. Broad Institute of MIT and Harvard, Cambridge, MA
| | - Daniel Ssozi
- Division of Hematology, Brigham and Women's Hospital, Boston, MA. Department of Medicine, Harvard Medical School, Boston, MA. Broad Institute of MIT and Harvard, Cambridge, MA
| | - Peter van Galen
- Division of Hematology, Brigham and Women's Hospital, Boston, MA. Department of Medicine, Harvard Medical School, Boston, MA. Broad Institute of MIT and Harvard, Cambridge, MA
| |
|
115
|
Soneson C, Srivastava A, Patro R, Stadler MB. Preprocessing choices affect RNA velocity results for droplet scRNA-seq data. PLoS Comput Biol 2021; 17:e1008585. [PMID: 33428615 PMCID: PMC7822509 DOI: 10.1371/journal.pcbi.1008585] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 04/06/2020] [Revised: 01/22/2021] [Accepted: 11/30/2020] [Indexed: 12/25/2022] Open
Abstract
Experimental single-cell approaches are becoming widely used for many purposes, including investigation of the dynamic behaviour of developing biological systems. Consequently, a large number of computational methods for extracting dynamic information from such data have been developed. One example is RNA velocity analysis, in which spliced and unspliced RNA abundances are jointly modeled in order to infer a 'direction of change' and thereby a future state for each cell in the gene expression space. Naturally, the accuracy and interpretability of the inferred RNA velocities depend crucially on the correctness of the estimated abundances. Here, we systematically compare five widely used quantification tools, in total yielding thirteen different quantification approaches, in terms of their estimates of spliced and unspliced RNA abundances in five experimental droplet scRNA-seq data sets. We show that there are substantial differences between the quantifications obtained from different tools, and identify typical genes for which such discrepancies are observed. We further show that these abundance differences propagate to the downstream analysis, and can have a large effect on estimated velocities as well as the biological interpretation. Our results highlight that abundance quantification is a crucial aspect of the RNA velocity analysis workflow, and that both the definition of the genomic features of interest and the quantification algorithm itself require careful consideration.
Affiliation(s)
- Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Avi Srivastava
- New York Genome Center, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, United States of America
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
| | - Michael B. Stadler
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- University of Basel, Basel, Switzerland
| |
|
116
|
Abstract
Kidney fibrosis is the hallmark of chronic kidney disease progression; however, at present no antifibrotic therapies exist [1-3]. The origin, functional heterogeneity and regulation of scar-forming cells that occur during human kidney fibrosis remain poorly understood [1,2,4]. Here, using single-cell RNA sequencing, we profiled the transcriptomes of cells from the proximal and non-proximal tubules of healthy and fibrotic human kidneys to map the entire human kidney. This analysis enabled us to map all matrix-producing cells at high resolution, and to identify distinct subpopulations of pericytes and fibroblasts as the main cellular sources of scar-forming myofibroblasts during human kidney fibrosis. We used genetic fate-tracing, time-course single-cell RNA sequencing and ATAC-seq (assay for transposase-accessible chromatin using sequencing) experiments in mice, and spatial transcriptomics in human kidney fibrosis, to shed light on the cellular origins and differentiation of human kidney myofibroblasts and their precursors at high resolution. Finally, we used this strategy to detect potential therapeutic targets, and identified NKD2 as a myofibroblast-specific target in human kidney fibrosis.
|
117
|
Oh Y, Yang S, Liu X, Jana S, Izaddoustdar F, Gao X, Debi R, Kim DK, Kim KH, Yang P, Kassiri Z, Lakin R, Backx PH. Transcriptomic Bioinformatic Analyses of Atria Uncover Involvement of Pathways Related to Strain and Post-translational Modification of Collagen in Increased Atrial Fibrillation Vulnerability in Intensely Exercised Mice. Front Physiol 2020; 11:605671. [PMID: 33424629 PMCID: PMC7793719 DOI: 10.3389/fphys.2020.605671] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/12/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023] Open
Abstract
Atrial Fibrillation (AF) is the most common supraventricular tachyarrhythmia that is typically associated with cardiovascular disease (CVD) and poor cardiovascular health. Paradoxically, endurance athletes are also at risk for AF. While it is well-established that persistent AF is associated with atrial fibrosis, hypertrophy and inflammation, intensely exercised mice showed similar adverse atrial changes and increased AF vulnerability, which required tumor necrosis factor (TNF) signaling, even though ventricular structure and function improved. To identify some of the molecular factors underlying the chamber-specific and TNF-dependent atrial changes induced by exercise, we performed transcriptome analyses of hearts from wild-type and TNF-knockout mice following 2 days, 2 weeks or 6 weeks of exercise. Consistent with the central role of atrial stretch arising from elevated venous pressure in AF promotion, all 3 time points were associated with differential regulation of genes in atria linked to mechanosensing (focal adhesion kinase, integrins and cell-cell communications), extracellular matrix (ECM) and TNF pathways, with TNF appearing to play a permissive, rather than causal, role in gene changes. Importantly, mechanosensing/ECM genes were only enriched, along with tubulin- and hypertrophy-related genes after 2 days of exercise while being downregulated at 2 and 6 weeks, suggesting that early reactive strain-dependent remodeling with exercise yields to compensatory adjustments. Moreover, at the later time points, there was also downregulation of both collagen genes and genes involved in collagen turnover, a pattern mirroring aging-related fibrosis. By comparison, twofold fewer genes were differentially regulated in ventricles vs. atria, independently of TNF. Our findings reveal that exercise promotes TNF-dependent atrial transcriptome remodeling of ECM/mechanosensing pathways, consistent with increased preload and atrial stretch seen with exercise. We propose that similar preload-dependent mechanisms are responsible for atrial changes and AF in both CVD patients and athletes.
Affiliation(s)
- Yena Oh
- Department of Biology, York University, Toronto, ON, Canada.,Department of Physiology, University of Toronto, Toronto, ON, Canada.,Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.,University of Ottawa Heart Institute, Ottawa, ON, Canada
| | - Sibao Yang
- Department of Biology, York University, Toronto, ON, Canada.,Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Xueyan Liu
- Department of Biology, York University, Toronto, ON, Canada.,Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Sayantan Jana
- Department of Physiology, Cardiovascular Research Center, University of Alberta, Edmonton, AB, Canada
| | | | - Xiaodong Gao
- Department of Biology, York University, Toronto, ON, Canada
| | - Ryan Debi
- Department of Biology, York University, Toronto, ON, Canada
| | - Dae-Kyum Kim
- Donnelly Centre, University of Toronto, Toronto, ON, Canada.,Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Kyoung-Han Kim
- Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.,University of Ottawa Heart Institute, Ottawa, ON, Canada
| | - Ping Yang
- Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Zamaneh Kassiri
- Department of Physiology, Cardiovascular Research Center, University of Alberta, Edmonton, AB, Canada
| | - Robert Lakin
- Department of Biology, York University, Toronto, ON, Canada
| | - Peter H Backx
- Department of Biology, York University, Toronto, ON, Canada.,Department of Physiology, University of Toronto, Toronto, ON, Canada
| |
|
118
|
Zhang Z, Cui F, Wang C, Zhao L, Zou Q. Goals and approaches for each processing step for single-cell RNA sequencing data. Brief Bioinform 2020; 22:6034054. [PMID: 33316046 DOI: 10.1093/bib/bbaa314] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/15/2020] [Revised: 10/10/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at the cellular level. However, due to the extremely low levels of transcripts in a single cell and technical losses during reverse transcription, gene expression at a single-cell resolution is usually noisy and highly dimensional; thus, statistical analyses of single-cell data are a challenge. Although many scRNA-seq data analysis tools are currently available, a gold standard pipeline is not available for all datasets. Therefore, a general understanding of bioinformatics and associated computational issues would facilitate the selection of appropriate tools for a given set of data. In this review, we provide an overview of the goals and most popular computational analysis tools for the quality control, normalization, imputation, feature selection and dimension reduction of scRNA-seq data.
Affiliation(s)
- Zilong Zhang
- University of Electronic Science and Technology of China
| | | | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology
| | - Lingling Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
| |
|
119
|
Tekman M, Batut B, Ostrovsky A, Antoniewski C, Clements D, Ramirez F, Etherington GJ, Hotz HR, Scholtalbers J, Manning JR, Bellenger L, Doyle MA, Heydarian M, Huang N, Soranzo N, Moreno P, Mautner S, Papatheodorou I, Nekrutenko A, Taylor J, Blankenberg D, Backofen R, Grüning B. A single-cell RNA-sequencing training and analysis suite using the Galaxy framework. Gigascience 2020; 9:5931798. [PMID: 33079170 PMCID: PMC7574357 DOI: 10.1093/gigascience/giaa102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Revised: 08/30/2020] [Indexed: 11/25/2022] Open
Abstract
Background: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. Results: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. Conclusions: The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.
Affiliation(s)
- Mehmet Tekman
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Bérénice Batut
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Alexander Ostrovsky
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Christophe Antoniewski
- ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France.,Institut de Biologie Paris Seine, 9 Quai Saint-Bernard Université Pierre et Marie Curie, Campus Jussieu, Bâtiments A-B-C, 75005 Paris, France
| | - Dave Clements
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Fidel Ramirez
- Boehringer Ingelheim International GmbH, Binger Strasse 173, 55216 Ingelheim am Rhein, Biberach, Germany
| | | | - Hans-Rudolf Hotz
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Maulbeerstrasse 66, 4058 Basel, Switzerland
| | - Jelle Scholtalbers
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Jonathan R Manning
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Lea Bellenger
- ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France
| | - Maria A Doyle
- Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, 305 Grattan Street, Victoria 3000, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia
| | - Mohammad Heydarian
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Ni Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.,Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Nicola Soranzo
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Stefan Mautner
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - James Taylor
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, NB21 Cleveland, OH 44195, USA
| | - Rolf Backofen
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Björn Grüning
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| |
|
120
|
Li B, Gould J, Yang Y, Sarkizova S, Tabaka M, Ashenberg O, Rosen Y, Slyper M, Kowalczyk MS, Villani AC, Tickle T, Hacohen N, Rozenblatt-Rosen O, Regev A. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat Methods 2020; 17:793-798. [PMID: 32719530 PMCID: PMC7437817 DOI: 10.1038/s41592-020-0905-x] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 10/29/2019] [Accepted: 06/18/2020] [Indexed: 11/10/2022]
Abstract
Massively parallel single-cell and single-nucleus RNA sequencing has opened the way to systematic tissue atlases in health and disease, but as the scale of data generation is growing, so is the need for computational pipelines for scaled analysis. Here we developed Cumulus, a cloud-based framework for analyzing large-scale single-cell and single-nucleus RNA sequencing datasets. Cumulus combines the power of cloud computing with improvements in algorithm and implementation to achieve high scalability, low cost, user-friendliness and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.
Affiliation(s)
- Bo Li
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Division of Rheumatology, Allergy, and Immunology, Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
| | - Joshua Gould
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Yiming Yang
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Rheumatology, Allergy, and Immunology, Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA, USA
| | - Siranush Sarkizova
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marcin Tabaka
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Orr Ashenberg
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Yanay Rosen
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Michal Slyper
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Monika S Kowalczyk
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Alexandra-Chloé Villani
- Division of Rheumatology, Allergy, and Immunology, Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
| | - Timothy Tickle
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Nir Hacohen
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
| | | | - Aviv Regev
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
| |
|
121
|
Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD. Computational Methods for Single-Cell RNA Sequencing. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012220-100601] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 11/09/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Joshua Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
| | - Sarah K. Nyquist
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Alex K. Shalek
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemistry, Institute for Medical Engineering & Science (IMES), and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bryan D. Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
| |
|
122
|
Niebler S, Müller A, Hankeln T, Schmidt B. RainDrop: Rapid activation matrix computation for droplet-based single-cell RNA-seq reads. BMC Bioinformatics 2020; 21:274. [PMID: 32611394 PMCID: PMC7329424 DOI: 10.1186/s12859-020-03593-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/16/2020] [Accepted: 06/09/2020] [Indexed: 12/19/2022] Open
Abstract
Background: Obtaining data from single-cell transcriptomic sequencing allows for the investigation of cell-specific gene expression patterns, which could not be addressed a few years ago. With the advancement of droplet-based protocols the number of studied cells continues to increase rapidly. This establishes the need for software tools for efficient processing of the produced large-scale datasets. We address this need by presenting RainDrop for fast gene-cell count matrix computation from single-cell RNA-seq data produced by 10x Genomics Chromium technology. Results: RainDrop can process single-cell transcriptomic datasets consisting of 784 million reads sequenced from around 8,000 cells in less than 40 minutes on a standard workstation. It significantly outperforms the established Cell Ranger pipeline and the recently introduced Alevin tool in terms of runtime by a maximal (average) speedup of 30.4 (22.6) and 3.5 (2.4), respectively, while keeping high agreements of the generated results. Conclusions: RainDrop is a software tool for highly efficient processing of large-scale droplet-based single-cell RNA-seq datasets on standard workstations written in C++. It is available at https://gitlab.rlp.net/stnieble/raindrop.
Affiliation(s)
- Stefan Niebler
- Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany
| | - André Müller
- Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany
| | - Thomas Hankeln
- Molecular Genetics and Genome Analysis, Institute of Organismal and Molecular Evolution, Johannes Gutenberg University, Mainz, 55099, Germany
| | - Bertil Schmidt
- Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany.
| |
|
123
|
Srivastava A, Malik L, Sarkar H, Patro R. A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification. Bioinformatics 2020; 36:i292-i299. [PMID: 32657394 PMCID: PMC7355277 DOI: 10.1093/bioinformatics/btaa450] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022] Open
Abstract
Motivation: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When pre-processing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3’ sampling bias make it difficult to disambiguate cases where there is no uniquely mapping read to any of the candidate target genes. Results: We introduce a Bayesian framework for information sharing across cells within a sample, or across multiple modalities of data using the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene-expression patterns, and learn informative, empirical priors which we provide to alevin’s gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show our new model improves the per cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups. Availability and implementation: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0.
Affiliation(s)
- Avi Srivastava
- Department of Computer Science, Stony Brook University, Stony Brook 11794, NY, USA
| | - Laraib Malik
- Department of Computer Science, Stony Brook University, Stony Brook 11794, NY, USA
| | - Hirak Sarkar
- Computer Science Department, University of Maryland, College Park 20742, MD, USA
| | - Rob Patro
- Computer Science Department, University of Maryland, College Park 20742, MD, USA
|
124
|
Qian H, Kang X, Hu J, Zhang D, Liang Z, Meng F, Zhang X, Xue Y, Maimon R, Dowdy SF, Devaraj NK, Zhou Z, Mobley WC, Cleveland DW, Fu XD. Reversing a model of Parkinson's disease with in situ converted nigral neurons. Nature 2020; 582:550-556. [PMID: 32581380 PMCID: PMC7521455 DOI: 10.1038/s41586-020-2388-4] [Citation(s) in RCA: 291] [Impact Index Per Article: 72.8] [Received: 11/12/2018] [Accepted: 05/13/2020] [Indexed: 12/21/2022]
Abstract
Parkinson's disease is characterized by loss of dopamine neurons in the substantia nigra [1]. Similar to other major neurodegenerative disorders, there are no disease-modifying treatments for Parkinson's disease. While most treatment strategies aim to prevent neuronal loss or protect vulnerable neuronal circuits, a potential alternative is to replace lost neurons to reconstruct disrupted circuits [2]. Here we report an efficient one-step conversion of isolated mouse and human astrocytes to functional neurons by depleting the RNA-binding protein PTB (also known as PTBP1). Applying this approach to the mouse brain, we demonstrate progressive conversion of astrocytes to new neurons that innervate into and repopulate endogenous neural circuits. Astrocytes from different brain regions are converted to different neuronal subtypes. Using a chemically induced model of Parkinson's disease in mouse, we show conversion of midbrain astrocytes to dopaminergic neurons, which provide axons to reconstruct the nigrostriatal circuit. Notably, re-innervation of striatum is accompanied by restoration of dopamine levels and rescue of motor deficits. A similar reversal of disease phenotype is also accomplished by converting astrocytes to neurons using antisense oligonucleotides to transiently suppress PTB. These findings identify a potentially powerful and clinically feasible approach to treating neurodegeneration by replacing lost neurons.
Affiliation(s)
- Hao Qian
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Xinjiang Kang
- State Key Laboratory of Membrane Biology and Peking-Tsinghua Center for Life Sciences, Institute of Molecular Medicine, Peking University, Beijing, China.,MOE Key Lab of Medical Electrophysiology, ICR, Southwest Medical University, Luzhou, China
| | - Jing Hu
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.,Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China
| | - Dongyang Zhang
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
| | - Zhengyu Liang
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Fan Meng
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Xuan Zhang
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Yuanchao Xue
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.,Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
| | - Roy Maimon
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.,Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Steven F Dowdy
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Neal K Devaraj
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
| | - Zhuan Zhou
- State Key Laboratory of Membrane Biology and Peking-Tsinghua Center for Life Sciences, Institute of Molecular Medicine, Peking University, Beijing, China
| | - William C Mobley
- Department of Neurosciences and Center for Neural Circuits and Behavior, University of California, San Diego, La Jolla, CA, USA
| | - Don W Cleveland
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.,Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Xiang-Dong Fu
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.,Institute of Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
|
125
|
Van de Sande B, Flerin C, Davie K, De Waegeneer M, Hulselmans G, Aibar S, Seurinck R, Saelens W, Cannoodt R, Rouchon Q, Verbeiren T, De Maeyer D, Reumers J, Saeys Y, Aerts S. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 2020; 15:2247-2276. [PMID: 32561888 DOI: 10.1038/s41596-020-0336-2] [Citation(s) in RCA: 518] [Impact Index Per Article: 129.5] [Received: 09/25/2019] [Accepted: 04/17/2020] [Indexed: 11/09/2022]
Abstract
This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells, and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix depicting the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a regression per-target approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
Affiliation(s)
- Bram Van de Sande
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.,Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Christopher Flerin
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.,Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Kristofer Davie
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
| | - Maxime De Waegeneer
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.,Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.,Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Sara Aibar
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.,Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Ruth Seurinck
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.,Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Wouter Saelens
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.,Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Robrecht Cannoodt
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.,Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.,Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Quentin Rouchon
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.,Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Toni Verbeiren
- Janssen Pharmaceutica, Beerse, Belgium.,Data Intuitive, Ghent, Belgium
| | | | | | - Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.,Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Stein Aerts
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.,Department of Human Genetics, KU Leuven, Leuven, Belgium
|
126
|
Abstract
Background: Analysis of scATAC-seq data has been recently scaled to thousands of cells. While processing of other types of single cell data was boosted by the implementation of alignment-free techniques, pipelines available to process scATAC-seq data still require large computational resources. We propose here an approach based on pseudoalignment, which reduces the execution times and hardware needs at little cost for precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in quantification of known peaks; the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when analysis is performed using a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows for efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.
Affiliation(s)
- Valentina Giansanti
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Center for Omics Sciences, IRCCS San Raffaele Institute, Milan, Italy
| | - Ming Tang
- FAS informatics, Harvard University, Cambridge, MA, USA
| | - Davide Cittaro
- Center for Omics Sciences, IRCCS San Raffaele Institute, Milan, Italy
|
127
|
Love MI, Soneson C, Hickey PF, Johnson LK, Pierce NT, Shepherd L, Morgan M, Patro R. Tximeta: Reference sequence checksums for provenance identification in RNA-seq. PLoS Comput Biol 2020; 16:e1007664. [PMID: 32097405 PMCID: PMC7059966 DOI: 10.1371/journal.pcbi.1007664] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 09/27/2019] [Revised: 03/06/2020] [Accepted: 01/18/2020] [Indexed: 11/19/2022] Open
Abstract
Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
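The checksum idea in this abstract can be illustrated with a small sketch: hash the reference sequences, store the digest next to the quantification output, and later look the digest up in a table of known references. This is only an illustration of the general pattern. The digest scheme, the registry layout and the names `reference_digest`, `register_reference` and `identify_reference` are assumptions and do not describe how tximeta computes or stores its checksums.

```python
import hashlib


def reference_digest(sequences):
    """Return a SHA-256 hex digest over an ordered list of transcript sequences.

    Assumption: sequences are plain ASCII nucleotide strings; case is
    normalised so that soft-masked and unmasked copies hash identically.
    """
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.upper().encode("ascii"))
        h.update(b"\n")  # separator so that ["AC", "G"] differs from ["A", "CG"]
    return h.hexdigest()


def register_reference(registry, sequences, metadata):
    """Record metadata (source, release, organism, ...) under the digest."""
    registry[reference_digest(sequences)] = metadata


def identify_reference(registry, digest):
    """Look up a digest read back from a quantification directory.

    Returns the stored metadata, or None when the reference is unknown.
    """
    return registry.get(digest)


# Hypothetical registry with one known transcriptome.
registry = {}
transcripts = ["ACGTACGT", "ttgacc"]
register_reference(registry, transcripts, {"source": "example", "release": "1"})
found = identify_reference(registry, reference_digest(transcripts))
```

Because the digest depends only on the sequences, any collaborator who receives the quantification files can recover the annotation metadata without being told which reference was used, provided that reference is in the registry.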
Affiliation(s)
- Michael I. Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Peter F. Hickey
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- The Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
| | - Lisa K. Johnson
- Department of Population Health and Reproduction, University of California, Davis, Davis, California, United States of America
| | - N. Tessa Pierce
- Department of Population Health and Reproduction, University of California, Davis, Davis, California, United States of America
| | - Lori Shepherd
- Roswell Park Comprehensive Cancer Center, Buffalo, New York, United States of America
| | - Martin Morgan
- Roswell Park Comprehensive Cancer Center, Buffalo, New York, United States of America
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
|
128
|
Amezquita RA, Lun ATL, Becht E, Carey VJ, Carpp LN, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pagès H, Smith ML, Huber W, Morgan M, Gottardo R, Hicks SC. Orchestrating single-cell analysis with Bioconductor. Nat Methods 2020; 17:137-145. [PMID: 31792435 PMCID: PMC7358058 DOI: 10.1038/s41592-019-0654-x] [Citation(s) in RCA: 410] [Impact Index Per Article: 102.5] [Received: 03/26/2019] [Revised: 09/13/2019] [Accepted: 10/14/2019] [Indexed: 12/24/2022]
Abstract
Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.
Affiliation(s)
| | - Aaron T L Lun
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Bioinformatics and Computational Biology, Genentech Inc., San Francisco, CA, USA
| | - Etienne Becht
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Vince J Carey
- Channing Division of Network Medicine, Brigham And Women's Hospital, Boston, MA, USA
| | | | - Ludwig Geistlinger
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA
- Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
| | - Federico Marini
- Center for Thrombosis and Hemostasis, Mainz, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics, Mainz, Germany
| | | | - Davide Risso
- Department of Statistical Sciences, University of Padua, Padua, Italy
- Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA
- Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
| | - Hervé Pagès
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Mike L Smith
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Wolfgang Huber
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Martin Morgan
- Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | | | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
129
|
Papatheodorou I, Moreno P, Manning J, Fuentes AMP, George N, Fexova S, Fonseca NA, Füllgrabe A, Green M, Huang N, Huerta L, Iqbal H, Jianu M, Mohammed S, Zhao L, Jarnuczak AF, Jupp S, Marioni J, Meyer K, Petryszak R, Prada Medina CA, Talavera-López C, Teichmann S, Vizcaino JA, Brazma A. Expression Atlas update: from tissues to single cells. Nucleic Acids Res 2020; 48:D77-D83. [PMID: 31665515 PMCID: PMC7145605 DOI: 10.1093/nar/gkz947] [Citation(s) in RCA: 202] [Impact Index Per Article: 50.5] [Received: 09/15/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 12/16/2022] Open
Abstract
Expression Atlas is EMBL-EBI's resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.
Affiliation(s)
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Jonathan Manning
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | - Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Silvie Fexova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Nuno A Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Anja Füllgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Matthew Green
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Ni Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Laura Huerta
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Haider Iqbal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Monica Jianu
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Suhaib Mohammed
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Lingyun Zhao
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Kerstin Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | | | - Sarah Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Juan Antonio Vizcaino
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
|
130
|
Liu D. Algorithms for efficiently collapsing reads with Unique Molecular Identifiers. PeerJ 2019; 7:e8275. [PMID: 31871845 PMCID: PMC6921982 DOI: 10.7717/peerj.8275] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/23/2019] [Accepted: 11/22/2019] [Indexed: 11/20/2022] Open
Abstract
Background: Unique Molecular Identifiers (UMIs) are used in many experiments to find and remove PCR duplicates. Many tools deduplicate reads by finding reads with the same alignment coordinates and UMIs. However, many of these tools either cannot handle substitution errors or require expensive pairwise UMI comparisons that do not scale efficiently to larger datasets. Results: We reformulate the problem of deduplicating UMIs in a manner that enables optimizations to be made, and more efficient data structures to be used. We implement our data structures and optimizations in a tool called UMICollapse, which is able to deduplicate over one million unique UMIs of length 9 at a single alignment position in around 26 s, using only a single thread and much less than 10 GB of memory. Conclusions: We present a new formulation of the UMI deduplication problem, and show that it can be solved faster, with more sophisticated data structures.
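For orientation, the pairwise baseline that the abstract describes as expensive can be sketched as follows. The sketch greedily merges each UMI observed at one alignment position into the most frequent UMI within a small Hamming distance. It is a naive quadratic illustration under stated assumptions (equal-length UMIs, substitutions only), not the data structures or algorithm of UMICollapse. The function names and the `max_dist` parameter are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_dist=1):
    """Greedily collapse the UMIs seen at one alignment position.

    UMIs are visited from most to least frequent. Each UMI is merged into the
    first existing cluster centre within max_dist substitutions; otherwise it
    starts a new cluster. Returns a dict centre -> total read count.
    Every UMI is compared against every centre, so the cost grows
    quadratically with the number of distinct UMIs.
    """
    counts = Counter(umis)
    # Ties in frequency are broken alphabetically to keep the result stable.
    ordered = sorted(counts, key=lambda u: (-counts[u], u))
    clusters = {}
    for umi in ordered:
        for centre in clusters:
            if hamming(umi, centre) <= max_dist:
                clusters[centre] += counts[umi]
                break
        else:
            clusters[umi] = counts[umi]
    return clusters


reads = ["AAAA"] * 5 + ["AAAT"] + ["CCCC"] * 3 + ["CCCG"]
deduplicated = collapse_umis(reads)
# The number of surviving centres estimates the number of original molecules.
molecule_count = len(deduplicated)
```

In this toy input the two low-count UMIs each differ from a high-count UMI by one substitution, so they are treated as sequencing or PCR errors and absorbed, leaving two clusters.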
Affiliation(s)
- Daniel Liu
- Torrey Pines High School, San Diego, CA, United States of America
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States of America
|
131
|
Zhu A, Srivastava A, Ibrahim JG, Patro R, Love MI. Nonparametric expression analysis using inferential replicate counts. Nucleic Acids Res 2019; 47:e105. [PMID: 31372651 PMCID: PMC6765120 DOI: 10.1093/nar/gkz622] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 05/01/2019] [Revised: 06/11/2019] [Accepted: 07/11/2019] [Indexed: 11/13/2022] Open
Abstract
A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases. Ideally, a statistical testing procedure should incorporate the inherent uncertainty of the abundance estimates arising from the quantification step. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts for each gene or transcript, and a subset of methods can incorporate uncertainty. Previous work has shown that nonparametric models for RNA-seq differential expression may have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account inferential uncertainty, leading to an inflated false discovery rate, in particular at the transcript level. We propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty. We compare our method, Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing differential expression between sub-populations of cells, and compare its performance to the Wilcoxon test.
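One simple way to picture how inferential replicates can enter a rank-based test is to compute a Wilcoxon rank-sum statistic on each replicate and average the results, so that features whose counts are unstable across replicates yield a less extreme statistic. The sketch below illustrates that idea only. It is not the Swish implementation, it omits the permutation step needed to obtain p-values, and the function names and data layout are assumptions.

```python
def midranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum(values, in_group_b):
    """Wilcoxon rank-sum statistic: sum of the ranks of samples in group B."""
    return sum(r for r, b in zip(midranks(values), in_group_b) if b)


def mean_rank_sum(inferential_replicates, in_group_b):
    """Average the rank-sum statistic over inferential replicates.

    inferential_replicates: list of count vectors for one gene or transcript,
    one vector per replicate, each ordered by sample (hypothetical layout).
    """
    stats = [rank_sum(rep, in_group_b) for rep in inferential_replicates]
    return sum(stats) / len(stats)


groups = [False, False, False, True, True, True]
stable = [[1, 2, 3, 10, 11, 12], [2, 1, 3, 12, 10, 11]]
uncertain = [[1, 2, 3, 10, 11, 12], [11, 2, 3, 10, 1, 12]]
stable_stat = mean_rank_sum(stable, groups)
uncertain_stat = mean_rank_sum(uncertain, groups)
```

In the example, the feature whose replicates agree keeps the maximal statistic for group B, while the feature whose second replicate reshuffles two samples is pulled toward the null value.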
Affiliation(s)
- Anqi Zhu
- Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
| | - Avi Srivastava
- Department of Computer Science, Stony Brook University, Computer Science Building, Engineering Dr, Stony Brook, NY 11794, USA
| | - Joseph G Ibrahim
- Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
| | - Rob Patro
- Department of Computer Science, Stony Brook University, Computer Science Building, Engineering Dr, Stony Brook, NY 11794, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina-Chapel Hill, 120 Mason Farm Rd, Chapel Hill, NC 27514, USA
|
132
|
Sarkar H, Srivastava A, Patro R. Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level. Bioinformatics 2019; 35:i136-i144. [PMID: 31510649 PMCID: PMC6612833 DOI: 10.1093/bioinformatics/btz351] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Summary: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, sequence fragmentation, and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification, and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hirak Sarkar
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
| | - Avi Srivastava
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
| | - Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
|