51
Hsiao CJ, Paulson JN, Singh S, Mongodin EF, Carroll KC, Fraser CM, Rock P, Faraday N. Nasal Microbiota and Infectious Complications After Elective Surgical Procedures. JAMA Netw Open 2021; 4:e218386. [PMID: 33914049 PMCID: PMC8085724 DOI: 10.1001/jamanetworkopen.2021.8386] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022] [Open Access]
Abstract
IMPORTANCE The association of the nasal microbiome with outcomes in surgical patients is poorly understood. OBJECTIVE To characterize the composition of nasal microbiota in patients undergoing clean elective surgical procedures and to examine the association between characteristics of preoperative nasal microbiota and occurrence of postoperative infection. DESIGN, SETTING, AND PARTICIPANTS Using a nested matched case-control design, 53 individuals who developed postoperative infection were matched (approximately 3:1 by age, sex, and surgical procedure) with 144 individuals who were not infected (ie, the control group). The 2 groups were selected from a prospective cohort of patients undergoing surgical procedures at 2 tertiary care university hospitals in Baltimore, Maryland, who were at high risk for postoperative infectious complications. Included individuals were aged 40 years or older; had no history of autoimmune disease, immunocompromised state, immune-modulating medication, or active infection; and were scheduled to undergo elective cardiac, vascular, spinal, or intracranial surgical procedure. Data were analyzed from October 2015 through September 2020. EXPOSURES Nasal microbiome cluster class served as the main exposure. An unsupervised clustering method (ie, grades of membership modeling) was used to classify nasal microbial samples into 2 groups based on features derived from 16S ribosomal RNA gene sequencing. The microbiome cluster groups were derived independently and agnostic of baseline clinical characteristics and infection status. MAIN OUTCOMES AND MEASURES Composite of surgical site infection, bacteremia, and pneumonia occurring within 6 months after surgical procedure. RESULTS Among 197 participants (mean [SD] age, 64.1 [10.6] years; 63 [37.7%] women), 553 bacterial taxa were identified from preoperative nasal swab samples. 
A 2-cluster model (with 167 patients in cluster 1 and 30 patients in cluster 2) accounted for the largest proportion of variance in microbial profiles using grades of membership modeling and was most parsimonious. After adjusting for potential confounders, the probability of assignment to cluster 2 was associated with 6-fold higher odds of infection after surgical procedure (odds ratio [OR], 6.18; 95% CI, 3.33-11.7; P < .001) independent of baseline clinical characteristics, including nasal carriage of Staphylococcus aureus. Intrasample (ie, α) diversity was inversely associated with infectious outcome in both clusters (OR, 0.57; 95% CI, 0.42-0.75; P < .001); however, probability of assignment to cluster 2 was associated with higher odds of infection independent of α diversity (OR, 4.61; 95% CI, 2.78-7.86; P < .001). CONCLUSIONS AND RELEVANCE These findings suggest that the nasal microbiome was an independent risk factor associated with infectious outcomes among individuals who underwent elective surgical procedures and may serve as a biomarker associated with infection susceptibility in this population.
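The abstract above reports intrasample (α) diversity without naming the index used. As an illustration only, the sketch below computes the Shannon index, one common α-diversity measure, from per-taxon read counts. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` holds per-taxon read counts for one sample (hypothetical input;
    the study's actual diversity index is not stated in the abstract).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up samples: an even community and one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(shannon_diversity(even_sample))    # equals ln(4) for four equally abundant taxa
print(shannon_diversity(skewed_sample))  # lower, because one taxon dominates
```

A sample dominated by one taxon scores lower than an even one, which is the sense in which lower α diversity indicates a less balanced community.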
Affiliation(s)
- Joseph N. Paulson
  - Product Development Biostatistics, Genentech, South San Francisco, California
- Sarabdeep Singh
  - Center for Drug Evaluation and Research, Food and Drug Administration, White Oak, Maryland
- Emmanuel F. Mongodin
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore
  - Lung Biology and Disease Program, Division of Lung Diseases, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
- Karen C. Carroll
  - Division of Medical Microbiology, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Claire M. Fraser
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore
- Peter Rock
  - Department of Anesthesiology, University of Maryland School of Medicine, Baltimore
- Nauder Faraday
  - Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland

52
Bielecki P, Riesenfeld SJ, Hütter JC, Triglia ET, Kowalczyk MS, Ricardo-Gonzalez RR, Lian M, Vesely MCA, Kroehling L, Xu H, Slyper M, Muus C, Ludwig LS, Christian E, Tao L, Kedaigle AJ, Steach HR, York AG, Skadow MH, Yaghoubi P, Dionne D, Jarret A, McGee HM, Porter CBM, Licona-Limón P, Bailis W, Jackson R, Gagliani N, Gasteiger G, Locksley RM, Regev A, Flavell RA. Skin-resident innate lymphoid cells converge on a pathogenic effector state. Nature 2021; 592:128-132. [PMID: 33536623 PMCID: PMC8336632 DOI: 10.1038/s41586-021-03188-w] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 10/23/2018] [Accepted: 12/24/2020] [Indexed: 01/30/2023]
Abstract
Tissue-resident innate lymphoid cells (ILCs) help sustain barrier function and respond to local signals. ILCs are traditionally classified as ILC1, ILC2 or ILC3 on the basis of their expression of specific transcription factors and cytokines. In the skin, disease-specific production of ILC3-associated cytokines interleukin (IL)-17 and IL-22 in response to IL-23 signalling contributes to dermal inflammation in psoriasis. However, it is not known whether this response is initiated by pre-committed ILCs or by cell-state transitions. Here we show that the induction of psoriasis in mice by IL-23 or imiquimod reconfigures a spectrum of skin ILCs, which converge on a pathogenic ILC3-like state. Tissue-resident ILCs were necessary and sufficient, in the absence of circulatory ILCs, to drive pathology. Single-cell RNA-sequencing (scRNA-seq) profiles of skin ILCs along a time course of psoriatic inflammation formed a dense transcriptional continuum, even at steady state, reflecting fluid ILC states, including a naive or quiescent-like state and an ILC2 effector state. Upon disease induction, the continuum shifted rapidly to span a mixed, ILC3-like subset also expressing cytokines characteristic of ILC2s, which we inferred as arising through multiple trajectories. We confirmed the transition potential of quiescent-like and ILC2 states using in vitro experiments, single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) and in vivo fate mapping. Our results highlight the range and flexibility of skin ILC responses, suggesting that immune activities primed in healthy tissues dynamically adapt to provocations and, left unchecked, drive pathological remodelling.
Affiliation(s)
- Piotr Bielecki
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
  - Celsius Therapeutics, Cambridge, MA, USA
- Samantha J. Riesenfeld
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
  - Correspondence to: A.R., R.A.F., P.B., and S.J.R.
- Jan-Christian Hütter
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Elena Torlai Triglia
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Monika S. Kowalczyk
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Roberto R. Ricardo-Gonzalez
  - Department of Dermatology, University of California San Francisco, San Francisco, CA 94115, USA
  - Department of Medicine, Sandler Asthma Research Center, University of California San Francisco, San Francisco, CA, USA
- Mi Lian
  - Würzburg Institute of Systems Immunology, Max Planck Research Group at the Julius-Maximilians-Universität Würzburg, Germany
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
- Maria C. Amezcua Vesely
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
  - Howard Hughes Medical Institute
- Lina Kroehling
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Hao Xu
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Michal Slyper
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Christoph Muus
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
- Leif S. Ludwig
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Elena Christian
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Liming Tao
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Amanda J. Kedaigle
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Holly R. Steach
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Autumn G. York
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Mathias H. Skadow
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Parastou Yaghoubi
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Danielle Dionne
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Abigail Jarret
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Heather M. McGee
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
  - NOMIS Center for Immunobiology and Microbial Pathogenesis, Salk Institute for Biological Sciences, La Jolla, CA 92037, USA
- Caroline B. M. Porter
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Paula Licona-Limón
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
  - Departamento de Biología Celular y del Desarrollo, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, México City 04510
- Will Bailis
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
  - Division of Protective Immunity, Children’s Hospital of Philadelphia, 19104, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, 19104, Philadelphia, PA, USA
- Ruaidhrí Jackson
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Nicola Gagliani
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
  - Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
  - Department of Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
  - Immunology and Allergy Unit, Department of Medicine, Solna, Karolinska Institute and University Hospital, 17176 Stockholm, Sweden
- Georg Gasteiger
  - Würzburg Institute of Systems Immunology, Max Planck Research Group at the Julius-Maximilians-Universität Würzburg, Germany
- Richard M. Locksley
  - Department of Medicine, Sandler Asthma Research Center, University of California San Francisco, San Francisco, CA, USA
  - Howard Hughes Medical Institute
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Koch Institute of Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Genentech, South San Francisco, CA, USA
- Richard A. Flavell
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
  - Howard Hughes Medical Institute
  - Correspondence to: A.R., R.A.F., P.B., and S.J.R.

53

54
Vu TD, Iwasaki Y, Oshima K, Chiu MT, Nikaido M, Okada N. A unique neurogenomic state emerges after aggressive confrontations in males of the fish Betta splendens. Gene 2021; 784:145601. [PMID: 33766705 DOI: 10.1016/j.gene.2021.145601] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/13/2020] [Revised: 03/01/2021] [Accepted: 03/17/2021] [Indexed: 12/13/2022]
Abstract
Territorial defense involves frequent aggressive confrontations with competitors, but little is known about how brain-transcriptomic profiles change between individuals competing for territory establishment. Our previous study showed that when two male Betta splendens fish interact, transcriptomes across their brains synchronize in a way that reflects a mutual assessment process between them at the gene expression level. Here we aim to evaluate how the brain-transcriptomic profiles of opponents change immediately after shifting their social status (i.e., the winner/loser has emerged) and 30 min after this shift. We showed that changes in the expression of certain genes are unique to different fighting stages and the expression patterns of certain genes are transiently or persistently changed across all fighting stages. These brain transcriptomic responses are in accordance with behavioral changes across the fight. Strikingly, the specificity of the brain-transcriptomic synchronization of a pair during fighting was gradually lost after fighting ceased, leading to the emergence of a basal neurogenomic state in which the changes in gene expression were reduced to a minimum and consistent across all individuals. This state shares common characteristics with the hibernation state that animals adopt to minimize their metabolic rates to save energy. Interestingly, expression changes for genes related to metabolism, autism spectrum disorder, and long-term memory still differentiated losers from winners. Together, the fighting system using male B. splendens provides a promising platform for investigating neurogenomic states of aggression in vertebrates.
Affiliation(s)
- Trieu-Duc Vu
  - School of Pharmacy, Kitasato University, Tokyo, Japan
  - School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
  - Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan
- Yuki Iwasaki
  - Nagahama Institute of Bio-Science and Technology, Nagahama, Japan
- Ming-Tzu Chiu
  - Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan
- Masato Nikaido
  - School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Norihiro Okada
  - School of Pharmacy, Kitasato University, Tokyo, Japan
  - Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan
  - Nagahama Institute of Bio-Science and Technology, Nagahama, Japan

55
White AE, Dey KK, Stephens M, Price TD. Dispersal syndromes drive the formation of biogeographical regions, illustrated by the case of Wallace's Line. Global Ecology and Biogeography: A Journal of Macroecology 2021; 30:685-696. [PMID: 33776580 PMCID: PMC7986858 DOI: 10.1111/geb.13250] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/08/2020] [Revised: 11/10/2020] [Accepted: 12/01/2020] [Indexed: 06/07/2023]
Abstract
AIM Biogeographical regions (realms) reflect patterns of co-distributed species (biotas) across space. Their boundaries are set by dispersal barriers and difficulties of establishment in new locations. We extend new methods to assess these two contributions by quantifying the degree to which realms intergrade across geographical space and the contributions of individual species to the delineation of those realms. As our example, we focus on Wallace's Line, the most enigmatic partitioning of the world's faunas, where climate is thought to have little effect and the majority of dispersal barriers are short water gaps. LOCATION Indo-Pacific. TIME PERIOD Present day. MAJOR TAXA STUDIED Birds and mammals. METHODS Terrestrial bird and mammal assemblages were established in 1-degree map cells using range maps. Assemblage structure was modelled using latent Dirichlet allocation, a continuous clustering method that simultaneously establishes the likely partitioning of species into biotas and the contribution of biotas to each map cell. Phylogenetic trees were used to assess the contribution of deep historical processes. Spatial segregation between biotas was evaluated across time and space in comparison with numerous hard realm boundaries drawn by various workers. RESULTS We demonstrate that the strong turnover between biotas coincides with the north-western extent of the region not connected to the mainland during the Pleistocene, although the Philippines contains mixed contributions. At deeper taxonomic levels, Sulawesi and the Philippines shift to primarily Asian affinities, resulting from transgressions of a few Asian-derived lineages across the line. The partitioning of biotas sometimes produces fragmented regions that reflect habitat. Differences in partitions between birds and mammals reflect differences in dispersal ability. 
MAIN CONCLUSIONS Permanent water barriers have selected for a dispersive archipelago fauna, excluded by an incumbent continental fauna on the Sunda shelf. Deep history, such as plate movements, is relatively unimportant in setting boundaries. The analysis implies a temporally dynamic interaction between a species' intrinsic dispersal ability, physiographic barriers, and recent climate change in the genesis of Earth's biotas.
Affiliation(s)
- Alexander E. White
  - Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
  - Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Kushal K. Dey
  - Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics, University of Chicago, Chicago, IL, USA
- Trevor D. Price
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA

56
Verifying explainability of a deep learning tissue classifier trained on RNA-seq data. Sci Rep 2021; 11:2641. [PMID: 33514769 PMCID: PMC7846764 DOI: 10.1038/s41598-021-81773-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/19/2020] [Accepted: 01/11/2021] [Indexed: 12/16/2022] [Open Access]
Abstract
For complex machine learning (ML) algorithms to gain widespread acceptance in decision making, we must be able to identify the features driving the predictions. Explainability models allow transparency of ML algorithms; however, their reliability within high-dimensional data is unclear. To test the reliability of the explainability model SHapley Additive exPlanations (SHAP), we developed a convolutional neural network to predict tissue classification from Genotype-Tissue Expression (GTEx) RNA-seq data representing 16,651 samples from 47 tissues. Our classifier achieved an average F1 score of 96.1% on held-out GTEx samples. Using SHAP values, we identified the 2423 most discriminatory genes, of which 98.6% were also identified by differential expression analysis across all tissues. The SHAP genes reflected expected biological processes involved in tissue differentiation and function. Moreover, SHAP genes clustered tissue types with superior performance when compared to all genes, genes detected by differential expression analysis, or random genes. We demonstrate the utility and reliability of SHAP to explain a deep learning model and highlight the strengths of applying ML to transcriptome data.
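The abstract above reports an "average F1 score" across tissues but does not say how the average was taken. As a hedged illustration, the sketch below assumes an unweighted (macro) average of per-class F1 scores. The function name, tissue labels, and predictions are made up for the example and do not come from the study.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 if the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical held-out labels and predictions for three tissues.
y_true = ["liver", "liver", "lung", "lung", "heart", "heart"]
y_pred = ["liver", "liver", "lung", "heart", "heart", "heart"]
print(macro_f1(y_true, y_pred))
```

Macro averaging gives each tissue equal weight regardless of sample count; a sample-weighted average would differ whenever tissues are unevenly represented.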

57
Valle F, Osella M, Caselle M. A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data. Cancers (Basel) 2020; 12:E3799. [PMID: 33339347 PMCID: PMC7766023 DOI: 10.3390/cancers12123799] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/19/2020] [Revised: 12/07/2020] [Accepted: 12/11/2020] [Indexed: 01/18/2023] [Open Access]
Abstract
Topic modeling is a widely used technique to extract relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Leveraging on this analogy, a new class of topic modeling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The established cancer subtype organization is well reconstructed in the inferred latent topic structure. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low dimensional topic space is able to predict with high accuracy the cancer subtype of a test expression sample.
Affiliation(s)
- Filippo Valle
  - Physics Department, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy; (M.O.); (M.C.)

58
Prager BC, Vasudevan HN, Dixit D, Bernatchez JA, Wu Q, Wallace LC, Bhargava S, Lee D, King BH, Morton AR, Gimple RC, Pekmezci M, Zhu Z, Siqueira-Neto JL, Wang X, Xie Q, Chen C, Barnett GH, Vogelbaum MA, Mack SC, Chavez L, Perry A, Raleigh DR, Rich JN. The Meningioma Enhancer Landscape Delineates Novel Subgroups and Drives Druggable Dependencies. Cancer Discov 2020; 10:1722-1741. [PMID: 32703768 PMCID: PMC8194360 DOI: 10.1158/2159-8290.cd-20-0160] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/08/2020] [Revised: 06/06/2020] [Accepted: 07/20/2020] [Indexed: 01/05/2023]
Abstract
Meningiomas are the most common primary intracranial tumor with current classification offering limited therapeutic guidance. Here, we interrogated meningioma enhancer landscapes from 33 tumors to stratify patients based upon prognosis and identify novel meningioma-specific dependencies. Enhancers robustly stratified meningiomas into three biologically distinct groups (adipogenesis/cholesterol, mesodermal, and neural crest) distinguished by distinct hormonal lineage transcriptional regulators. Meningioma landscapes clustered with intrinsic brain tumors and hormonally responsive systemic cancers with meningioma subgroups, reflecting progesterone or androgen hormonal signaling. Enhancer classification identified a subset of tumors with poor prognosis, irrespective of histologic grading. Superenhancer signatures predicted drug dependencies with superior in vitro efficacy to treatment based upon the NF2 genomic profile. Inhibition of DUSP1, a novel and druggable meningioma target, impaired tumor growth in vivo. Collectively, epigenetic landscapes empower meningioma classification and identification of novel therapies. SIGNIFICANCE: Enhancer landscapes inform prognostic classification of aggressive meningiomas, identifying tumors at high risk of recurrence, and reveal previously unknown therapeutic targets. Druggable dependencies discovered through epigenetic profiling potentially guide treatment of intractable meningiomas. This article is highlighted in the In This Issue feature, p. 1611.
Affiliation(s)
- Briana C Prager
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - Cleveland Clinic Lerner College of Medicine, Cleveland Clinic, Cleveland, Ohio
  - Case Western Reserve University Medical Scientist Training Program, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Harish N Vasudevan
  - Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Deobrat Dixit
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
- Jean A Bernatchez
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
  - Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, La Jolla, California
- Qiulian Wu
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
- Lisa C Wallace
  - Department of Biomedical Engineering, Cleveland Clinic, Cleveland, Ohio
- Shruti Bhargava
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
- Derrick Lee
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - University of California San Diego School of Medicine, University of California, San Diego, La Jolla, California
- Bradley H King
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - University of California San Diego School of Medicine, University of California, San Diego, La Jolla, California
- Andrew R Morton
  - Case Western Reserve University Medical Scientist Training Program, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Ryan C Gimple
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - Case Western Reserve University Medical Scientist Training Program, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Melike Pekmezci
  - Department of Pathology, University of California, San Francisco, San Francisco, California
- Zhe Zhu
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Jair L Siqueira-Neto
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
  - Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, La Jolla, California
- Xiuxing Wang
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - School of Basic Medical Sciences, Nanjing Medical University, Nanjing, China
- Qi Xie
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Westlake University, Hangzhou, China
- Clark Chen
  - Department of Neurosurgery, University of Minnesota, Minneapolis, Minnesota
- Gene H Barnett
  - Department of Neurosurgery, Cleveland Clinic, Cleveland, Ohio
  - Cleveland Clinic Lerner College of Medicine, Cleveland Clinic, Cleveland, Ohio
- Michael A Vogelbaum
  - Department of Neurosurgery, University of Minnesota, Minneapolis, Minnesota
  - Department of NeuroOncology, Moffitt Cancer Center, Tampa, Florida
- Lukas Chavez
  - Department of Medicine, University of California, San Diego, San Diego, California
- Arie Perry
  - Department of Pathology, University of California, San Francisco, San Francisco, California
- David R Raleigh
  - Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, California
- Jeremy N Rich
  - Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
  - Sanford Consortium for Regenerative Medicine, La Jolla, California
  - Department of Neurosciences, University of California, San Diego, La Jolla, California

59
Boezio GL, Bensimon-Brito A, Piesker J, Guenther S, Helker CS, Stainier DY. Endothelial TGF-β signaling instructs smooth muscle cell development in the cardiac outflow tract. eLife 2020; 9:57603. [PMID: 32990594 PMCID: PMC7524555 DOI: 10.7554/elife.57603] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/06/2020] [Accepted: 09/09/2020] [Indexed: 12/14/2022] [Open Access]
Abstract
The development of the cardiac outflow tract (OFT), which connects the heart to the great arteries, relies on a complex crosstalk between endothelial (ECs) and smooth muscle (SMCs) cells. Defects in OFT development can lead to severe malformations, including aortic aneurysms, which are frequently associated with impaired TGF-β signaling. To better understand the role of TGF-β signaling in OFT formation, we generated zebrafish lacking the TGF-β receptor Alk5 and found a strikingly specific dilation of the OFT: alk5-/- OFTs exhibit increased EC numbers as well as extracellular matrix (ECM) and SMC disorganization. Surprisingly, endothelial-specific alk5 overexpression in alk5-/- rescues the EC, ECM, and SMC defects. Transcriptomic analyses reveal downregulation of the ECM gene fibulin-5, which when overexpressed in ECs ameliorates OFT morphology and function. These findings reveal a new requirement for endothelial TGF-β signaling in OFT morphogenesis and suggest an important role for the endothelium in the etiology of aortic malformations.
Affiliation(s)
- Giulia LM Boezio
  - Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Anabela Bensimon-Brito
  - Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Janett Piesker
  - Scientific Service Group Microscopy, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Stefan Guenther
  - Bioinformatics and Deep Sequencing Platform, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Christian SM Helker
  - Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Didier YR Stainier
  - Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany

60
Palla G, Ferrero E. Latent Factor Modeling of scRNA-Seq Data Uncovers Dysregulated Pathways in Autoimmune Disease Patients. iScience 2020; 23:101451. [PMID: 32853994 PMCID: PMC7452208 DOI: 10.1016/j.isci.2020.101451] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2019] [Revised: 05/28/2020] [Accepted: 08/10/2020] [Indexed: 11/10/2022] Open
Abstract
Latent factor modeling applied to single-cell RNA sequencing (scRNA-seq) data is a useful approach to discover gene signatures. However, it is often unclear what methods are best suited for specific tasks and how latent factors should be interpreted. Here, we compare four state-of-the-art methods and propose an approach to assign derived latent factors to pathway activities and specific cell subsets. By applying this framework to scRNA-seq datasets from biopsies of patients with rheumatoid arthritis and systemic lupus erythematosus, we discover disease-relevant gene signatures in specific cellular subsets. In rheumatoid arthritis, we identify an inflammatory OSMR signaling signature active in a subset of synovial fibroblasts and an efferocytic signature in a subset of synovial monocytes. Overall, we provide insights into latent factors models for the analysis of scRNA-seq data, develop a framework to identify cell subtypes in a phenotype-driven way, and use it to identify novel pathways dysregulated in rheumatoid arthritis.
Affiliation(s)
- Giovanni Palla
  - Autoimmunity Transplantation and Inflammation Bioinformatics, Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4056, Switzerland
- Enrico Ferrero
  - Autoimmunity Transplantation and Inflammation Bioinformatics, Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4056, Switzerland
|
61
|
Tercan B, Acar AC. The Use of Informed Priors in Biclustering of Gene Expression with the Hierarchical Dirichlet Process. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1810-1821. [PMID: 30835228 DOI: 10.1109/tcbb.2019.2901676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
We motivate and describe the application of Hierarchical Dirichlet Process (HDP) models to the "soft" biclustering of gene expression data, in which we obtain modules (biclusters) where the affiliation of genes and samples with the modules is weighted rather than a hard membership. As a distinct contribution, we propose a method in which the HDP is informed with prior beliefs, significantly increasing the quality of the biclustering in terms of both the correctness of the number of modules inferred and the precision of these modules, especially when evidence is sparse. We outline two such informed priors: one based on co-expression relationships inherent in the data, the other based on an externally provided regulatory network. We validate these results and compare the performance of our approach to Weighted Gene Correlation Network Analysis (WGCNA), another model that features weighted modules. We have, to this end, performed experiments on semi-synthetic data. The results show that HDP, with the addition of a well-informed prior, is able to capture the correct number of modules with increased accuracy. Furthermore, the model becomes robust to changes in the strength of the prior. We conclude by discussing these results and the benefits provided by our approach for gene expression analysis and network validation.
|
62
|
Machado FB, Moharana KC, Almeida-Silva F, Gazara RK, Pedrosa-Silva F, Coelho FS, Grativol C, Venancio TM. Systematic analysis of 1298 RNA-Seq samples and construction of a comprehensive soybean (Glycine max) expression atlas. The Plant Journal: for Cell and Molecular Biology 2020; 103:1894-1909. [PMID: 32445587 DOI: 10.1111/tpj.14850] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/29/2020] [Revised: 04/15/2020] [Accepted: 05/06/2020] [Indexed: 05/23/2023]
Abstract
Soybean (Glycine max [L.] Merr.) is a major crop in animal feed and human nutrition, mainly for its rich protein and oil contents. The remarkable rise in soybean transcriptome studies over the past 5 years generated an enormous amount of RNA-seq data, encompassing various tissues, developmental conditions and genotypes. In this study, we have collected data from 1298 publicly available soybean transcriptome samples, processed the raw sequencing reads and mapped them to the soybean reference genome in a systematic fashion. We found that 94% of the annotated genes (52 737/56 044) had detectable expression in at least one sample. Unsupervised clustering revealed three major groups, comprising samples from aerial, underground and seed/seed-related parts. We found 452 genes with uniform and constant expression levels, supporting their roles as housekeeping genes. On the other hand, 1349 genes showed heavily biased expression patterns towards particular tissues. A transcript-level analysis revealed that 95% (70 963 of 74 490) of the assembled transcripts have intron chains exactly matching those from known transcripts, whereas 3256 assembled transcripts represent potentially novel splicing isoforms. The dataset compiled here constitutes a new resource for the community, which can be downloaded or accessed through a user-friendly web interface at http://venanciogroup.uenf.br/resources/. This comprehensive transcriptome atlas will likely accelerate research on soybean genetics and genomics.
Affiliation(s)
- Fabricio B Machado
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Kanhu C Moharana
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Fabricio Almeida-Silva
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Rajesh K Gazara
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Francisnei Pedrosa-Silva
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Fernanda S Coelho
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Clícia Grativol
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Thiago M Venancio
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
|
63
|
Recursive Consensus Clustering for novel subtype discovery from transcriptome data. Sci Rep 2020; 10:11005. [PMID: 32620805 PMCID: PMC7335086 DOI: 10.1038/s41598-020-67016-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/19/2019] [Accepted: 06/02/2020] [Indexed: 12/05/2022] Open
Abstract
Large-scale transcriptomic data is used by biologists for the discovery of new molecular patterns or cell subpopulations. Clustering is one of the most popular methods for dimensionality reduction and data analysis for large scale datasets. The major problem while clustering the data is the selection of the optimal number of clusters (k) for each dataset and to discover new insights from it. We have developed Recursive Consensus Clustering (RCC), an unsupervised clustering algorithm for novel subtype discovery from both bulk and single-cell datasets. RCC is available as an R package and facilitates the generation of new biological insights through intuitive visualization of clustering results.
|
64
|
Behavioral and brain-transcriptomic synchronization between the two opponents of a fighting pair of the fish Betta splendens. PLoS Genet 2020; 16:e1008831. [PMID: 32555673 PMCID: PMC7299326 DOI: 10.1371/journal.pgen.1008831] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/18/2019] [Accepted: 05/05/2020] [Indexed: 01/13/2023] Open
Abstract
Conspecific male animals fight for resources such as food and mating opportunities but typically stop fighting after assessing their relative fighting abilities to avoid serious injuries. Physiologically, how the fighting behavior is controlled remains unknown. Using the fighting fish Betta splendens, we studied behavioral and brain-transcriptomic changes during the fight between the two opponents. At the behavioral level, surface-breathing, and biting/striking occurred only during intervals between mouth-locking. Eventually, the behaviors of the two opponents became synchronized, with each pair showing a unique behavioral pattern. At the physiological level, we examined the expression patterns of 23,306 brain transcripts using RNA-sequencing data from brains of fighting pairs after a 20-min (D20) and a 60-min (D60) fight. The two opponents in each D60 fighting pair showed a strong gene expression correlation, whereas those in D20 fighting pairs showed a weak correlation. Moreover, each fighting pair in the D60 group showed pair-specific gene expression patterns in a grade of membership analysis (GoM) and were grouped as a pair in the heatmap clustering. The observed pair-specific individualization in brain-transcriptomic synchronization (PIBS) suggested that this synchronization provides a physiological basis for the behavioral synchronization. An analysis using the synchronized genes in fighting pairs of the D60 group found genes enriched for ion transport, synaptic function, and learning and memory. Brain-transcriptomic synchronization could be a general phenomenon and may provide a new cornerstone with which to investigate coordinating and sustaining social interactions between two interacting partners of vertebrates.
Agonistic encounters induce changes in the brain and behavior, but their underlying molecular mechanisms remain poorly understood. The fighting fish Betta splendens are small freshwater fish that are well known for their aggressiveness and are widely used to study aggression. Here, by measuring aggressive behavior displays (bite/strike/surface-breathing) between two opponents during fighting, we demonstrate that the two opponents in each fighting pair showed similar fighting configurations by influencing each other. In addition, we compared brain gene expression between opponents and showed synchronization of gene expression within a fighting pair, leading to pair-specific synchronization in genes associated with ion transport, synapse function, and learning and memory. This study presents the possibility that similar behaviors in pairs of animals under similar conditions may trigger synchronizing waves of transcription between the individuals, providing a hint to support the idea that fighting behaviors contain cooperative aspects at the molecular level.
|
65
|
Sarode P, Zheng X, Giotopoulou GA, Weigert A, Kuenne C, Günther S, Friedrich A, Gattenlöhner S, Stiewe T, Brüne B, Grimminger F, Stathopoulos GT, Pullamsetti SS, Seeger W, Savai R. Reprogramming of tumor-associated macrophages by targeting β-catenin/FOSL2/ARID5A signaling: A potential treatment of lung cancer. Science Advances 2020; 6:eaaz6105. [PMID: 32548260 PMCID: PMC7274802 DOI: 10.1126/sciadv.aaz6105] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 09/24/2019] [Accepted: 03/27/2020] [Indexed: 05/03/2023]
Abstract
Tumor-associated macrophages (TAMs) influence lung tumor development by inducing immunosuppression. Transcriptome analysis of TAMs isolated from human lung tumor tissues revealed an up-regulation of the Wnt/β-catenin pathway. These findings were reproduced in a newly developed in vitro "trained" TAM model. Pharmacological and macrophage-specific genetic ablation of β-catenin reprogrammed M2-like TAMs to M1-like TAMs both in vitro and in various in vivo models, which was linked with the suppression of primary and metastatic lung tumor growth. An in-depth analysis of the underlying signaling events revealed that β-catenin-mediated transcriptional activation of FOS-like antigen 2 (FOSL2) and repression of the AT-rich interaction domain 5A (ARID5A) drive gene regulatory switch from M1-like TAMs to M2-like TAMs. Moreover, we found that high expressions of β-catenin and FOSL2 correlated with poor prognosis in patients with lung cancer. In conclusion, β-catenin drives a transcriptional switch in the lung tumor microenvironment, thereby promoting tumor progression and metastasis.
Affiliation(s)
- Poonam Sarode
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
- Xiang Zheng
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
- Georgia A. Giotopoulou
  - Laboratory for Molecular Respiratory Carcinogenesis, Department of Physiology, Faculty of Medicine, University of Patras, Rio, 26504, Greece and Lung Carcinogenesis Laboratory, Comprehensive Pneumology Center (CPC) and Institute for Lung Biology and Disease (iLBD), University Hospital, Ludwig-Maximilians University and Helmholtz Center Munich, Member of the German Center for Lung Research (DZL), Munich 81377, Germany
- Andreas Weigert
  - Institute of Biochemistry I, Faculty of Medicine, Goethe University Frankfurt, Frankfurt 60323, Germany
- Carste Kuenne
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
- Stefan Günther
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
- Aleksandra Friedrich
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
- Stefan Gattenlöhner
  - Department of Pathology, Member of the DZL, Justus Liebig University, Giessen 35390, Germany
- Thorsten Stiewe
  - Institute of Molecular Oncology, Philipps-University Marburg, Member of the DZL, Marburg 35043, Germany
- Bernhard Brüne
  - Institute of Biochemistry I, Faculty of Medicine, Goethe University Frankfurt, Frankfurt 60323, Germany
  - Frankfurt Cancer Institute (FCI), Goethe University, 60596 Frankfurt am Main, Germany
- Friedrich Grimminger
  - Department of Internal Medicine, Member of the DZL, Member of CPI, Justus Liebig University, 35392 Giessen, Germany
- Georgios T. Stathopoulos
  - Laboratory for Molecular Respiratory Carcinogenesis, Department of Physiology, Faculty of Medicine, University of Patras, Rio, 26504, Greece and Lung Carcinogenesis Laboratory, Comprehensive Pneumology Center (CPC) and Institute for Lung Biology and Disease (iLBD), University Hospital, Ludwig-Maximilians University and Helmholtz Center Munich, Member of the German Center for Lung Research (DZL), Munich 81377, Germany
- Soni Savai Pullamsetti
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
  - Department of Internal Medicine, Member of the DZL, Member of CPI, Justus Liebig University, 35392 Giessen, Germany
- Werner Seeger
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
  - Department of Internal Medicine, Member of the DZL, Member of CPI, Justus Liebig University, 35392 Giessen, Germany
  - Institute for Lung Health (ILH), Justus Liebig University, 35392 Giessen, Germany
- Rajkumar Savai
  - Max Planck Institute for Heart and Lung Research, Member of the German Center for Lung Research (DZL), Member of the Cardio-Pulmonary Institute (CPI), Bad Nauheim 61231, Germany
  - Frankfurt Cancer Institute (FCI), Goethe University, 60596 Frankfurt am Main, Germany
  - Department of Internal Medicine, Member of the DZL, Member of CPI, Justus Liebig University, 35392 Giessen, Germany
  - Institute for Lung Health (ILH), Justus Liebig University, 35392 Giessen, Germany
  - Corresponding author.
|
66
|
Characterizing and inferring quantitative cell cycle phase in single-cell RNA-seq data analysis. Genome Res 2020; 30:611-621. [PMID: 32312741 PMCID: PMC7197478 DOI: 10.1101/gr.247759.118] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 02/03/2019] [Accepted: 04/02/2020] [Indexed: 11/25/2022]
Abstract
Cellular heterogeneity in gene expression is driven by cellular processes, such as cell cycle and cell-type identity, and cellular environment such as spatial location. The cell cycle, in particular, is thought to be a key driver of cell-to-cell heterogeneity in gene expression, even in otherwise homogeneous cell populations. Recent advances in single-cell RNA-sequencing (scRNA-seq) facilitate detailed characterization of gene expression heterogeneity and can thus shed new light on the processes driving heterogeneity. Here, we combined fluorescence imaging with scRNA-seq to measure cell cycle phase and gene expression levels in human induced pluripotent stem cells (iPSCs). By using these data, we developed a novel approach to characterize cell cycle progression. Although standard methods assign cells to discrete cell cycle stages, our method goes beyond this and quantifies cell cycle progression on a continuum. We found that, on average, scRNA-seq data from only five genes predicted a cell's position on the cell cycle continuum to within 14% of the entire cycle and that using more genes did not improve this accuracy. Our data and predictor of cell cycle phase can directly help future studies to account for cell cycle-related heterogeneity in iPSCs. Our results and methods also provide a foundation for future work to characterize the effects of the cell cycle on expression heterogeneity in other cell types.
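The "within 14% of the entire cycle" figure is an error measured on a circle, where the largest possible error is half a cycle. The abstract does not state how that error was computed; the following is a minimal Python sketch of one natural definition (the shortest arc between predicted and true phase, divided by the cycle length). The function name, the radian convention, and the example phases are assumptions made for illustration only.

```python
import math


def circular_error_fraction(predicted, actual, period=2 * math.pi):
    """Shortest distance around the cycle between two phases,
    expressed as a fraction of one full cycle (0.0 to 0.5)."""
    diff = abs(predicted - actual) % period
    # Going the other way around the circle may be shorter.
    return min(diff, period - diff) / period


# Hypothetical phases in radians: the two values sit on either side of
# the wrap-around point, so they are close on the circle even though
# their plain numeric difference is large.
error = circular_error_fraction(0.1, 2 * math.pi - 0.1)
print(f"error as a fraction of the cycle: {error:.3f}")
```

Under this definition, a predictor that is no better than chance would average an error of about a quarter of the cycle, which gives a rough baseline for reading the 14% figure.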
|
67
|
Xu H, Ding J, Porter CBM, Wallrapp A, Tabaka M, Ma S, Fu S, Guo X, Riesenfeld SJ, Su C, Dionne D, Nguyen LT, Lefkovith A, Ashenberg O, Burkett PR, Shi HN, Rozenblatt-Rosen O, Graham DB, Kuchroo VK, Regev A, Xavier RJ. Transcriptional Atlas of Intestinal Immune Cells Reveals that Neuropeptide α-CGRP Modulates Group 2 Innate Lymphoid Cell Responses. Immunity 2020; 51:696-708.e9. [PMID: 31618654 DOI: 10.1016/j.immuni.2019.09.004] [Citation(s) in RCA: 140] [Impact Index Per Article: 35.0] [Received: 05/29/2019] [Revised: 08/06/2019] [Accepted: 09/06/2019] [Indexed: 12/17/2022]
Abstract
Signaling abnormalities in immune responses in the small intestine can trigger chronic type 2 inflammation involving interaction of multiple immune cell types. To systematically characterize this response, we analyzed 58,067 immune cells from the mouse small intestine by single-cell RNA sequencing (scRNA-seq) at steady state and after induction of a type 2 inflammatory reaction to ovalbumin (OVA). Computational analysis revealed broad shifts in both cell-type composition and cell programs in response to the inflammation, especially in group 2 innate lymphoid cells (ILC2s). Inflammation induced the expression of exon 5 of Calca, which encodes the alpha-calcitonin gene-related peptide (α-CGRP), in intestinal KLRG1+ ILC2s. α-CGRP antagonized KLRG1+ ILC2s proliferation but promoted IL-5 expression. Genetic perturbation of α-CGRP increased the proportion of intestinal KLRG1+ ILC2s. Our work highlights a model where α-CGRP-mediated neuronal signaling is critical for suppressing ILC2 expansion and maintaining homeostasis of the type 2 immune machinery.
Affiliation(s)
- Heping Xu
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou 310024, Zhejiang Province, China; Laboratory of Systems Immunology, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou 310024, Zhejiang Province, China
- Jiarui Ding
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Antonia Wallrapp
  - Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02114, USA
- Marcin Tabaka
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Sai Ma
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Shujie Fu
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou 310024, Zhejiang Province, China; Laboratory of Systems Immunology, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou 310024, Zhejiang Province, China
- Xuanxuan Guo
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou 310024, Zhejiang Province, China; Laboratory of Systems Immunology, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou 310024, Zhejiang Province, China
- Chienwen Su
  - Mucosal Immunology and Biology Research Center, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, USA
- Danielle Dionne
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Lan T Nguyen
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ariel Lefkovith
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Orr Ashenberg
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Patrick R Burkett
  - Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02114, USA
- Hai Ning Shi
  - Mucosal Immunology and Biology Research Center, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, USA
- Daniel B Graham
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA 02114, USA
- Vijay K Kuchroo
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02114, USA
- Aviv Regev
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Ramnik J Xavier
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA 02114, USA; Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
|
68
|
Al-Asadi H, Dey KK, Novembre J, Stephens M. Inference and visualization of DNA damage patterns using a grade of membership model. Bioinformatics 2020; 35:1292-1298. [PMID: 30192911 DOI: 10.1093/bioinformatics/bty779] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/06/2018] [Revised: 08/11/2018] [Accepted: 09/04/2018] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Quality control plays a major role in the analysis of ancient DNA (aDNA). One key step in this quality control is assessment of DNA damage: aDNA contains unique signatures of DNA damage that distinguish it from modern DNA, and so analyses of damage patterns can help confirm that DNA sequences obtained are from endogenous aDNA rather than from modern contamination. Predominant signatures of DNA damage include a high frequency of cytosine to thymine substitutions (C-to-T) at the ends of fragments, and elevated rates of purines (A & G) before the 5' strand-breaks. Existing QC procedures help assess damage by simply plotting for each sample, the C-to-T mismatch rate along the read and the composition of bases before the 5' strand-breaks. Here we present a more flexible and comprehensive model-based approach to infer and visualize damage patterns in aDNA, implemented in an R package aRchaic. This approach is based on a 'grade of membership' model (also known as 'admixture' or 'topic' model) in which each sample has an estimated grade of membership in each of K damage profiles that are estimated from the data. RESULTS We illustrate aRchaic on data from several aDNA studies and modern individuals from 1000 Genomes Project Consortium (2012). Here, aRchaic clearly distinguishes modern from ancient samples irrespective of DNA extraction, lab and sequencing protocols. Additionally, through an in-silico contamination experiment, we show that the aRchaic grades of membership reflect relative levels of exogenous modern contamination. Together, the outputs of aRchaic provide a concise visual summary of DNA damage patterns, as well as other processes generating mismatches in the data. AVAILABILITY AND IMPLEMENTATION aRchaic is available for download from https://www.github.com/kkdey/aRchaic. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
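The core idea of a grade of membership model can be shown with a small forward calculation: each sample's expected distribution over mismatch features is a weighted mix of K shared profiles, with the weights being that sample's grades of membership. The Python sketch below illustrates only this mixing step. It is not the aRchaic implementation and does not perform any model fitting; the function name, the three feature categories, and all numbers are invented for illustration.

```python
def expected_profile(memberships, profiles):
    """Mix K profiles (each a probability vector over features)
    using one sample's grades of membership (which sum to 1)."""
    if len(memberships) != len(profiles):
        raise ValueError("need one grade of membership per profile")
    if abs(sum(memberships) - 1.0) > 1e-9:
        raise ValueError("grades of membership must sum to 1")
    n_features = len(profiles[0])
    return [
        sum(q * profile[j] for q, profile in zip(memberships, profiles))
        for j in range(n_features)
    ]


# Hypothetical feature order:
# [C-to-T near fragment end, C-to-T mid-read, any other mismatch]
profiles = [
    [0.6, 0.1, 0.3],  # invented "damage-rich" profile
    [0.1, 0.1, 0.8],  # invented "damage-poor" profile
]

# A sample that is 25% damage-rich and 75% damage-poor.
mixed = expected_profile([0.25, 0.75], profiles)
print([round(p, 3) for p in mixed])
```

In a fitted model both the profiles and the per-sample memberships are estimated from the data, so a sample's memberships can be read as how strongly each damage profile is represented in it.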
Affiliation(s)
- Hussein Al-Asadi
  - Committee on Evolutionary Biology, University of Chicago, Chicago, IL, USA; Department of Statistics, University of Chicago, Chicago, IL, USA
- Kushal K Dey
  - Department of Statistics, University of Chicago, Chicago, IL, USA
- John Novembre
  - Committee on Evolutionary Biology, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Matthew Stephens
  - Department of Statistics, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
|
69
|
Geddes TA, Kim T, Nan L, Burchfield JG, Yang JYH, Tao D, Yang P. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis. BMC Bioinformatics 2019; 20:660. [PMID: 31870278 PMCID: PMC6929272 DOI: 10.1186/s12859-019-3179-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/23/2019] [Accepted: 10/28/2019] [Indexed: 01/23/2023] Open
Abstract
Background Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. Conclusions Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. 
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
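The final step of such a framework, combining many individual clusterings into one, can be illustrated in isolation. The abstract does not say which consensus function is used, so the Python sketch below assumes a simple co-association rule: two cells end up together if they share a cluster in more than half of the input partitions. It deliberately omits the random projection and autoencoder steps, and the function names and toy partitions are hypothetical.

```python
def co_association(partitions):
    """For every pair of items, the fraction of partitions
    in which the two items share a cluster label."""
    n_items = len(partitions[0])
    n_partitions = len(partitions)
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for labels in partitions:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / n_partitions
    return matrix


def consensus_labels(partitions, threshold=0.5):
    """Link items co-clustered in more than `threshold` of the
    partitions, then label the resulting connected components."""
    matrix = co_association(partitions)
    n_items = len(matrix)
    labels = [-1] * n_items
    next_label = 0
    for start in range(n_items):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first walk over linked items
            i = stack.pop()
            for j in range(n_items):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three toy clusterings of six cells. The second uses different label
# names and places cell 2 with the other group.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
print(consensus_labels(partitions))
```

Because the rule works on pairwise agreement rather than on label names, it is unaffected by the arbitrary numbering each base clustering assigns to its clusters.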
Affiliation(s)
- Thomas A Geddes
  - Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia; Charles Perkins Centre, School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia
- Taiyun Kim
  - Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia
- Lihao Nan
  - UBTECH Sydney Artificial Intelligence Centre and the School of Computer Science, Faculty of Engineering and Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia
- James G Burchfield
  - Charles Perkins Centre, School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia
- Jean Y H Yang
  - Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia
- Dacheng Tao
  - UBTECH Sydney Artificial Intelligence Centre and the School of Computer Science, Faculty of Engineering and Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia
- Pengyi Yang
  - Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia; Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2145, Australia
|
70
|
Sommeria-Klein G, Zinger L, Coissac E, Iribar A, Schimann H, Taberlet P, Chave J. Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest. Mol Ecol Resour 2019; 20:371-386. [PMID: 31650682 DOI: 10.1111/1755-0998.13109] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/04/2019] [Revised: 10/03/2019] [Accepted: 10/22/2019] [Indexed: 11/29/2022]
Abstract
High-throughput sequencing of amplicons from environmental DNA samples permits rapid, standardized and comprehensive biodiversity assessments. However, retrieving and interpreting the structure of such data sets requires efficient methods for dimensionality reduction. Latent Dirichlet Allocation (LDA) can be used to decompose environmental DNA samples into overlapping assemblages of co-occurring taxa. It is a flexible model-based method adapted to uneven sample sizes and to large and sparse data sets. Here, we compare LDA performance on abundance and occurrence data, and we quantify the robustness of the LDA decomposition by measuring its stability with respect to the algorithm's initialization. We then apply LDA to a survey of 1,131 soil DNA samples that were collected in a 12-ha plot of primary tropical forest and amplified using standard primers for bacteria, protists, fungi and metazoans. The analysis reveals that bacteria, protists and fungi exhibit a strong spatial structure, which matches the topographical features of the plot, while metazoans do not, confirming that microbial diversity is primarily controlled by environmental variation at the studied scale. We conclude that LDA is a sensitive, robust and computationally efficient method to detect and interpret the structure of large DNA-based biodiversity data sets. We finally discuss the possible future applications of this approach for the study of biodiversity.
Affiliation(s)
- Guilhem Sommeria-Klein
- Laboratoire Evolution et Diversité Biologique (EDB, UMR 5174), CNRS, IRD, Université Toulouse 3 Paul Sabatier, Toulouse, France; Institut de Biologie de l'ENS (IBENS, UMR 8197), Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
- Lucie Zinger
- Laboratoire Evolution et Diversité Biologique (EDB, UMR 5174), CNRS, IRD, Université Toulouse 3 Paul Sabatier, Toulouse, France; Institut de Biologie de l'ENS (IBENS, UMR 8197), Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
- Eric Coissac
- Laboratoire d'Ecologie Alpine (LECA, UMR 5553), Université Grenoble Alpes, CNRS, Université Savoie Mont Blanc, Grenoble, France
- Amaia Iribar
- Laboratoire Evolution et Diversité Biologique (EDB, UMR 5174), CNRS, IRD, Université Toulouse 3 Paul Sabatier, Toulouse, France
- Heidy Schimann
- Laboratoire d'Ecologie des Forêts de Guyane (EcoFoG, UMR 745), INRA, AgroParisTech, CIRAD, CNRS, University of the French West Indies, University of French Guiana, Kourou, France
- Pierre Taberlet
- Laboratoire d'Ecologie Alpine (LECA, UMR 5553), Université Grenoble Alpes, CNRS, Université Savoie Mont Blanc, Grenoble, France
- Jérôme Chave
- Laboratoire Evolution et Diversité Biologique (EDB, UMR 5174), CNRS, IRD, Université Toulouse 3 Paul Sabatier, Toulouse, France
|
71
|
Marco-Puche G, Lois S, Benítez J, Trivino JC. RNA-Seq Perspectives to Improve Clinical Diagnosis. Front Genet 2019; 10:1152. [PMID: 31781178 PMCID: PMC6861419 DOI: 10.3389/fgene.2019.01152] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 08/20/2019] [Accepted: 10/22/2019] [Indexed: 01/22/2023] Open
Abstract
In recent years, high-throughput next-generation sequencing technology has allowed a rapid increase in diagnostic capacity and precision through different bioinformatics processing algorithms, tools, and pipelines. The identification, annotation, and classification of sequence variants within different target regions are now considered a gold standard in clinical genetic diagnosis. However, this procedure lacks the ability to link regulatory events such as differential splicing to diseases. RNA-seq is necessary in clinical routine in order to interpret and detect among others splicing events and splicing variants, as it would increase the diagnostic rate by up to 10-35%. The transcriptome has a very dynamic nature, varying according to tissue type, cellular conditions, and environmental factors that may affect regulatory events such as splicing and the expression of genes or their isoforms. RNA-seq offers a robust technical analysis of this complexity, but it requires a profound knowledge of computational/statistical tools that may need to be adjusted depending on the disease under study. In this article we will cover RNA-seq analyses best practices applied to clinical routine, bioinformatics procedures, and present challenges of this approach.
Affiliation(s)
- Sergio Lois
- Bioinformatics Group, Sistemas Genómicos, Paterna, Spain
- Javier Benítez
- Human Genetics Group, Spanish National Cancer Research Center, Madrid, Spain
|
72
|
McKennan C, Nicolae D. Accounting for unobserved covariates with varying degrees of estimability in high-dimensional biological data. Biometrika 2019; 106:823-840. [PMID: 31754283 DOI: 10.1093/biomet/asz037] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/23/2018] [Indexed: 12/18/2022] Open
Abstract
An important phenomenon in high-throughput biological data is the presence of unobserved covariates that can have a significant impact on the measured response. When these covariates are also correlated with the covariate of interest, ignoring or improperly estimating them can lead to inaccurate estimates of and spurious inference on the corresponding coefficients of interest in a multivariate linear model. We first prove that existing methods to account for these unobserved covariates often inflate Type I error for the null hypothesis that a given coefficient of interest is zero. We then provide alternative estimators for the coefficients of interest that correct the inflation, and prove that our estimators are asymptotically equivalent to the ordinary least squares estimators obtained when every covariate is observed. Lastly, we use previously published DNA methylation data to show that our method can more accurately estimate the direct effect of asthma on DNA methylation levels compared to existing methods, the latter of which likely fail to recover and account for latent cell type heterogeneity.
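The bias that this abstract attributes to ignored covariates can be seen in the simplest one-covariate case. The following is a textbook illustration under assumed notation (the symbols $y$, $x$, $z$, $\beta$, $\gamma$ are not taken from the paper), not the authors' estimator. Suppose the response depends on an observed covariate $x$ and an unobserved covariate $z$, with all variables centered and the noise $\varepsilon$ uncorrelated with $x$ and $z$:

```latex
\begin{align}
  y_i &= \beta x_i + \gamma z_i + \varepsilon_i, \\
  \hat{\beta}_{\text{naive}}
    &= \frac{\sum_i x_i y_i}{\sum_i x_i^2}
    \;\xrightarrow{\;p\;}\;
    \beta + \gamma \, \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)} .
\end{align}
```

The second term vanishes only when $\gamma = 0$ or when $x$ and $z$ are uncorrelated, which is why an unobserved covariate that is correlated with the covariate of interest distorts the naive estimate.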
Affiliation(s)
- Chris McKennan
- Department of Statistics, University of Chicago, 5747 S. Ellis Avenue, Chicago, Illinois, U.S.A
- Dan Nicolae
- Department of Statistics, University of Chicago, 5747 S. Ellis Avenue, Chicago, Illinois, U.S.A
|
73
|
Son JH, Kohlbrenner T, Heinze S, Beukeboom LW, Bopp D, Meisel RP. Minimal Effects of Proto-Y Chromosomes on House Fly Gene Expression in Spite of Evidence that Selection Maintains Stable Polygenic Sex Determination. Genetics 2019; 213:313-327. [PMID: 31315889 PMCID: PMC6727804 DOI: 10.1534/genetics.119.302441] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/13/2019] [Accepted: 07/10/2019] [Indexed: 02/05/2023] Open
Abstract
Sex determination, the developmental process by which organismal sex is established, evolves fast, often due to changes in the master regulators at the top of the pathway. Additionally, in species with polygenic sex determination, multiple different master regulators segregate as polymorphisms. Understanding the forces that maintain polygenic sex determination can be informative of the factors that drive the evolution of sex determination. The house fly, Musca domestica, is a well-suited model to those ends because natural populations harbor male-determining loci on each of the six chromosomes and a biallelic female determiner. To investigate how natural selection maintains polygenic sex determination in the house fly, we assayed the phenotypic effects of proto-Y chromosomes by performing mRNA-sequencing experiments to measure gene expression in house fly males carrying different proto-Y chromosomes. We find that the proto-Y chromosomes have similar effects as a nonsex-determining autosome. In addition, we created sex-reversed males without any proto-Y chromosomes and they had nearly identical gene expression profiles as genotypic males. Therefore, the proto-Y chromosomes have a minor effect on male gene expression, consistent with previously described minimal X-Y sequence differences. Despite these minimal differences, we find evidence for a disproportionate effect of one proto-Y chromosome on male-biased expression, which could be partially responsible for fitness differences between males with different proto-Y chromosome genotypes. Therefore our results suggest that, if natural selection maintains polygenic sex determination in house fly via gene expression differences, the phenotypes under selection likely depend on a small number of genetic targets.
Affiliation(s)
- Jae Hak Son
- Department of Biology and Biochemistry, University of Houston, Texas 77204-5001
- Tea Kohlbrenner
- Institute of Molecular Life Sciences, University of Zurich, Switzerland CH-8057
- Svenia Heinze
- Institute of Molecular Life Sciences, University of Zurich, Switzerland CH-8057
- Leo W Beukeboom
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, The Netherlands 9700
- Daniel Bopp
- Institute of Molecular Life Sciences, University of Zurich, Switzerland CH-8057
- Richard P Meisel
- Department of Biology and Biochemistry, University of Houston, Texas 77204-5001
|
74
|
Roberts RM, Ezashi T, Sheridan MA, Yang Y. Specification of trophoblast from embryonic stem cells exposed to BMP4. Biol Reprod 2019; 99:212-224. [PMID: 29579154 DOI: 10.1093/biolre/ioy070] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 12/21/2017] [Accepted: 03/21/2018] [Indexed: 01/16/2023] Open
Abstract
Trophoblast (TB) comprises the outer cell layers of the mammalian placenta that make direct contact with the maternal uterus and, in species with a highly invasive placenta, maternal blood. It has its origin as trophectoderm, a single epithelial layer of extra-embryonic ectoderm that surrounds the embryo proper at the blastocyst stage of development. Here, we briefly compare the features of TB specification and determination in the mouse and the human. We then review research on a model system that has been increasingly employed to study TB emergence, namely the BMP4 (bone morphogenetic protein-4)-directed differentiation of human embryonic stem cells (ESCd), and discuss why outcomes using it have proved so uneven. We also examine the controversial aspects of this model, particularly the issue of whether or not the ESCd represents TB at all. Our focus here has been to explore similarities and potential differences between the phenotypes of ESCd, trophectoderm, placental villous TB, and human TB stem cells. We then explore the role of BMP4 in the differentiation of human pluripotent cells to TB and suggest that it converts the ESC into a totipotent state that is primed for TB differentiation when self-renewal is blocked. Finally we speculate that the TB formed from ESC is homologous to the trophectoderm-derived, invasive TB that envelopes the implanting conceptus during the second week of pregnancy.
Affiliation(s)
- R Michael Roberts
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA; Department of Biochemistry, University of Missouri, Columbia, Missouri, USA; Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Toshihiko Ezashi
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA; Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Megan A Sheridan
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA; Department of Biochemistry, University of Missouri, Columbia, Missouri, USA
- Ying Yang
- Department of Molecular Pharmacology and Physiology, University of South Florida, Tampa, Florida, USA
|
75
|
White AE, Dey KK, Mohan D, Stephens M, Price TD. Regional influences on community structure across the tropical-temperate divide. Nat Commun 2019; 10:2646. [PMID: 31201312 PMCID: PMC6570764 DOI: 10.1038/s41467-019-10253-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/26/2019] [Accepted: 04/29/2019] [Indexed: 12/02/2022] Open
Abstract
Many models to explain the differences in the flora and fauna of tropical and temperate regions assume that whole clades are restricted to the tropics. We develop methods to assess the extent to which biotas are geographically discrete, and find that transition zones between regions occupied by tropical-associated or temperate-associated biotas are often narrow, suggesting a role for freezing temperatures in partitioning global biotas. Across the steepest tropical-temperate gradient in the world, that of the Himalaya, bird communities below and above the freezing line are largely populated by different tropical and temperate biotas with links to India and Southeast Asia, or to China respectively. The importance of the freezing line is retained when clades rather than species are considered, reflecting confinement of different clades to one or another climate zone. The reality of the sharp tropical-temperate boundary adds credence to the argument that exceptional species richness in the tropics reflects species accumulation over time, with limited transgressions of species and clades into the temperate.
Affiliation(s)
- Alexander E White
- Department of Ecology and Evolution, University of Chicago, 1101 E 57th Street, Chicago, IL, 60637, USA.
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, 600 Maryland Avenue SW, Washington, 20024, DC, USA.
- National Museum of Natural History, Smithsonian Institution, MRC 166 PO Box 37012, Washington, DC, 20013, USA.
- Kushal K Dey
- Department of Statistics, University of Chicago, 5747 S Ellis Avenue, Chicago, IL, 60637, USA
- Department of Epidemiology, Harvard University, 665 Huntington Avenue, Cambridge, MA, 02115, USA
- Dhananjai Mohan
- Wildlife Institute of India, PO Box 18, Chandrabani, Dehradun, 248001, India
- Matthew Stephens
- Department of Statistics, University of Chicago, 5747 S Ellis Avenue, Chicago, IL, 60637, USA
- Department of Human Genetics, University of Chicago, 920 E 58th Street, Chicago, IL, 60637, USA
- Trevor D Price
- Department of Ecology and Evolution, University of Chicago, 1101 E 57th Street, Chicago, IL, 60637, USA
|
76
|
Wang M, Fischer J, Song YS. Three-way clustering of multi-tissue multi-individual gene expression data using semi-nonnegative tensor decomposition. Ann Appl Stat 2019; 13:1103-1127. [PMID: 33381253 PMCID: PMC7771883 DOI: 10.1214/18-aoas1228] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/17/2023]
Abstract
The advent of high-throughput sequencing technologies has led to an increasing availability of large multi-tissue data sets which contain gene expression measurements across different tissues and individuals. In this setting, variation in expression levels arises due to contributions specific to genes, tissues, individuals, and interactions thereof. Classical clustering methods are ill-suited to explore these three-way interactions and struggle to fully extract the insights into transcriptome complexity contained in the data. We propose a new statistical method, called MultiCluster, based on semi-nonnegative tensor decomposition which permits the investigation of transcriptome variation across individuals and tissues simultaneously. We further develop a tensor projection procedure which detects covariate-related genes with high power, demonstrating the advantage of tensor-based methods in incorporating information across similar tissues. Through simulation and application to the GTEx RNA-seq data from 53 human tissues, we show that MultiCluster identifies three-way interactions with high accuracy and robustness.
Affiliation(s)
- Miaoyan Wang
- University of Wisconsin, Madison and University of California, Berkeley
- Jonathan Fischer
- University of Wisconsin, Madison and University of California, Berkeley
- Yun S Song
- University of Wisconsin, Madison and University of California, Berkeley
|
77
|
Duan B, Zhou C, Zhu C, Yu Y, Li G, Zhang S, Zhang C, Ye X, Ma H, Qu S, Zhang Z, Wang P, Sun S, Liu Q. Model-based understanding of single-cell CRISPR screening. Nat Commun 2019; 10:2233. [PMID: 31110232 PMCID: PMC6527552 DOI: 10.1038/s41467-019-10216-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 05/01/2018] [Accepted: 04/30/2019] [Indexed: 12/26/2022] Open
Abstract
The recently developed single-cell CRISPR screening techniques, independently termed Perturb-Seq, CRISP-seq, or CROP-seq, combine pooled CRISPR screening with single-cell RNA-seq to investigate functional CRISPR screening in a single-cell granularity. Here, we present MUSIC, an integrated pipeline for model-based understanding of single-cell CRISPR screening data. Comprehensive tests applied to all the publicly available data revealed that MUSIC accurately quantifies and prioritizes the individual gene perturbation effect on cell phenotypes with tolerance for the substantial noise that exists in such data analysis. MUSIC facilitates the single-cell CRISPR screening from three perspectives, i.e., prioritizing the gene perturbation effect as an overall perturbation effect, in a functional topic-specific way, and quantifying the relationships between different perturbations. In summary, MUSIC provides an effective and applicable solution to elucidate perturbation function and biologic circuits by a model-based quantitative analysis of single-cell-based CRISPR screening data.
Affiliation(s)
- Bin Duan
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China
- Department of Ophthalmology, Ninghai First Hospital, Ninghai, Zhejiang, China
- Chi Zhou
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China
- Chengyu Zhu
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China
- Yifei Yu
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China
- Gaoyang Li
- Tongji University Cancer Center, Shanghai Tenth People's Hospital of Tongji University, Shanghai, China
- School of Medicine Tongji University, Shanghai, China
- Shihua Zhang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Beijing, China
- Chao Zhang
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China
- Xiangyun Ye
- Shanghai Chest Hospital Shanghai Jiaotong University, Shanghai, China
- Hanhui Ma
- School of Life Science and Technology ShanghaiTech University, Shanghai, China
- Shen Qu
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China
- Zhiyuan Zhang
- Department of Oral and Maxillofacial-Head Neck Oncology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ping Wang
- Tongji University Cancer Center, Shanghai Tenth People's Hospital of Tongji University, Shanghai, China.
- School of Medicine Tongji University, Shanghai, China.
- Shuyang Sun
- Department of Oral and Maxillofacial-Head Neck Oncology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Qi Liu
- Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, Bioinformatics Department, College of Life Science, Tongji University, Shanghai, China.
- Department of Ophthalmology, Ninghai First Hospital, Ninghai, Zhejiang, China.
|
78
|
Liang L, Chen V, Zhu K, Fan X, Lu X, Lu S. Integrating data and knowledge to identify functional modules of genes: a multilayer approach. BMC Bioinformatics 2019; 20:225. [PMID: 31046665 PMCID: PMC6498600 DOI: 10.1186/s12859-019-2800-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/11/2018] [Accepted: 04/09/2019] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Characterizing the modular structure of cellular network is an important way to identify novel genes for targeted therapeutics. This is made possible by the rising of high-throughput technology. Unfortunately, computational methods to identify functional modules were limited by the data quality issues of high-throughput techniques. This study aims to integrate knowledge extracted from literature to further improve the accuracy of functional module identification. RESULTS Our new model and algorithm were applied to both yeast and human interactomes. Predicted functional modules have covered over 90% of the proteins in both organisms, while maintaining a comparable overall accuracy. We found that the combination of both mRNA expression information and biomedical knowledge greatly improved the performance of functional module identification, which is better than those only using protein interaction network weighted with transcriptomic data, literature knowledge, or simply unweighted protein interaction network. Our new algorithm also achieved better performance when comparing with some other well-known methods, especially in terms of the positive predictive value (PPV), which indicated the confidence of novel discovery. CONCLUSION Higher PPV with the multiplex approach suggested that information from both sources has been effectively integrated to reduce false positive. With protein coverage higher than 90%, our algorithm is able to generate more novel biological hypothesis with higher confidence.
Affiliation(s)
- Lifan Liang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Vicky Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc, Frederick, USA
- Kunju Zhu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Clinical Medicine Research Institute, Jinan University, Guangzhou, 51063, Guangdong, China
- Xiaonan Fan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China
- Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Songjian Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.
|
79
|
Zhu X, Stephens M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat Commun 2018; 9:4361. [PMID: 30341297 PMCID: PMC6195536 DOI: 10.1038/s41467-018-06805-x] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 07/27/2018] [Accepted: 09/29/2018] [Indexed: 12/27/2022] Open
Abstract
Genome-wide association studies (GWAS) aim to identify genetic factors associated with phenotypes. Standard analyses test variants for associations individually. However, variant-level associations are hard to identify and can be difficult to interpret biologically. Enrichment analyses help address both problems by targeting sets of biologically related variants. Here we introduce a new model-based enrichment method that requires only GWAS summary statistics. Applying this method to interrogate 4,026 gene sets in 31 human phenotypes identifies many previously-unreported enrichments, including enrichments of endochondral ossification pathway for height, NFAT-dependent transcription pathway for rheumatoid arthritis, brain-related genes for coronary artery disease, and liver-related genes for Alzheimer’s disease. A key feature of our method is that inferred enrichments automatically help identify new trait-associated genes. For example, accounting for enrichment in lipid transport genes highlights association between MTTP and low-density lipoprotein levels, whereas conventional analyses of the same data found no significant variants near this gene. In genome-wide association studies, variant-level associations are hard to identify and can be difficult to interpret biologically. Here, the authors develop a new model-based enrichment analysis method, and apply it to identify new associated genes, pathways and tissues across 31 human phenotypes.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, Stanford University, Stanford, 94305, CA, USA; Department of Statistics, The University of Chicago, Chicago, 60637, IL, USA.
- Matthew Stephens
- Department of Statistics, The University of Chicago, Chicago, 60637, IL, USA; Department of Human Genetics, The University of Chicago, Chicago, 60637, IL, USA.
|
80
|
Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018; 34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]
Abstract
Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Raman Arora
- Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
- Aedin C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Alexander V Favorov
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
- Loyal A Goff
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Yifeng Li
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
- Aloune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Michael F Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Yanxun Xu
- Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.
|
81
|
Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res 2018; 7:1297. [PMID: 30228881 PMCID: PMC6124389 DOI: 10.12688/f1000research.15809.1] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Accepted: 08/07/2018] [Indexed: 01/21/2023] Open
Abstract
Background: The commercially available 10x Genomics protocol to generate droplet-based single-cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or same cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regards to which method to use. Methods: Here we use one gold standard 10x Genomics dataset, generated from the mixture of three cell lines, as well as three silver standard 10x Genomics datasets generated from peripheral blood mononuclear cells to examine not only the accuracy but also robustness of a dozen methods. Results: We found that some methods, including Seurat and Cell Ranger, outperform other methods, although performance seems to be dependent on the complexity of the studied system. Furthermore, we found that solutions produced by different methods have little in common with each other. Conclusions: In light of this, we conclude that the choice of clustering tool crucially determines interpretation of scRNA-seq data generated by 10x Genomics. Hence practitioners and consumers should remain vigilant about the outcome of 10x Genomics scRNA-seq analysis.
Affiliation(s)
- Saskia Freytag
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, University of Melbourne, Parkville, Australia
| | - Luyi Tian
- Department of Medical Biology, University of Melbourne, Parkville, Australia
- Molecular Medicine Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | | | - Milica Ng
- Bio21 Institute, CSL Limited, Parkville, Australia
| | - Melanie Bahlo
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, University of Melbourne, Parkville, Australia
| |
|
82
|
Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res 2018; 7:1297. [PMID: 30228881 PMCID: PMC6124389 DOI: 10.12688/f1000research.15809.2] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Accepted: 12/14/2018] [Indexed: 12/23/2022]
Abstract
Background: The commercially available 10x Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or same cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regards to which method to use. Methods: Here we use one gold standard 10x Genomics dataset, generated from the mixture of three cell lines, as well as multiple silver standard 10x Genomics datasets generated from peripheral blood mononuclear cells to examine not only the accuracy but also running time and robustness of a dozen methods. Results: We found that Seurat outperformed other methods, although performance seems to be dependent on many factors, including the complexity of the studied system. Furthermore, we found that solutions produced by different methods have little in common with each other. Conclusions: In light of this we conclude that the choice of clustering tool crucially determines interpretation of scRNA-seq data generated by 10x Genomics. Hence practitioners and consumers should remain vigilant about the outcome of 10x Genomics scRNA-seq analysis.
Affiliation(s)
- Saskia Freytag
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, University of Melbourne, Parkville, Australia
| | - Luyi Tian
- Department of Medical Biology, University of Melbourne, Parkville, Australia
- Molecular Medicine Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | | | - Milica Ng
- Bio21 Institute, CSL Limited, Parkville, Australia
| | - Melanie Bahlo
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, University of Melbourne, Parkville, Australia
| |
|
83
|
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
| | - Jay W Shin
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
| | - Piero Carninci
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
| | | |
|
84
|
Zhang H, Lee CAA, Li Z, Garbe JR, Eide CR, Petegrosso R, Kuang R, Tolar J. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa. PLoS Comput Biol 2018; 14:e1006053. [PMID: 29630593 PMCID: PMC5908193 DOI: 10.1371/journal.pcbi.1006053] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/06/2017] [Revised: 04/19/2018] [Accepted: 02/21/2018] [Indexed: 12/31/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC.
scRNA-seq enables detailed profiling of heterogeneous cell populations and can be used to reveal lineage relationships or discover new cell types. In the literature, there has been little effort directed towards developing computational methods for cross-population transcriptome analysis of multiple single-cell populations. The cross-cell-population clustering problem is different from the traditional clustering problem because single-cell populations can be collected from different patients, different samples of a tissue, or different experimental replicates. The accompanying biological and technical variation tends to dominate the signals for clustering the pooled single cells from the multiple populations. In this work, we have developed a multitask clustering method to address the cross-population clustering problem. The method simultaneously clusters each individual cell population and controls variance among the cell-type cluster centers within each cell population and across the cell populations. We demonstrate that our multitask clustering method significantly improves clustering accuracy and marker discovery in three public scRNA-seq datasets and also apply the method to an in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) dataset. Our results make it evident that multitask clustering is a promising new approach for cross-population analysis of scRNA-seq data.
Affiliation(s)
- Huanan Zhang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - Catherine A. A. Lee
- Department of Genetics, Cell Biology and Development, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - Zhuliu Li
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - John R. Garbe
- Minnesota Supercomputing Institute, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - Cindy R. Eide
- Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - Raphael Petegrosso
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| | - Jakub Tolar
- Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
| |
|
85
|
Correction: Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genet 2017; 13:e1006759. [PMID: 28549067 PMCID: PMC5446108 DOI: 10.1371/journal.pgen.1006759] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
|