1
Church SH, Mah JL, Dunn CW. Integrating phylogenies into single-cell RNA sequencing analysis allows comparisons across species, genes, and cells. PLoS Biol 2024; 22:e3002633. [PMID: 38787797 PMCID: PMC11125556 DOI: 10.1371/journal.pbio.3002633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
Comparisons of single-cell RNA sequencing (scRNA-seq) data across species can reveal links between cellular gene expression and the evolution of cell functions, features, and phenotypes. These comparisons evoke evolutionary histories, as depicted by phylogenetic trees, that define relationships between species, genes, and cells. This Essay considers each of these in turn, laying out challenges and solutions derived from a phylogenetic comparative approach and relating these solutions to previously proposed methods for the pairwise alignment of cellular dimensional maps. This Essay contends that species trees, gene trees, cell phylogenies, and cell lineages can all be reconciled as descriptions of the same concept: the tree of cellular life. By integrating phylogenetic approaches into scRNA-seq analyses, challenges for building informed comparisons across species can be overcome, and hypotheses about gene and cell evolution can be robustly tested.
Affiliation(s)
- Samuel H. Church
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Jasmine L. Mah
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Casey W. Dunn
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
3
Paul I, Bolzan D, Youssef A, Gagnon KA, Hook H, Karemore G, Oliphant MUJ, Lin W, Liu Q, Phanse S, White C, Padhorny D, Kotelnikov S, Chen CS, Hu P, Denis GV, Kozakov D, Raught B, Siggers T, Wuchty S, Muthuswamy SK, Emili A. Parallelized multidimensional analytic framework applied to mammary epithelial cells uncovers regulatory principles in EMT. Nat Commun 2023; 14:688. [PMID: 36755019 PMCID: PMC9908882 DOI: 10.1038/s41467-023-36122-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/11/2022] [Accepted: 01/17/2023] [Indexed: 02/10/2023]
Abstract
A proper understanding of disease etiology will require longitudinal systems-scale reconstruction of the multitiered architecture of eukaryotic signaling. Here we combine state-of-the-art data acquisition platforms and bioinformatics tools to devise PAMAF, a workflow that simultaneously examines twelve omics modalities, i.e., protein abundance from whole-cells, nucleus, exosomes, secretome and membrane; N-glycosylation, phosphorylation; metabolites; mRNA, miRNA; and, in parallel, single-cell transcriptomes. We apply PAMAF in an established in vitro model of TGFβ-induced epithelial to mesenchymal transition (EMT) to quantify >61,000 molecules from 12 omics and 10 timepoints over 12 days. Bioinformatics analysis of this EMT-ExMap resource allowed us to identify: topological coupling between omics; four distinct cell states during EMT; omics-specific kinetic paths; stage-specific multi-omics characteristics; distinct regulatory classes of genes; ligand-receptor mediated intercellular crosstalk, by integrating scRNAseq and subcellular proteomics; and combinatorial drug targets (e.g., Hedgehog signaling and CAMK-II) to inhibit EMT, which we validate using a 3D mammary duct-on-a-chip platform. Overall, this study provides a resource on TGFβ signaling and EMT.
Affiliation(s)
- Indranil Paul
- Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Dante Bolzan
- Department of Computer Science, University of Miami, 1356 Memorial Drive, Coral Gables, FL, 33146, USA
- Ahmed Youssef
- Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, MA, 02215, USA
- Keith A Gagnon
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, 02215, USA
- Heather Hook
- Department of Biology, Boston University, 24 Cummington Mall, Boston, MA, 02115, USA
- Biological Design Center, Boston University, 610 Commonwealth Avenue, Boston, MA, 02215, USA
- Gopal Karemore
- Advanced Analytics, Novo Nordisk A/S, 2760, Måløv, Denmark
- Michael U J Oliphant
- Cancer Research Institute, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, 02115, USA
- Weiwei Lin
- Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Qian Liu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, R3E 0J9, Canada
- Sadhna Phanse
- Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Carl White
- Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, 11794, Stony Brook, NY, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, 11794, Stony Brook, NY, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Christopher S Chen
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, 02215, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, Boston, MA, 02115, USA
- Pingzhao Hu
- Department of Biochemistry, Western University, London, ON, N6A 5C1, Canada
- Gerald V Denis
- Boston Medical Center Cancer Center, Boston University, 72 East Concord Street, Boston, MA, 02118, USA
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, 11794, Stony Brook, NY, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Brian Raught
- Discovery Tower (TMDT), 101 College St, Rm. 9-701A, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Trevor Siggers
- Department of Biology, Boston University, 24 Cummington Mall, Boston, MA, 02115, USA
- Biological Design Center, Boston University, 610 Commonwealth Avenue, Boston, MA, 02215, USA
- Stefan Wuchty
- Department of Computer Science, University of Miami, 1356 Memorial Drive, Coral Gables, FL, 33146, USA
- Senthil K Muthuswamy
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Andrew Emili
- Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Department of Biology, Charles River Campus, Boston University, Life Science & Engineering (LSEB-602), 24 Cummington Mall, Boston, MA, 02215, USA
- Division of Oncological Sciences, Knight Cancer Institute, Oregon Health and Science University, Portland, USA
4
Reed ER, Monti S. Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data. Nucleic Acids Res 2021; 49:e98. [PMID: 34226941 PMCID: PMC8464061 DOI: 10.1093/nar/gkab552] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/17/2020] [Revised: 06/07/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
As high-throughput genomics assays become more efficient and cost effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a 'taxonomy-like' structure. K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics, and other '-omics', data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes, associated with better prognosis in breast cancer tissue bulk expression data.
Affiliation(s)
- Eric R Reed
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA 02118, USA
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA 02118, USA
- Stefano Monti
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA 02118, USA
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
5
Lopes KDP, Campos-Laborie FJ, Vialle RA, Ortega JM, De Las Rivas J. Evolutionary hallmarks of the human proteome: chasing the age and coregulation of protein-coding genes. BMC Genomics 2016; 17:725. [PMID: 27801289 PMCID: PMC5088522 DOI: 10.1186/s12864-016-3062-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Abstract
Background: The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of gene expression profiles in complete genomes. RNA-Seq allows the measurement of gene expression levels in a manner far more precise and global than previous methods. Studies using this technology are altering our view of the extent and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types in different conditions, in either normal or pathological states. However, until recently, little has been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution. Results: We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping that places the genes in taxonomy clades and reveals eight major evolutionary steps (“hallmarks”) that include clusters of functionally coherent proteins. The human expressed genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins is performed by combining information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolution time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of a robust gene coexpression network that reveals tighter links between age-related protein-coding genes and finds functionally coherent gene modules. Conclusions: Understanding the relational landscape of the human protein-coding genes is essential for interpreting the functional elements and modules of our active genome. Moreover, decoding the evolutionary history of the human genes can provide very valuable information to reveal or uncover their origin and function.
Affiliation(s)
- Katia de Paiva Lopes
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil
- Francisco José Campos-Laborie
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain
- Ricardo Assunção Vialle
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil
- José Miguel Ortega
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil
- Javier De Las Rivas
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain