51
Wnuk K, Sudol J, Givechian KB, Soon-Shiong P, Rabizadeh S, Szeto C, Vaske C. Deep Learning Implicitly Handles Tissue Specific Phenomena to Predict Tumor DNA Accessibility and Immune Activity. iScience 2019; 20:119-136. [PMID: 31563852 PMCID: PMC6823659 DOI: 10.1016/j.isci.2019.09.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/29/2019] [Revised: 08/23/2019] [Accepted: 09/11/2019] [Indexed: 01/22/2023]
Abstract
DNA accessibility is a key dynamic feature of chromatin regulation that can potentiate transcriptional events and tumor progression. To gain insight into chromatin state across existing tumor data, we improved neural network models for predicting accessibility from DNA sequence and extended them to incorporate a global set of RNA sequencing gene expression inputs. Our expression-informed model expanded the application domain beyond specific tissue types to tissues not present in training and achieved consistently high accuracy in predicting DNA accessibility at promoter and promoter flank regions. We then leveraged our new tool by analyzing the DNA accessibility landscape of promoters across The Cancer Genome Atlas. We show that in lung adenocarcinoma the accessibility perspective uniquely highlights immune pathways inversely correlated with a more open chromatin state and that accessibility patterns learned from even a single tumor type can discriminate immune inflammation across many cancers, often with direct relation to patient prognosis.
Affiliation(s)
- Kamil Wnuk
- ImmunityBio Inc., Culver City, CA 90232, USA.
52
de Jong JMA, Sun W, Pires ND, Frontini A, Balaz M, Jespersen NZ, Feizi A, Petrovic K, Fischer AW, Bokhari MH, Niemi T, Nuutila P, Cinti S, Nielsen S, Scheele C, Virtanen K, Cannon B, Nedergaard J, Wolfrum C, Petrovic N. Human brown adipose tissue is phenocopied by classical brown adipose tissue in physiologically humanized mice. Nat Metab 2019; 1:830-843. [PMID: 32694768 DOI: 10.1038/s42255-019-0101-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 01/08/2019] [Accepted: 07/16/2019] [Indexed: 11/10/2022]
Abstract
Human and rodent brown adipose tissues (BAT) appear morphologically and molecularly different. Here we compare human BAT with both classical brown and brite/beige adipose tissues of 'physiologically humanized' mice: middle-aged mice living under conditions approaching human thermal and nutritional conditions, that is, prolonged exposure to thermoneutral temperature (approximately 30 °C) and to an energy-rich (high-fat, high-sugar) diet. We find that the morphological, cellular and molecular characteristics (both marker and adipose-selective gene expression) of classical brown fat, but not of brite/beige fat, of these physiologically humanized mice are notably similar to human BAT. We also demonstrate, both in silico and experimentally, that in physiologically humanized mice only classical BAT possesses a high thermogenic potential. These observations suggest that classical rodent BAT is the tissue of choice for translational studies aimed at recruiting human BAT to counteract the development of obesity and its comorbidities.
Affiliation(s)
- Jasper M A de Jong
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Department of Comparative Medicine, Yale School of Medicine, New Haven, CT, USA
- Wenfei Sun
- Institute of Food, Nutrition and Health, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, Switzerland
- Nuno D Pires
- Institute of Food, Nutrition and Health, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, Switzerland
- Andrea Frontini
- Department of Public Health, Experimental and Forensic Medicine, University of Pavia, Pavia, Italy
- Miroslav Balaz
- Institute of Food, Nutrition and Health, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, Switzerland
- Naja Z Jespersen
- The Centre of Inflammation and Metabolism and Centre for Physical Activity Research Rigshospitalet, University Hospital of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Amir Feizi
- Novo Nordisk Research Centre Oxford, Oxford, UK
- Katarina Petrovic
- Department of Chemistry, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Alexander W Fischer
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Genetics and Complex Diseases, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Muhammad Hamza Bokhari
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Tarja Niemi
- Department of Surgery, Turku University Hospital, Turku, Finland
- Pirjo Nuutila
- Turku PET Centre, University of Turku, Turku, Finland
- Saverio Cinti
- Department of Experimental and Clinical Medicine, University of Ancona, Ancona, Italy
- Søren Nielsen
- The Centre of Inflammation and Metabolism and Centre for Physical Activity Research Rigshospitalet, University Hospital of Copenhagen, Copenhagen, Denmark
- Camilla Scheele
- The Centre of Inflammation and Metabolism and Centre for Physical Activity Research Rigshospitalet, University Hospital of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Barbara Cannon
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Jan Nedergaard
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Christian Wolfrum
- Institute of Food, Nutrition and Health, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, Switzerland
- Natasa Petrovic
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
53
Spies D, Renz PF, Beyer TA, Ciaudo C. Comparative analysis of differential gene expression tools for RNA sequencing time course data. Brief Bioinform 2019; 20:288-298. [PMID: 29028903 PMCID: PMC6357553 DOI: 10.1093/bib/bbx115] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 05/06/2017] [Indexed: 02/05/2023]
Abstract
RNA sequencing (RNA-seq) has become a standard procedure to investigate transcriptional changes between conditions and is routinely used in research and clinics. While standard differential expression (DE) analysis between two conditions has been extensively studied and improved over the past decades, RNA-seq time course (TC) DE analysis algorithms are still in their early stages. In this study, we compared, for the first time, existing TC RNA-seq tools on an extensive simulation data set and validated the best-performing tools on published data. Surprisingly, with the exception of ImpulseDE2, TC tools were outperformed by the classical pairwise comparison approach on short time series (<8 time points) in terms of overall performance and robustness to noise, mostly because of a high number of false positives. Overlapping the candidate lists of different tools mitigated this shortcoming, as the majority of false-positive, but not true-positive, candidates were unique to each method. On longer time series, the pairwise approach had lower overall performance than splineTC and maSigPro, which did not identify any false-positive candidates.
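The abstract attributes the benefit of overlapping candidate lists to false positives being largely tool-specific. A consensus filter of that kind can be sketched as below, assuming each tool's output has already been reduced to a set of gene IDs; the tool names, gene IDs, and the `min_tools` threshold are hypothetical and are not taken from the study.

```python
from collections import Counter


def consensus_candidates(candidate_lists, min_tools=2):
    """Keep genes reported as differentially expressed by >= min_tools tools.

    candidate_lists: mapping of tool name -> iterable of gene IDs.
    """
    counts = Counter()
    for genes in candidate_lists.values():
        counts.update(set(genes))  # a gene counts at most once per tool
    return {gene for gene, n in counts.items() if n >= min_tools}


# Hypothetical outputs of three TC DE tools on the same data set.
calls = {
    "tool_a": ["g1", "g2", "g3", "g7"],
    "tool_b": ["g1", "g2", "g4"],
    "tool_c": ["g1", "g2", "g5"],
}
print(sorted(consensus_candidates(calls)))  # ['g1', 'g2']
```

Raising `min_tools` trades sensitivity for specificity; requiring agreement from every tool is the strictest form of intersection.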
Affiliation(s)
- Daniel Spies
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
- Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Peter F Renz
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
- Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Tobias A Beyer
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
- Constance Ciaudo
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
54
Nair S, Kim DS, Perricone J, Kundaje A. Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts. Bioinformatics 2019; 35:i108-i116. [PMID: 31510655 PMCID: PMC6612838 DOI: 10.1093/bioinformatics/btz352] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 01/30/2023]
Abstract
MOTIVATION Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types. RESULTS We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis- and trans-regulation of chromatin dynamics across 123 diverse cellular contexts. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/kundajelab/ChromDragoNN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
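The abstract notes that a region's average accessibility across training contexts is a surprisingly strong predictor. A minimal sketch of that baseline follows, assuming binary peak calls arranged as a regions-by-contexts matrix; the data and the 0.5 threshold are hypothetical, and this is not the ChromDragoNN code, which the authors describe as combining such a feature with sequence and expression inputs in residual networks.

```python
def average_accessibility_baseline(train_matrix):
    """Mean accessibility of each genomic region across training contexts.

    train_matrix: one row per region, holding that region's accessibility
    (here 0/1 peak calls) in every training context. The resulting score
    ignores the target context entirely, which is what makes it a baseline.
    """
    return [sum(row) / len(row) for row in train_matrix]


def predict_unseen_context(baseline, threshold=0.5):
    """Binary accessibility calls for a new context from the baseline alone."""
    return [score >= threshold for score in baseline]


# Hypothetical data: 3 regions x 4 training contexts.
train = [
    [1, 1, 1, 0],  # accessible in most contexts
    [0, 0, 1, 0],  # accessible in one context only
    [0, 0, 0, 0],  # never accessible
]
baseline = average_accessibility_baseline(train)
print(baseline)                          # [0.75, 0.25, 0.0]
print(predict_unseen_context(baseline))  # [True, False, False]
```

Because this score is identical for every context, it cannot by itself distinguish context-specific sites.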
Affiliation(s)
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Daniel S Kim
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Jacob Perricone
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
55
Abstract
Background Leishmania development in the sand fly gut leads to highly infective forms called metacyclic promastigotes. This process can be routinely mimicked in culture. Gene expression–profiling studies by transcriptome analysis have been performed with the aim of studying promastigote forms in the sand fly gut, as well as differences between sand fly– and culture-derived promastigotes. Findings Transcriptome analysis has revealed the crucial role of the microenvironment in parasite development within the sand fly gut: the transcriptomes of cultured and sand fly–derived promastigotes differ substantially and are only moderately correlated. Sand fly–derived metacyclics are more infective than cultured metacyclics. Therefore, some caution should be exercised when using cultured promastigotes, depending on the experimental design. The most remarkable examples of these differences are the hydrophilic acidic surface protein/small endoplasmic reticulum protein (HASP/SHERP) cluster, the glycoprotein 63 (gp63), and autophagy genes, which are up-regulated in sand fly–derived promastigotes compared with cultured promastigotes. Because HASP/SHERP genes are up-regulated in both nectomonad and metacyclic promastigotes in the sand fly, the encoded proteins are not metacyclic specific. Metacyclic promastigotes are distinguished by morphology and high infectivity. Isolating them from the sand fly gut is technically difficult, because other promastigote forms remain in the gut even 15 days after infection. Leishmania major procyclic promastigotes within the sand fly gut up-regulate genes involved in cell cycle regulation and glucose catabolism, whereas metacyclics increase transcript levels of fatty acid biosynthesis and ATP-coupled proton transport genes. Most of the parasite's signal transduction pathways remain uncharacterized.
Elucidating them, particularly the expression of signaling molecule–encoding genes in sand fly versus culture and between promastigote forms in the sand fly gut, may improve understanding of parasite development. Conclusions Transcriptome analysis has proven technically effective for studying differential gene expression in sand fly gut promastigote forms. Transcript and protein levels are not well correlated in these organisms (approximately 25% quantitative coincidences), especially under stress situations and during differentiation processes. However, transcript and protein levels behave similarly in approximately 60% of cases from a qualitative point of view (increase, decrease, or no variation). Changes in translational efficiency observed in other trypanosomatids strongly suggest that the differences are due to translational regulation and regulation of steady-state protein levels. So far, the lack of low-input sample strategies has precluded translatome and proteome analysis of sand fly–derived promastigotes.
56
Comparative Analysis of Brain and Fat Body Gene Splicing Patterns in the Honey Bee, Apis mellifera. G3 (Bethesda) 2019; 9:1055-1063. [PMID: 30792192 PMCID: PMC6469410 DOI: 10.1534/g3.118.200857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
RNA-seq has proven to be a powerful tool to unravel various aspects of the transcriptome, especially the quantification of alternative splicing (AS) that leads to isoform diversity. The honey bee (Apis mellifera) is an important model organism for studying the molecular underpinnings of behavioral plasticity and social behavior, and recent RNA-seq studies of honey bees have revealed AS patterns and their regulation by DNA methylation. However, tissue-specific AS patterns have not been fully explored. In this paper, we characterized AS patterns in two different honey bee tissue types, and also explored their conservation and regulation. We used the RNA-seq data from brain and fat body to improve the existing models of honey bee genes and identified tissue-specific AS patterns. We found that AS genes show high conservation between honey bee and Drosophila melanogaster. We also confirmed and extended previous findings of a correlation between gene body DNA methylation and AS patterns, providing further support for the role of DNA methylation in regulating AS. In addition, our analysis suggests distinct functional roles for tissue-specific alternatively spliced genes. Taken together, our work provides new insights into the conservation and dynamics of AS patterns across different tissue types.
57
Winter C, Kosch R, Ludlow M, Osterhaus ADME, Jung K. Network meta-analysis correlates with analysis of merged independent transcriptome expression data. BMC Bioinformatics 2019; 20:144. [PMID: 30876387 PMCID: PMC6420731 DOI: 10.1186/s12859-019-2705-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/31/2018] [Accepted: 02/27/2019] [Indexed: 12/15/2022]
Abstract
Background Using meta-analysis, high-dimensional transcriptome expression data from public repositories can be merged to make group comparisons that were not considered in the original studies. Merging high-dimensional expression data can, however, introduce batch effects that are sometimes difficult to remove. Removing batch effects becomes even more difficult when the expression data were generated with different technologies in the individual studies (e.g. merging of microarray and RNA-seq data). Network meta-analysis has so far not been considered for making indirect comparisons in transcriptome expression data when data merging appears to yield biased results. Results We demonstrate in a simulation study that the results from analyzing merged data sets and the results from network meta-analysis are highly correlated in simple study networks. When an edge in the network is supported by multiple independent studies, network meta-analysis produces fold changes that are closer to the simulated ones than those obtained from analyzing merged data sets. Finally, we also demonstrate the practicability of network meta-analysis on a real-world data example from neuroinfection research. Conclusions Network meta-analysis is a useful means to make new inferences when combining multiple independent studies of molecular, high-throughput expression data. This method is especially advantageous when batch effects between studies are hard to remove. Electronic supplementary material The online version of this article (10.1186/s12859-019-2705-9) contains supplementary material, which is available to authorized users.
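Indirect comparison through a shared comparator, together with inverse-variance pooling of studies that support the same edge, is the basic fixed-effect building block behind network meta-analysis. The sketch below shows it for a single gene, assuming independent studies with known variances of the log fold change; the numbers are made up, and this is not the authors' implementation, which would typically also address random effects and inconsistency across the network.

```python
def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of one edge's log fold changes.

    estimates: list of (log_fc, variance) pairs from independent studies
    that compare the same two groups.
    Returns (pooled_log_fc, pooled_variance).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    pooled = sum(w * lfc for w, (lfc, _) in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def indirect_comparison(b_vs_a, c_vs_b):
    """Indirect C-vs-A contrast through the common comparator B.

    On the log scale log(C/A) = log(C/B) + log(B/A); assuming the two
    estimates come from independent studies, their variances also add.
    """
    return b_vs_a[0] + c_vs_b[0], b_vs_a[1] + c_vs_b[1]


# Hypothetical log2 fold changes for one gene (values are made up).
b_vs_a = pool_fixed_effect([(1.0, 0.04), (1.4, 0.04)])  # edge with two studies
c_vs_a = indirect_comparison(b_vs_a, (0.5, 0.05))        # C vs B from one study
print(round(c_vs_a[0], 3), round(c_vs_a[1], 3))          # 1.7 0.07
```

Pooling shrinks the variance of a multiply supported edge (here from 0.04 to 0.02), which is one intuition for why such edges yield more accurate fold changes.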
Affiliation(s)
- Christine Winter
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, Hannover, 30559, Germany
- Robin Kosch
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, Hannover, 30559, Germany
- Martin Ludlow
- Research Center for Emerging Infections and Zoonoses, University of Veterinary Medicine Hannover, Bünteweg 17p, Hannover, 30559, Germany
- Albert D M E Osterhaus
- Research Center for Emerging Infections and Zoonoses, University of Veterinary Medicine Hannover, Bünteweg 17p, Hannover, 30559, Germany
- Klaus Jung
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, Hannover, 30559, Germany
58
Vilgalys TP, Rogers J, Jolly CJ, Baboon Genome Analysis, Mukherjee S, Tung J. Evolution of DNA Methylation in Papio Baboons. Mol Biol Evol 2019; 36:527-540. [PMID: 30521003 PMCID: PMC6389319 DOI: 10.1093/molbev/msy227] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 02/07/2023]
Abstract
Changes in gene regulation have long been thought to play an important role in primate evolution. However, although a number of studies have compared genome-wide gene expression patterns across primate species, fewer have investigated the gene regulatory mechanisms that underlie such patterns, or the relative contribution of drift versus selection. Here, we profiled genome-scale DNA methylation levels in blood samples from five of the six extant species of the baboon genus Papio (4-14 individuals per species). This radiation presents the opportunity to investigate DNA methylation divergence at both shallow and deeper timescales (0.380-1.4 My). In contrast to studies in human populations, but similar to studies in great apes, DNA methylation profiles clearly mirror genetic and geographic structure. Divergence in DNA methylation proceeds fastest in unannotated regions of the genome and slowest in regions of the genome that are likely more constrained at the sequence level (e.g., gene exons). Both heuristic approaches and Ornstein-Uhlenbeck models suggest that DNA methylation levels at a small set of sites have been affected by positive selection, and that this class is enriched in functionally relevant contexts, including promoters, enhancers, and CpG islands. Our results thus indicate that the rate and distribution of DNA methylation changes across the genome largely mirror genetic structure. However, at some CpG sites, DNA methylation levels themselves may have been a target of positive selection, pointing to loci that could be important in connecting sequence variation to fitness-related traits.
Affiliation(s)
- Tauras P Vilgalys
- Department of Evolutionary Anthropology, Duke University, Durham, NC
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
- Clifford J Jolly
- Department of Anthropology, New York University, New York, NY
- Center for the Study of Human Origins, New York University, New York, NY
- New York Consortium for Evolutionary Primatology, New York, NY
- Sayan Mukherjee
- Department of Statistical Science, Duke University, Durham, NC
- Department of Mathematics, Duke University, Durham, NC
- Department of Computer Science, Duke University, Durham, NC
- Jenny Tung
- Department of Evolutionary Anthropology, Duke University, Durham, NC
- Department of Biology, Duke University, Durham, NC
- Duke University Population Research Institute, Duke University, Durham, NC
- Institute of Primate Research, National Museums of Kenya, Karen, Nairobi, Kenya
59
Ray P, Torck A, Quigley L, Wangzhou A, Neiman M, Rao C, Lam T, Kim JY, Kim TH, Zhang MQ, Dussor G, Price TJ. Comparative transcriptome profiling of the human and mouse dorsal root ganglia: an RNA-seq-based resource for pain and sensory neuroscience research. Pain 2019; 159:1325-1345. [PMID: 29561359 DOI: 10.1097/j.pain.0000000000001217] [Citation(s) in RCA: 224] [Impact Index Per Article: 44.8] [Indexed: 12/27/2022]
Abstract
Molecular neurobiological insight into human nervous tissues is needed to generate next-generation therapeutics for neurological disorders such as chronic pain. We obtained human dorsal root ganglia (hDRG) samples from organ donors and performed RNA-sequencing (RNA-seq) to study the hDRG transcriptional landscape, systematically comparing it with publicly available data from a variety of human and orthologous mouse tissues, including mouse DRG (mDRG). We characterized the hDRG transcriptional profile in terms of tissue-restricted gene coexpression patterns and putative transcriptional regulators, and formulated an information-theoretic framework to quantify DRG enrichment. Relevant gene families and pathways were also analyzed, including transcription factors, G-protein-coupled receptors, and ion channels. Our analyses reveal an hDRG-enriched protein-coding gene set (∼140), some of which have not been described in the context of DRG or pain signaling. Most of these show conserved enrichment in mDRG and were mined for known drug-gene product interactions. Conserved enrichment of the vast majority of transcription factors suggests that the mDRG is a faithful model system for studying hDRG, because of evolutionarily conserved regulatory programs. Comparison of hDRG and tibial nerve transcriptomes suggests trafficking of neuronal mRNA to axons in adult hDRG, consistent with studies of axonal transport in rodent sensory neurons. We present our work as an online, searchable repository (https://www.utdallas.edu/bbs/painneurosciencelab/sensoryomics/drgtxome), creating a valuable resource for the community. Our analyses provide insight into DRG biology for guiding development of novel therapeutics and a blueprint for cross-species transcriptomic analyses.
Affiliation(s)
- Pradipta Ray
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Andrew Torck
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Lilyana Quigley
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Andi Wangzhou
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Matthew Neiman
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Chandranshu Rao
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Tiffany Lam
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Ji-Young Kim
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Tae Hoon Kim
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Michael Q Zhang
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Gregory Dussor
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Theodore J Price
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
60
Feltes BC, Grisci BI, Poloni JDF, Dorn M. Perspectives and applications of machine learning for evolutionary developmental biology. Mol Omics 2018; 14:289-306. [PMID: 30168572 DOI: 10.1039/c8mo00111a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]
Abstract
Evolutionary Developmental Biology (Evo-Devo) is an ever-expanding field that aims to understand how development was modulated by the evolutionary process. In this sense, "omic" studies emerged as a powerful ally to unravel the molecular mechanisms underlying development. In this scenario, bioinformatics tools have become necessary to analyze the growing amount of information. Among computational approaches, machine learning stands out as a promising field to generate knowledge and trace new research perspectives for bioinformatics. In this review, we aim to present the current advances of machine learning applied to evolution and development. We draw clear perspectives and discuss how evolution has impacted machine learning techniques.
Affiliation(s)
- Bruno César Feltes
- Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
61
Farris SP, Riley BP, Williams RW, Mulligan MK, Miles MF, Lopez MF, Hitzemann R, Iancu OD, Colville A, Walter NAR, Darakjian P, Oberbeck DL, Daunais JB, Zheng CL, Searles RP, McWeeney SK, Grant KA, Mayfield RD. Cross-species molecular dissection across alcohol behavioral domains. Alcohol 2018; 72:19-31. [PMID: 30213503 PMCID: PMC6309876 DOI: 10.1016/j.alcohol.2017.11.036] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/14/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/14/2022]
Abstract
This review summarizes the proceedings of a symposium presented at the "Alcoholism and Stress: A Framework for Future Treatment Strategies" conference held in Volterra, Italy on May 9-12, 2017. Psychiatric diseases, including alcohol-use disorders (AUDs), are influenced by complex interactions of genes, neurobiological pathways, and environmental factors. A better understanding of the common neurobiological mechanisms underlying AUDs requires an integrative approach, involving a systematic assessment of diverse species and phenotype measures. As part of the World Congress on Stress and Alcoholism, this symposium provided a detailed account of current strategies to identify mechanisms underlying the development and progression of AUDs. Dr. Sean Farris discussed the integration and organization of transcriptome and postmortem human brain data to identify brain regional- and cell type-specific differences related to excessive alcohol consumption that are conserved across species. Dr. Brien Riley presented the results of a genome-wide association study of DSM-IV alcohol dependence; although replication of genetic associations with alcohol phenotypes in humans remains challenging, model organism studies show that COL6A3, KLF12, and RYR3 affect behavioral responses to ethanol and provide substantial evidence for their role in human alcohol-related traits. Dr. Rob Williams expanded upon the systematic characterization of extensive genetic-genomic resources for quantifying and clarifying phenotypes across species that are relevant to precision medicine in human disease. The symposium concluded with Dr. Robert Hitzemann's description of transcriptome studies in a mouse model selectively bred for high alcohol ("binge-like") consumption and a non-human primate model of long-term alcohol consumption.
Together, the different components of this session provided an overview of systems-based approaches that are pioneering the experimental prioritization and validation of novel genes and gene networks linked with a range of behavioral phenotypes associated with stress and AUDs.
Affiliation(s)
- Sean P Farris
- University of Texas at Austin, Austin, TX, United States
- Brien P Riley
- Virginia Commonwealth University, Richmond, VA, United States
- Robert W Williams
- University of Tennessee Health Science Center, Memphis, TN, United States
- Megan K Mulligan
- University of Tennessee Health Science Center, Memphis, TN, United States
- Michael F Miles
- University of Tennessee Health Science Center, Memphis, TN, United States
- Marcelo F Lopez
- University of Tennessee Health Science Center, Memphis, TN, United States
- Robert Hitzemann
- Oregon Health and Science University, Portland, OR, United States
- Ovidiu D Iancu
- Oregon Health and Science University, Portland, OR, United States
- James B Daunais
- Wake Forest School of Medicine, Winston-Salem, NC, United States
- Robert P Searles
- Oregon Health and Science University, Portland, OR, United States
- Kathleen A Grant
- Oregon Health and Science University, Portland, OR, United States
62
Liang C, Musser JM, Cloutier A, Prum RO, Wagner GP. Pervasive Correlated Evolution in Gene Expression Shapes Cell and Tissue Type Transcriptomes. Genome Biol Evol 2018; 10:538-552. [PMID: 29373668 PMCID: PMC5800078 DOI: 10.1093/gbe/evy016] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Accepted: 01/21/2018] [Indexed: 12/11/2022]
Abstract
The evolution and diversification of cell types is a key means by which animal complexity evolves. Recently, hierarchical clustering and phylogenetic methods have been applied to RNA-seq data to infer cell type evolutionary history and homology. A major challenge for interpreting this data is that cell type transcriptomes may not evolve independently due to correlated changes in gene expression. This nonindependence can arise for several reasons, such as common regulatory sequences for genes expressed in multiple tissues, that is, pleiotropic effects of mutations. We develop a model to estimate the level of correlated transcriptome evolution (LCE) and apply it to different data sets. The results reveal pervasive correlated transcriptome evolution among different cell and tissue types. In general, tissues related by morphology or developmental lineage exhibit higher LCE than more distantly related tissues. Analyzing new data collected from bird skin appendages suggests that LCE decreases with the phylogenetic age of tissues compared, with recently evolved tissues exhibiting the highest LCE. Furthermore, we show correlated evolution can alter patterns of hierarchical clustering, causing different tissue types from the same species to cluster together. To identify genes that most strongly contribute to the correlated evolution signal, we performed a gene-wise estimation of LCE on a data set with ten species. Removing genes with high LCE allows for accurate reconstruction of evolutionary relationships among tissue types. Our study provides a statistical method to measure and account for correlated gene expression evolution when interpreting comparative transcriptome data.
Affiliation(s)
- Cong Liang
- Yale Systems Biology Institute, West Haven, Connecticut
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University
- Integrated Graduate Program in Physical and Engineering Biology, Yale University
- Jacob M Musser
- Yale Systems Biology Institute, West Haven, Connecticut
- Department of Ecology and Evolutionary Biology, Yale University
- European Molecular Biology Laboratory, Developmental Biology Unit, Heidelberg, Germany
- Alison Cloutier
- Department of Ecology and Evolutionary Biology, University of Toronto, Ontario, Canada
- Richard O Prum
- Department of Ecology and Evolutionary Biology, Yale University
- Yale Peabody Museum of Natural History, New Haven, Connecticut
- Günter P Wagner
- Yale Systems Biology Institute, West Haven, Connecticut
- Department of Ecology and Evolutionary Biology, Yale University
- Department of Obstetrics, Gynecology and Reproductive Sciences, Yale Medical School, New Haven, Connecticut
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan
63
Palasca O, Santos A, Stolte C, Gorodkin J, Jensen LJ. TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database: The Journal of Biological Databases and Curation 2018; 2018:4851151. [PMID: 29617745 PMCID: PMC5808782 DOI: 10.1093/database/bay003] [Citation(s) in RCA: 122] [Impact Index Per Article: 20.3] [Received: 04/28/2017] [Accepted: 01/04/2018] [Indexed: 11/13/2022]
Abstract
Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems—model organisms—into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and identified that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared. Database URL: http://tissues.jensenlab.org/
Affiliation(s)
- Oana Palasca
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Center for non-coding RNA in Technology and Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Alberto Santos
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Center for non-coding RNA in Technology and Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
64
Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken. BMC Genomics 2018; 19:594. [PMID: 30086717 PMCID: PMC6081845 DOI: 10.1186/s12864-018-4972-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/23/2017] [Accepted: 07/31/2018] [Indexed: 12/20/2022]
Abstract
Background: The domestic chicken (Gallus gallus) is widely used as a model in developmental biology and is also an important livestock species. We describe a novel approach to data integration to generate an mRNA expression atlas for the chicken spanning major tissue types and developmental stages, using a diverse range of publicly archived RNA-seq datasets and new data derived from immune cells and tissues. Results: Randomly down-sampling RNA-seq datasets to a common depth and quantifying expression against a reference transcriptome using the mRNA quantitation tool Kallisto ensured that disparate datasets explored comparable transcriptomic space. The network analysis tool Graphia was used to extract clusters of co-expressed genes from the resulting expression atlas. Many of these clusters were tissue- or cell-type-restricted, contained transcription factors that have previously been implicated in their regulation, or were otherwise associated with biological processes such as the cell cycle. The atlas provides a resource for the functional annotation of genes that currently have only a locus ID. We cross-referenced the RNA-seq atlas to a publicly available embryonic Cap Analysis of Gene Expression (CAGE) dataset to infer the developmental time course of organ systems and to identify a signature of the expansion of tissue macrophage populations during development. Conclusion: Expression profiles obtained from public RNA-seq datasets can be made comparable to each other, despite being generated by different laboratories using different methodologies. This meta-analytic approach to RNA-seq can be extended with new datasets from novel tissues and is applicable to any species. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4972-7) contains supplementary material, which is available to authorized users.
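The common-depth step described in this abstract can be illustrated with a minimal sketch. It assumes each dataset's reads are held in memory as a list of identifiers, and it draws a fixed number of reads without replacement using a seeded generator. The helper name `downsample_reads`, the seed, the target depth, and the choice to return smaller datasets unchanged are all hypothetical. The study's actual tooling and parameters are not given here, and real pipelines usually operate on FASTQ records.

```python
import random
from typing import List, Sequence


def downsample_reads(reads: Sequence[str], depth: int, seed: int = 0) -> List[str]:
    """Randomly subsample `reads` to exactly `depth` reads without replacement.

    Hypothetical helper for illustration: a dataset with `depth` or fewer
    reads is returned as an unchanged copy rather than padded.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if len(reads) <= depth:
        return list(reads)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(list(reads), depth)


# Bring two hypothetical libraries of unequal size to a common depth of 3 reads.
lib_a = [f"a{i}" for i in range(10)]
lib_b = [f"b{i}" for i in range(5)]
common = min(len(lib_a), len(lib_b), 3)
sub_a = downsample_reads(lib_a, common)
sub_b = downsample_reads(lib_b, common)
print(len(sub_a), len(sub_b))  # 3 3
```

Sampling without replacement keeps each read at most once, so the subsample behaves like a smaller sequencing run of the same library.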
65
Fei T, Zhang T, Shi W, Yu T. Mitigating the adverse impact of batch effects in sample pattern detection. Bioinformatics 2018; 34:2634-2641. [PMID: 29506177 PMCID: PMC6061843 DOI: 10.1093/bioinformatics/bty117] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/29/2017] [Revised: 02/14/2018] [Accepted: 02/27/2018] [Indexed: 11/14/2022]
Abstract
Motivation: It is well known that batch effects exist in RNA-seq data and other profiling data. Although some methods do a good job of adjusting for batch effects by modifying the data matrices, it is still difficult to remove batch effects entirely. The remaining batch effect can cause artifacts in the detection of patterns in the data. Results: In this study, we consider the batch effect issue in pattern detection among samples, such as clustering, dimension reduction and construction of networks between subjects. Instead of adjusting the original data matrices, we design an adaptive method to directly adjust the dissimilarity matrix between samples. In simulation studies, the method achieved better results in recovering true underlying clusters than the leading batch effect adjustment method, ComBat. In real data analysis, the method effectively corrected distance matrices and improved the performance of clustering algorithms. Availability and implementation: The R package is available at https://github.com/tengfei-emory/QuantNorm. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Teng Fei
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Tengjiao Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, China
- Weiyang Shi
- Ministry of Education Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
66
Zeng L, Pederson SM, Kortschak RD, Adelson DL. Transposable elements and gene expression during the evolution of amniotes. Mob DNA 2018; 9:17. [PMID: 29942365 PMCID: PMC5998507 DOI: 10.1186/s13100-018-0124-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/07/2018] [Accepted: 06/01/2018] [Indexed: 01/24/2023]
Abstract
Background: Transposable elements (TEs) are primarily responsible for the DNA losses and gains in genome sequences that occur over time within and between species. TEs themselves evolve, with clade-specific LTR/ERV, LINEs and SINEs responsible for the bulk of species-specific genomic features. Because TEs can contain regulatory motifs, they can be exapted as regulators of gene expression. While TE insertions can provide evolutionary novelty for the regulation of gene expression, their overall impact on the evolution of gene expression is unclear. Previous investigators have shown that tissue-specific gene expression in amniotes is more similar across species than within species, supporting the existence of conserved developmental gene regulation. To understand how species-specific TE insertions might affect the evolution/conservation of gene expression, we examined the association of gene expression in six tissues with TE insertions in six representative amniote genomes. Results: A novel bootstrapping approach was used to minimise the conflation of effects of repeat types on gene expression. We compared the expression of orthologs containing recent TE insertions to orthologs that contained older TE insertions, and the expression of non-orthologs containing recent TE insertions to non-orthologs with older TE insertions. Both orthologs and non-orthologs showed significant differences in gene expression associated with TE insertions. TEs were found to be associated with species-specific changes in gene expression, and the magnitude and direction of expression changes were noteworthy. Overall, orthologs containing species-specific TEs were associated with lower gene expression, while in non-orthologs, non-species-specific TEs were associated with higher gene expression. Exceptions were SINE elements in human and chicken, which had an opposite association with gene expression compared to other species.
Conclusions: Our observed species-specific associations of TEs with gene expression support a role for TEs in speciation/response to selection by species. TEs do not exhibit consistent associations with gene expression, and observed associations can vary depending on the age of TE insertions. Based on these observations, it would be prudent to refrain from extrapolating these and previously reported associations to distantly related species.
Affiliation(s)
- Lu Zeng
- School of Biological Sciences, The University of Adelaide, North Terrace, Adelaide, 5005 Australia
- Stephen M Pederson
- Bioinformatics Hub, The University of Adelaide, North Terrace, Adelaide, 5005 Australia
- R Daniel Kortschak
- School of Biological Sciences, The University of Adelaide, North Terrace, Adelaide, 5005 Australia
- David L Adelson
- School of Biological Sciences, The University of Adelaide, North Terrace, Adelaide, 5005 Australia
67
Dunne MP, Kelly S. OMGene: mutual improvement of gene models through optimisation of evolutionary conservation. BMC Genomics 2018; 19:307. [PMID: 29703150 PMCID: PMC5923031 DOI: 10.1186/s12864-018-4704-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/17/2017] [Accepted: 04/19/2018] [Indexed: 12/15/2022]
Abstract
Background: The accurate determination of the genomic coordinates for a given gene (its gene model) is of vital importance to the utility of its annotation and to the accuracy of bioinformatic analyses derived from it. Currently available methods of computational gene prediction are successful on the whole, but they frequently disagree on the model for a given predicted gene, and some or all of the variant gene models often fail to match the biologically observed structure. Many prediction methods can be bolstered by using experimental data such as RNA-seq. However, these resources are not always available, and they rarely give a comprehensive portrait of an organism's transcriptome because of temporal and tissue-specific expression profiles. Results: Orthology between genes provides evolutionary evidence to guide the construction of gene models. OMGene (Optimise My Gene) aims to improve gene model accuracy in the absence of experimental data by optimising the consistency of multiple sequence alignments of orthologous genes from multiple species. We demonstrate the utility of OMGene for improving gene models in annotated genomes using RNA-seq data sets from plants, mammals, and fungi. The evaluation considers intron/exon junction representation and exon coverage, and it also assesses the intra-orthogroup consistency of subcellular localisation predictions. Conclusions: We show that significant improvements in the accuracy of gene model annotations can be made, in both established and de novo annotated genomes, by leveraging information from multiple species.
Affiliation(s)
- Michael P Dunne
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
68
Wu P, Zhou D, Lin W, Li Y, Wei H, Qian X, Jiang Y, He F. Cell-type-resolved alternative splicing patterns in mouse liver. DNA Res 2018; 25:4793385. [PMID: 29325017 PMCID: PMC6014294 DOI: 10.1093/dnares/dsx055] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/06/2017] [Accepted: 12/26/2017] [Indexed: 12/20/2022]
Abstract
Alternative splicing (AS) is an important post-transcriptional regulatory mechanism that generates transcript diversity. However, the functional roles of AS across the multiple cell types of a single organ have not been reported. Here, we provide the most comprehensive profile of cell-type-resolved AS patterns in mouse liver. A total of 13,637 AS events are detected, representing 81.5% of all known AS events in the database. About 46.2% of multi-exon genes undergo AS in the four cell types of mouse liver (hepatocytes, liver sinusoidal endothelial cells, Kupffer cells and hepatic stellate cells), and this splicing regulates cell-specific functions and maintains cell characteristics. We also present a cell-type-specific splicing factor network for these four cell types. The network supports data mining and helps elucidate the roles of splicing factors in sustaining cell-type-specialized AS profiles and functions. We report for the first time a switch in Tak1 splicing between cell types, and we verify that a specific Tak1 isoform regulates hepatic cell-type-specific functions. Thus, our work constructs a hepatic cell-specific splicing landscape and reveals the considerable contribution of AS to cell type constitution and organ features.
Affiliation(s)
- Peng Wu
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Donghu Zhou
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Weiran Lin
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Yanyan Li
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Handong Wei
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Xiaohong Qian
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Ying Jiang
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Fuchu He
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
69
Berthelot C, Villar D, Horvath JE, Odom DT, Flicek P. Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression. Nat Ecol Evol 2018; 2:152-163. [PMID: 29180706 PMCID: PMC5733139 DOI: 10.1038/s41559-017-0377-2] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 07/03/2017] [Accepted: 10/10/2017] [Indexed: 02/02/2023]
Abstract
To gain insight into how mammalian gene expression is controlled by rapidly evolving regulatory elements, we jointly analysed promoter and enhancer activity with downstream transcription levels in liver samples from 15 species. Genes associated with complex regulatory landscapes generally exhibit high expression levels that remain evolutionarily stable. While the number of regulatory elements is the key driver of transcriptional output and resilience, regulatory conservation matters: elements active across mammals most effectively stabilize gene expression. In contrast, recently evolved enhancers typically contribute weakly, consistent with their high evolutionary plasticity. These effects are observed across the entire mammalian clade and are robust to potential confounders, such as the gene expression level. Using liver as a representative somatic tissue, our results illuminate how the evolutionary stability of gene expression is profoundly entwined with both the number and conservation of surrounding promoters and enhancers.
Affiliation(s)
- Camille Berthelot
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique UMR8197, Institut National de la Santé et de la Recherche Médicale U1024, 46 Rue d'Ulm, 75230, Paris, Cedex 05, France
- Diego Villar
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK
- Julie E Horvath
- Biological and Biomedical Sciences, North Carolina Central University, Durham, NC, 27707, USA
- North Carolina Museum of Natural Sciences, Raleigh, NC, 27601, USA
- Evolutionary Anthropology Department, Duke University, Durham, NC, 27707, USA
- Duncan T Odom
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
70
Abstract
In this methods article, I describe a computational workflow for cross-species visualization and comparison of mRNA-seq transcriptome profiling data. The workflow is based on gene set variation analysis (GSVA) and is illustrated using commands in the R programming language. I provide a complete step-by-step procedure for the workflow using mRNA-seq data sets from dog and human bladder cancer as an example.
Affiliation(s)
- Stephen A Ramsey
- Oregon State University, 106 Dryden Hall, Corvallis, OR, 97331, USA.
71
Hoffman AM, Smith MD. Gene expression differs in codominant prairie grasses under drought. Mol Ecol Resour 2017; 18:334-346. [DOI: 10.1111/1755-0998.12733] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/27/2017] [Revised: 10/01/2017] [Accepted: 10/17/2017] [Indexed: 11/28/2022]
Affiliation(s)
- Ava M. Hoffman
- Department of Biology and Graduate Degree Program in Ecology Colorado State University Fort Collins CO USA
- Melinda D. Smith
- Department of Biology and Graduate Degree Program in Ecology Colorado State University Fort Collins CO USA
72
Ono H, Ogasawara O, Okubo K, Bono H. RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes. Sci Data 2017; 4:170105. [PMID: 28850115 PMCID: PMC5574374 DOI: 10.1038/sdata.2017.105] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 10/24/2016] [Accepted: 06/29/2017] [Indexed: 12/28/2022]
Abstract
Gene expression data are accumulating exponentially; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty using the available data because of its sheer volume and complicated access. We have developed a web tool for browsing reference gene expression patterns of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the valuable data archived in several public databases. The web tool is called Reference Expression dataset (RefEx). It allows users to search by gene name, various types of IDs, chromosomal regions in genetic maps, gene families based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and it displays relative gene expression values as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight into the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/.
Affiliation(s)
- Hiromasa Ono
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 1111 Yata, Mishima 411-8540, Japan
- Osamu Ogasawara
- Center for Information Biology, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima 411-8540, Japan
- Kosaku Okubo
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 1111 Yata, Mishima 411-8540, Japan
- Center for Information Biology, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima 411-8540, Japan
- Hidemasa Bono
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 1111 Yata, Mishima 411-8540, Japan
73
Siangphoe U, Archer KJ. Estimation of random effects and identifying heterogeneous genes in meta-analysis of gene expression studies. Brief Bioinform 2017; 18:602-618. [PMID: 27345525 DOI: 10.1093/bib/bbw050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/11/2016] [Indexed: 11/12/2022]
Abstract
Random-effects meta-analysis models, which combine effect sizes from individual studies, are commonly applied to high-dimensional gene expression data. However, unknown study heterogeneity can arise from inconsistencies in sample quality and experimental conditions, and high heterogeneity of effect sizes can reduce the statistical power of these models. In this study, we describe three hypothesis-testing frameworks for meta-analysis of microarray data. We also review several existing meta-analytic techniques that have been used in the genomic setting, including P-value-based, rank-based and effect-size-based methods. We then discuss limitations of some of these methods and describe random-effects-based methods in detail. We introduce two methods for estimating the inter-study variance in random-effects meta-analytic models and another method for identifying heterogeneous genes in gene expression data. We compare these methods with standard and existing meta-analytic techniques in the genomic framework. We demonstrate our results through a series of simulations and an application to Alzheimer's disease gene expression data.
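The inter-study variance mentioned in this abstract can be made concrete with the classical DerSimonian-Laird moment estimator, sketched below for a single gene. It is shown only as a common reference point. It is not one of the two estimators the authors introduce, and the inputs are hypothetical: per-study effect sizes `y` and their within-study variances `v`.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(y: Sequence[float], v: Sequence[float]) -> Tuple[float, float, float]:
    """Return (tau2, pooled_effect, pooled_se) for one gene.

    y: per-study effect sizes; v: their within-study variances (all > 0).
    Classical DerSimonian-Laird estimator, not the authors' proposed methods.
    """
    k = len(y)
    if k < 2 or len(v) != k:
        raise ValueError("need at least two studies with matching y and v")
    w = [1.0 / vi for vi in v]                          # inverse-variance (fixed-effect) weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw    # fixed-effect pooled mean
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q statistic
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # moment estimate, truncated at zero
    w_re = [1.0 / (vi + tau2) for vi in v]              # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return tau2, mu, se


# Three hypothetical studies with equal within-study variance but spread-out effects.
tau2, mu, se = dersimonian_laird([0.0, 1.0, 2.0], [0.1, 0.1, 0.1])
print(round(tau2, 3), round(mu, 3), round(se, 3))  # 0.9 1.0 0.577
```

A positive `tau2` widens the pooled standard error relative to a fixed-effect analysis. This matches the point in the abstract that heterogeneity of effect sizes reduces statistical power.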
74
Abstract
Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level. Insights have been gained into the degree of conservation between human and mouse at the level of not only gene expression but also epigenetics and inter-individual variation. However, a number of limitations exist, including incomplete transcriptome characterization and difficulties in identifying orthologous phenotypes and cell types, which are beginning to be addressed by emerging technologies. Ultimately, these comparisons will help to identify the conditions under which the mouse is a suitable model of human physiology and disease, and optimize the use of animal models.
Affiliation(s)
- Alessandra Breschi
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Thomas R Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
75
Babbitt CC, Haygood R, Nielsen WJ, Wray GA. Gene expression and adaptive noncoding changes during human evolution. BMC Genomics 2017; 18:435. [PMID: 28583075 PMCID: PMC5460488 DOI: 10.1186/s12864-017-3831-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/19/2016] [Accepted: 05/31/2017] [Indexed: 01/14/2023]
Abstract
Background: Despite evidence for adaptive changes in both gene expression and non-protein-coding, putatively regulatory regions of the genome during human evolution, the relationship between gene expression and adaptive changes in cis-regulatory regions remains unclear. Results: Here we present new measurements of gene expression in five tissues of humans and chimpanzees, and use them to assess this relationship. We then compare our results with previous studies of adaptive noncoding changes, analyzing correlations at the level of gene ontology groups, in order to gain statistical power to detect correlations. Conclusions: Consistent with previous studies, we find little correlation between gene expression and adaptive noncoding changes at the level of individual genes; however, we do find significant correlations at the level of biological function ontology groups. The types of function include processes regulated by specific transcription factors, responses to genetic or chemical perturbations, and differentiation of cell types within the immune system. Among functional categories co-enriched with both differential expression and noncoding adaptation, prominent themes include cancer, particularly epithelial cancers, and neural development and function. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3831-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Courtney C Babbitt
- Department of Biology, Duke University, Durham, NC, 27708, USA
- Institute for Genome Sciences & Policy, Duke University, Durham, NC, 27708, USA
- Present Address: Department of Biology, University of Massachusetts Amherst, Amherst, MA, 01003, USA
- Gregory A Wray
- Department of Biology, Duke University, Durham, NC, 27708, USA
- Institute for Genome Sciences & Policy, Duke University, Durham, NC, 27708, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC, 27708, USA
76
Hillman PR, Christian SGB, Doan R, Cohen ND, Konganti K, Douglas K, Wang X, Samollow PB, Dindot SV. Genomic imprinting does not reduce the dosage of UBE3A in neurons. Epigenetics Chromatin 2017; 10:27. [PMID: 28515788 PMCID: PMC5433054 DOI: 10.1186/s13072-017-0134-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/31/2017] [Accepted: 05/03/2017] [Indexed: 12/14/2022]
Abstract
BACKGROUND: The ubiquitin protein ligase E3A gene (UBE3A) is imprinted, with maternal-specific expression in neurons and biallelic expression in all other cell types. Both loss-of-function and gain-of-function mutations affecting the dosage of UBE3A are associated with several neurodevelopmental syndromes and psychological conditions, suggesting that UBE3A is dosage-sensitive in the brain. The observation that loss of imprinting increases the dosage of UBE3A in brain further suggests that inactivation of the paternal UBE3A allele evolved as a dosage-regulating mechanism. To test this hypothesis, we examined UBE3A transcript and protein levels among cells, tissues, and species with different imprinting states of UBE3A. RESULTS: Overall, we found no correlation between the imprinting status and dosage of UBE3A. Importantly, we found that maternal Ube3a protein levels increase in step with decreasing paternal Ube3a protein levels during neurogenesis in mouse, fully compensating for loss of expression of the paternal Ube3a allele in neurons. CONCLUSIONS: Based on our findings, we propose that imprinting of UBE3A does not function to reduce the dosage of UBE3A in neurons but rather regulates some other, as yet unknown, aspect of gene expression or protein function.
Affiliation(s)
- Paul R. Hillman
- Department of Veterinary Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77845 USA
- Department of Molecular and Cellular Medicine, College of Medicine, Texas A&M Health Science Center, College Station, TX 77845 USA
- Sarah G. B. Christian
- Department of Veterinary Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77845 USA
- Ryan Doan
- Department of Veterinary Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77845 USA
- Interdisciplinary Genetics Program, College of Agriculture and Life Sciences, Texas A&M University, College Station, TX 77845 USA
- Noah D. Cohen
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX USA
- Kranti Konganti
- Institute for Genome Science and Society, Texas A&M University, College Station, TX 77845 USA
- Kory Douglas
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX USA
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843 USA
- Xu Wang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853 USA
- Paul B. Samollow
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843 USA
- Scott V. Dindot
- Department of Veterinary Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77845 USA
- Department of Molecular and Cellular Medicine, College of Medicine, Texas A&M Health Science Center, College Station, TX 77845 USA
- Department of Veterinary Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, 4467 TAMU, College Station, TX 77843 USA
77
Park Y, Lim S, Nam JW, Kim S. Measuring intratumor heterogeneity by network entropy using RNA-seq data. Sci Rep 2016; 6:37767. [PMID: 27883053 PMCID: PMC5121893 DOI: 10.1038/srep37767] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 08/11/2016] [Accepted: 10/31/2016] [Indexed: 12/27/2022]
Abstract
Intratumor heterogeneity (ITH) is observed at different stages of tumor progression, metastasis and recurrence, and can be important for clinical applications. We used RNA-sequencing data from tumor samples and measured the level of ITH in terms of biological network states. To model complex relationships among genes, we used a protein interaction network to account for gene-gene dependency. ITH was measured with nJSD, an entropy-based distance metric between two networks built on the Jensen-Shannon divergence (JSD). With nJSD, we defined transcriptome-based ITH (tITH). The effectiveness of tITH was extensively tested on issues related to ITH using real biological data sets. Human cancer cell line data and single-cell sequencing data were investigated to verify our approach. We then analyzed 6,320 patients from the TCGA pan-cancer data set. Our results agreed with widely used genome-based ITH inference methods, while showing better performance in survival analysis. Analysis of mouse clonal evolution data further confirmed that our transcriptome-based ITH was consistent with genetic heterogeneity at different clonal evolution stages. Additionally, we found that cell cycle-related pathways contribute significantly to increasing network heterogeneity during clonal evolution. We believe that the proposed transcriptome-based ITH is useful for characterizing the heterogeneity of a tumor sample at the RNA level.
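For readers unfamiliar with the Jensen-Shannon divergence, the minimal Python sketch below computes it for two non-negative expression profiles and then averages it over each gene's interaction neighbourhood. The neighbourhood averaging (`mean_neighbourhood_jsd`), the toy network and the expression values are hypothetical illustrations; this is not the authors' published nJSD implementation, whose exact definition may differ.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * D(p || m) + 0.5 * D(q || m), with m the average of p and q.

    Inputs are non-negative weights (e.g. expression values); each vector is
    normalised to sum to 1 first. With log base 2 the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("each vector needs a positive total")
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the KL terms never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_neighbourhood_jsd(expr_a, expr_b, network):
    """Hypothetical network-level score: the mean JSD over every gene's local
    neighbourhood (the gene plus its interaction partners), comparing the
    expression profile of sample A with that of sample B."""
    scores = []
    for gene, partners in network.items():
        genes = [gene, *partners]
        p = [expr_a[g] for g in genes]
        q = [expr_b[g] for g in genes]
        scores.append(jensen_shannon_divergence(p, q))
    return sum(scores) / len(scores)


# Toy example with made-up genes and expression values.
network = {"G1": ["G2", "G3"], "G2": ["G1"], "G3": ["G1"]}
tumour = {"G1": 10.0, "G2": 2.0, "G3": 5.0}
reference = {"G1": 4.0, "G2": 6.0, "G3": 5.0}
print(mean_neighbourhood_jsd(tumour, reference, network))
```

Identical profiles give a score of 0, and profiles with disjoint support give 1, so a larger score indicates that the two samples' network states diverge more.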
Affiliation(s)
- Youngjune Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, Korea
- Sangsoo Lim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, Korea
- Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul, 133-791, Korea
- Research Institute for Natural Sciences, Hanyang University, Seoul, 133-791, Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, 151-742, Korea
- Bioinformatics Institute, Seoul National University, Seoul, 151-742, Korea
78
Ancient Out-of-Africa Mitochondrial DNA Variants Associate with Distinct Mitochondrial Gene Expression Patterns. PLoS Genet 2016; 12:e1006407. [PMID: 27812116 PMCID: PMC5094714 DOI: 10.1371/journal.pgen.1006407] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 05/03/2016] [Accepted: 10/06/2016] [Indexed: 11/19/2022]
Abstract
Mitochondrial DNA (mtDNA) variants have traditionally been used as markers to trace ancient population migrations. Although experiments relying on model organisms and cytoplasmic hybrids, as well as disease association studies, have underlined the functionality of certain mtDNA SNPs, little is known of the regulatory impact of ancient mtDNA variants, especially in terms of gene expression. By analyzing RNA-seq data from 454 lymphoblast cell lines from the 1000 Genomes Project, we found that mtDNA variants defining the most common African genetic background, the L haplogroup, exhibit a distinct overall mtDNA gene expression pattern that was independent of mtDNA copy number. Second, intra-population analysis revealed subtle, yet significant, expression differences in four tRNA genes. Strikingly, the more prominent African mtDNA gene expression pattern correlated best with the expression of nuclear DNA-encoded RNA-binding proteins, and with SNPs within the mitochondrial RNA-binding proteins PTCD1 and MRPS7. Our results thus support the concept of an ancient regulatory transition of mtDNA-encoded genes as humans left Africa to populate the rest of the world.

The mitochondrion is an organelle found in all cells of our body and plays a significant role in energy and heat production. It is the only organelle in animal cells harboring its own genome outside the nucleus. Mitochondrial DNA (mtDNA) variants have traditionally been used as neutral markers to trace ancient population migrations. As a result, the functional impact of human mtDNA population variants on gene regulation is poorly understood. To address this question, we analyzed available mtDNA gene expression data from a large group of individuals (454) from diverse human populations. Here, we show for the first time that the ancient migration of humans out of Africa correlated with differences in mitochondrial gene expression patterns, which could be explained by the activity of certain RNA-binding proteins. These findings suggest a major mitochondrial regulatory transition as humans left Africa to populate the rest of the world.
79
Critical re-evaluation of neuroglobin expression reveals conserved patterns among mammals. Neuroscience 2016; 337:339-354. [DOI: 10.1016/j.neuroscience.2016.07.042] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/24/2016] [Revised: 07/26/2016] [Accepted: 07/26/2016] [Indexed: 01/08/2023]
80
TreeExp1.0: R Package for Analyzing Expression Evolution Based on RNA-Seq Data. J Exp Zool B Mol Dev Evol 2016; 326:394-402. [DOI: 10.1002/jez.b.22707] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/12/2016] [Accepted: 09/24/2016] [Indexed: 01/14/2023]
81
Abstract
A new study helps resolve a controversy about determinants of gene expression variability and might facilitate the effective translation of research results across species.
Affiliation(s)
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
82
Breschi A, Djebali S, Gillis J, Pervouchine DD, Dobin A, Davis CA, Gingeras TR, Guigó R. Gene-specific patterns of expression variation across organs and species. Genome Biol 2016; 17:151. [PMID: 27391956 PMCID: PMC4937605 DOI: 10.1186/s13059-016-1008-y] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 04/12/2016] [Accepted: 06/14/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND A comparison of transcriptional profiles derived from different tissues in a given species or among different species assumes that commonalities reflect evolutionarily conserved programs and that differences reflect species or tissue responses to environmental conditions or developmental program staging. Apparently conflicting results have been published regarding whether organ-specific transcriptional patterns dominate over species-specific patterns, or vice versa, making it unclear to what extent the biology of a given organism can be extrapolated to another. These studies have in common that they treat the transcriptomes monolithically, implicitly ignoring that each gene is likely to have a specific pattern of transcriptional variation across organs and species. RESULTS We use linear models to quantify this pattern. We find a continuum in the spectrum of expression variation: the expression of some genes varies considerably across species and little across organs, and simply reflects evolutionary distance. At the other extreme are genes whose expression varies considerably across organs and little across species; these genes are much more likely to be associated with diseases than are genes whose expression varies predominantly across species. CONCLUSIONS Whether transcriptomes, when considered globally, cluster preferentially according to one component or the other may not be a property of the transcriptomes, but rather a consequence of the dominant behavior of a subset of genes. Therefore, the values of the components of the variance of expression for each gene could become a useful resource when planning, interpreting, and extrapolating experimental data from mouse to humans.
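The gene-level variance partitioning described in this abstract can be illustrated with a minimal Python sketch. It assumes a balanced design (exactly one expression value per organ-species pair) and uses a classical two-way ANOVA sum-of-squares split of an additive organ + species model; this is an illustration of the idea, not the linear models actually fitted by Breschi et al., and the function name and example values are hypothetical.

```python
def variance_fractions(expr):
    """Split one gene's expression variance into organ, species and residual parts.

    expr[i][j] is the (e.g. log-scale) expression in organ i of species j; a
    complete organ x species grid with one value per cell is assumed. For such a
    balanced design the additive model y_ij = mu + organ_i + species_j + e_ij
    gives SS_total = SS_organ + SS_species + SS_residual exactly.
    """
    n_org, n_sp = len(expr), len(expr[0])
    if any(len(row) != n_sp for row in expr):
        raise ValueError("expected a complete organ x species grid")
    grand = sum(map(sum, expr)) / (n_org * n_sp)
    organ_means = [sum(row) / n_sp for row in expr]
    species_means = [sum(expr[i][j] for i in range(n_org)) / n_org for j in range(n_sp)]
    ss_total = sum((expr[i][j] - grand) ** 2 for i in range(n_org) for j in range(n_sp))
    if ss_total == 0:
        # A constant gene has no variance to attribute to either factor.
        return {"organ": 0.0, "species": 0.0, "residual": 0.0}
    ss_organ = n_sp * sum((m - grand) ** 2 for m in organ_means)
    ss_species = n_org * sum((m - grand) ** 2 for m in species_means)
    ss_resid = ss_total - ss_organ - ss_species
    return {
        "organ": ss_organ / ss_total,
        "species": ss_species / ss_total,
        "residual": ss_resid / ss_total,
    }


# Made-up values: rows are organs, columns are two species.
organ_driven = [[1.0, 1.2], [5.0, 5.1], [9.0, 8.8]]
species_driven = [[2.0, 7.0], [2.1, 7.2], [1.9, 6.9]]
print(variance_fractions(organ_driven))
print(variance_fractions(species_driven))
```

Computing these fractions for every gene places each one on the continuum the authors describe, from species-dominated to organ-dominated expression variation.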
Affiliation(s)
- Alessandra Breschi
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Sarah Djebali
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- GenPhySE, Université de Toulouse, INRA, INPT, INP-ENVT, Castanet Tolosan, France
- Jesse Gillis
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11742, USA
- Dmitri D Pervouchine
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11742, USA
- Carrie A Davis
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11742, USA
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain