1
Jackson DJ, Cerveau N, Posnien N. De novo assembly of transcriptomes and differential gene expression analysis using short-read data from emerging model organisms - a brief guide. Front Zool 2024; 21:17. [PMID: 38902827 PMCID: PMC11188175 DOI: 10.1186/s12983-024-00538-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 06/12/2024] [Indexed: 06/22/2024] Open
Abstract
Many questions in biology benefit greatly from the use of a variety of model systems. High-throughput sequencing methods have been a triumph in the democratization of diverse model systems. They allow for the economical sequencing of an entire genome or transcriptome of interest, and with technical variations can even provide insight into genome organization and the expression and regulation of genes. The analysis and biological interpretation of such large datasets can present significant challenges that depend on the 'scientific status' of the model system. While high-quality genome and transcriptome references are readily available for well-established model systems, the establishment of such references for an emerging model system often requires extensive resources such as finances, expertise and computational capabilities. The de novo assembly of a transcriptome represents an excellent entry point for genetic and molecular studies in emerging model systems as it can efficiently assess gene content while also serving as a reference for differential gene expression studies. However, the process of de novo transcriptome assembly is non-trivial, and as a rule must be empirically optimized for every dataset. For the researcher working with an emerging model system, and with little to no experience with assembling and quantifying short-read data from the Illumina platform, these processes can be daunting. In this guide we outline the major challenges faced when establishing a reference transcriptome de novo and we provide advice on how to approach such an endeavor. We describe the major experimental and bioinformatic steps, and provide some broad recommendations and cautions for the newcomer to de novo transcriptome assembly and differential gene expression analyses. Moreover, we provide an initial selection of tools that can assist in the journey from raw short-read data to assembled transcriptome and lists of differentially expressed genes.
Affiliation(s)
- Daniel J Jackson
  - University of Göttingen, Department of Geobiology, Goldschmidtstr. 3, Göttingen, 37077, Germany.
- Nicolas Cerveau
  - University of Göttingen, Department of Geobiology, Goldschmidtstr. 3, Göttingen, 37077, Germany
- Nico Posnien
  - University of Göttingen, Department of Developmental Biology, GZMB, Justus-Von-Liebig-Weg 11, Göttingen, 37077, Germany.
2
Varela-Martínez E, Luigi-Sierra MG, Guan D, López-Béjar M, Casas E, Olvera-Maneu S, Gardela J, Palomo MJ, Osuagwuh UI, Ohaneje UL, Mármol-Sánchez E, Amills M. The landscape of long noncoding RNA expression in the goat brain. J Dairy Sci 2024; 107:4075-4091. [PMID: 38278299 DOI: 10.3168/jds.2023-23966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 12/22/2023] [Indexed: 01/28/2024]
Abstract
The brain regulates multiple metabolic processes, such as food intake, energy expenditure, insulin secretion, hepatic glucose production, and glucose and fatty acid metabolism in adipose tissue, which are fundamental for the maintenance of energy and glucose homeostasis during lactation and pregnancy. In addition, brain expression has a fundamental impact on the development of maternal behavior. Although brain functions are partly regulated by long noncoding RNAs (lncRNAs), their expression profiles have not been characterized in depth in any ruminant species. We have sequenced the transcriptome of 12 brain tissues from 3 goats that were 1 mo pregnant and 4 nonpregnant goats to investigate their lncRNA expression patterns. Between 4,363 (adenohypophysis) and 4,604 (olfactory bulb) lncRNAs were expressed in brain tissues, leading us to establish a set of 794 already annotated lncRNAs and 5,098 novel lncRNA candidates. The detected lncRNAs shared features with those of other mammals, and tissue-specific lncRNAs were enriched in brain development-related terms. Differential expression analyses between goats that were 1 mo pregnant and nonpregnant goats showed that the lncRNA expression profiles of certain brain regions experience substantial changes associated with early pregnancy (238 lncRNAs are differentially expressed in the olfactory bulb), but others do not. Enrichment analysis showed that differentially expressed lncRNAs from the olfactory bulb are co-expressed with genes previously linked to behavioral changes related to pregnancy. These findings provide a first characterization of the landscape of lncRNA expression in the goat brain and provide valuable clues to understand the molecular events triggered by early pregnancy in the central nervous system.
Affiliation(s)
- Endika Varela-Martínez
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), B. Sarriena, Leioa 48940, Spain; Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- María Gracia Luigi-Sierra
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Dailu Guan
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Manel López-Béjar
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Encarna Casas
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Sergi Olvera-Maneu
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain; Department of Veterinary Medicine, University of Nicosia School of Veterinary Medicine, 2414 Nicosia, Cyprus
- Jaume Gardela
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Maria Jesús Palomo
  - Department of Animal Medicine and Surgery, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Uchebuchi Ike Osuagwuh
  - Department of Animal Medicine and Surgery, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Uchechi Linda Ohaneje
  - Department of Animal Medicine and Surgery, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Emilio Mármol-Sánchez
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
- Marcel Amills
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, Bellaterra 08193, Spain; Department de Ciència Animal I dels Aliments, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain.
3
Bonthala VS, Stich B. StCoExpNet: a global co-expression network analysis facilitates identifying genes underlying agronomic traits in potatoes. Plant Cell Rep 2024; 43:117. [PMID: 38622429 PMCID: PMC11018665 DOI: 10.1007/s00299-024-03201-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 03/18/2024] [Indexed: 04/17/2024]
Abstract
KEY MESSAGE: We constructed a gene expression atlas and co-expression network for potatoes and identified several novel genes associated with various agronomic traits. This resource will accelerate potato genetics and genomics research. Potato (Solanum tuberosum L.) is the world's most crucial non-cereal food crop and ranks third in food production after wheat and rice. Despite the availability of several potato transcriptome datasets in public databases like NCBI SRA, an effort has yet to be put into developing a global transcriptome atlas and a co-expression network for potatoes. The objectives of our study were to construct a global expression atlas for potatoes using publicly available transcriptome datasets, identify housekeeping and tissue-specific genes, construct a global co-expression network and identify co-expression clusters, investigate the transcriptional complexity of genes involved in various essential biological processes related to agronomic traits, and provide a web server (StCoExpNet) to easily access the newly constructed expression atlas and co-expression network to investigate the expression and co-expression of genes of interest. In this study, we used data from 2299 publicly available potato transcriptome samples obtained from 15 different tissues to construct a global transcriptome atlas. We found that roughly 87% of the annotated genes exhibited detectable expression in at least one sample. Among these, we identified 281 genes with consistent and stable expression levels, indicating their role as housekeeping genes. Conversely, 308 genes exhibited marked tissue-specific expression patterns. As examples, we linked some co-expression clusters to important agronomic traits of potatoes, such as self-incompatibility, anthocyanin biosynthesis, tuberization, and defense responses against multiple pathogens. The dataset compiled here constitutes a new resource (StCoExpNet), which can be accessed at https://stcoexpnet.julius-kuehn.de. This transcriptome atlas and the co-expression network will accelerate potato genetics and genomics research.
Affiliation(s)
- Venkata Suresh Bonthala
  - Institute of Quantitative Genetics and Genomics of Plants, Heinrich Heine University of Düsseldorf, Düsseldorf, Germany.
- Benjamin Stich
  - Institute of Quantitative Genetics and Genomics of Plants, Heinrich Heine University of Düsseldorf, Düsseldorf, Germany
  - Julius Kühn-Institut (JKI), Institute for Breeding Research On Agricultural Crops, Rudolf-Schick-Platz 3a, OT Groß Lüsewitz, 18190, Sanitz, Germany
  - Max Planck Institute for Plant Breeding Research, Köln, Germany
  - Cluster of Excellence On Plant Sciences, From Complex Traits Towards Synthetic Modules, Düsseldorf, Germany
4
Carrillo-Perez F, Pizurica M, Zheng Y, Nandi TN, Madduri R, Shen J, Gevaert O. Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models. Nat Biomed Eng 2024:10.1038/s41551-024-01193-8. [PMID: 38514775 DOI: 10.1038/s41551-024-01193-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 02/29/2024] [Indexed: 03/23/2024]
Abstract
Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
Affiliation(s)
- Francisco Carrillo-Perez
  - Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA
- Marija Pizurica
  - Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA
  - Internet technology and Data science Lab (IDLab), Ghent University, Ghent, Belgium
- Yuanning Zheng
  - Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA
- Tarak Nath Nandi
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
- Ravi Madduri
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
- Jeanne Shen
  - Department of Pathology, Stanford University, School of Medicine, Palo Alto, CA, USA
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.
  - Department of Biomedical Data Science, Stanford University, School of Medicine, Stanford, CA, USA.
5
Zinati Z, Nazari L. Deciphering the molecular basis of abiotic stress response in cucumber (Cucumis sativus L.) using RNA-Seq meta-analysis, systems biology, and machine learning approaches. Sci Rep 2023; 13:12942. [PMID: 37558755 PMCID: PMC10412635 DOI: 10.1038/s41598-023-40189-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 08/06/2023] [Indexed: 08/11/2023] Open
Abstract
Abiotic stress in cucumber (Cucumis sativus L.) may trigger distinct transcriptome responses, resulting in significant yield loss. More insight into the molecular underpinnings of the stress response can be gained by combining RNA-Seq meta-analysis with systems biology and machine learning. This can help pinpoint possible targets for engineering abiotic tolerance by revealing functional modules and key genes essential for the stress response. Therefore, to investigate the regulatory mechanism and key genes, a combination of these approaches was utilized in cucumber subjected to various abiotic stresses. Three significant abiotic stress-related modules were identified by gene co-expression network analysis (WGCNA). Three hub genes (RPL18, δ-COP, and EXLA2), ten transcription factors (TFs), one transcription regulator, and 12 protein kinases (PKs) were identified as key genes. The results suggest that the identified PKs probably govern the coordination of cellular responses to abiotic stress in cucumber. Moreover, the C2H2 TF family may play a significant role in cucumber response to abiotic stress. Several C2H2 TF target stress-related genes were identified through co-expression and promoter analyses. Evaluation of the key identified genes using Random Forest, with an area under the ROC curve (AUC) of 0.974 and an accuracy rate of 88.5%, demonstrates their prominent contributions to the cucumber response to abiotic stresses. These findings provide novel insights into the regulatory mechanism underlying abiotic stress response in cucumber and pave the way for cucumber genetic engineering toward improving tolerance ability under abiotic stress.
Affiliation(s)
- Zahra Zinati
  - Department of Agroecology, College of Agriculture and Natural Resources of Darab, Shiraz University, Shiraz, Iran.
- Leyla Nazari
  - Crop and Horticultural Science Research Department, Fars Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Shiraz, Iran.
6
Grafanaki K, Grammatikakis I, Ghosh A, Gopalan V, Olgun G, Liu H, Kyriakopoulos GC, Skeparnias I, Georgiou S, Stathopoulos C, Hannenhalli S, Merlino G, Marie KL, Day CP. Noncoding RNA circuitry in melanoma onset, plasticity, and therapeutic response. Pharmacol Ther 2023; 248:108466. [PMID: 37301330 PMCID: PMC10527631 DOI: 10.1016/j.pharmthera.2023.108466] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/06/2023] [Revised: 05/24/2023] [Accepted: 05/31/2023] [Indexed: 06/12/2023]
Abstract
Melanoma, the cancer of the melanocyte, is the deadliest form of skin cancer with an aggressive nature, propensity to metastasize and tendency to resist therapeutic intervention. Studies have identified that the re-emergence of developmental pathways in melanoma contributes to melanoma onset, plasticity, and therapeutic response. Notably, it is well known that noncoding RNAs play a critical role in the development and stress response of tissues. In this review, we focus on the noncoding RNAs, including microRNAs, long non-coding RNAs, circular RNAs, and other small RNAs, for their functions in developmental mechanisms and plasticity, which drive onset, progression, therapeutic response and resistance in melanoma. Going forward, elucidation of noncoding RNA-mediated mechanisms may provide insights that accelerate development of novel melanoma therapies.
Affiliation(s)
- Katerina Grafanaki
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA; Department of Dermatology, School of Medicine, University of Patras, 26504 Patras, Greece
- Ioannis Grammatikakis
  - Cancer Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Arin Ghosh
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Vishaka Gopalan
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Gulden Olgun
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Huaitian Liu
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- George C Kyriakopoulos
  - Department of Biochemistry, School of Medicine, University of Patras, 26504 Patras, Greece
- Ilias Skeparnias
  - Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA
- Sophia Georgiou
  - Department of Dermatology, School of Medicine, University of Patras, 26504 Patras, Greece
- Sridhar Hannenhalli
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Glenn Merlino
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kerrie L Marie
  - Division of Molecular and Cellular Function, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
- Chi-Ping Day
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
7
Zheng Y, Jun J, Brennan K, Gevaert O. EpiMix is an integrative tool for epigenomic subtyping using DNA methylation. Cell Rep Methods 2023; 3:100515. [PMID: 37533639 PMCID: PMC10391348 DOI: 10.1016/j.crmeth.2023.100515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 04/12/2023] [Accepted: 06/01/2023] [Indexed: 08/04/2023]
Abstract
DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer and immunological and cardiovascular diseases. Recent technological advances have enabled genome-wide profiling of DNAme in large human cohorts. There is a need for analytical methods that can more sensitively detect differential methylation profiles present in subsets of individuals from these heterogeneous, population-level datasets. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared with existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and long non-coding RNAs (lncRNAs). Using cell-type-specific data from two separate studies, we discover epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven ncRNAs in non-small cell lung cancer.
Affiliation(s)
- Yuanning Zheng
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- John Jun
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Kevin Brennan
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
8
Carrillo-Perez F, Pizurica M, Zheng Y, Nandi TN, Madduri R, Shen J, Gevaert O. RNA-to-image multi-cancer synthesis using cascaded diffusion models. bioRxiv 2023:2023.01.13.523899. [PMID: 36711711 PMCID: PMC9882105 DOI: 10.1101/2023.01.13.523899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient's gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascaded diffusion model to synthesize realistic whole-slide image tiles using the latent representation derived from the patient's RNA-Seq data. Our results demonstrate that the generated tiles accurately preserve the distribution of cell types observed in real-world data, with state-of-the-art cell identification models successfully detecting important cell types in the synthetic samples. Furthermore, we illustrate that the synthetic tiles maintain the cell fraction observed in bulk RNA-Seq data and that modifications in gene expression affect the composition of cell types in the synthetic tiles. Next, we utilize the synthetic data generated by RNA-CDM to pretrain machine learning models and observe improved performance compared to training from scratch. 
Our study emphasizes the potential usefulness of synthetic data in developing machine learning models in scarce-data settings, while also highlighting the possibility of imputing missing data modalities by leveraging the available information. In conclusion, our proposed RNA-CDM approach for synthetic data generation in biomedicine, particularly in the context of cancer diagnosis, offers a novel and promising solution to address data scarcity. By generating synthetic data that aligns with real-world distributions and leveraging it to pretrain machine learning models, we contribute to the development of robust clinical decision support systems and potential advancements in precision medicine.
9
Zheng Y, Jun J, Brennan K, Gevaert O. EpiMix: an integrative tool for epigenomic subtyping using DNA methylation. bioRxiv 2023:2023.01.03.522660. [PMID: 36711917 PMCID: PMC9881910 DOI: 10.1101/2023.01.03.522660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
DNA methylation (DNAme) is a major epigenetic factor influencing gene expression, with alterations leading to cancer and to immunological and cardiovascular diseases. Recent technological advances enable genome-wide quantification of DNAme in large human cohorts. So far, existing methods have not been evaluated for their ability to identify differential DNAme present in large and heterogeneous patient cohorts. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared to existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and lncRNAs. Using cell-type-specific data from two separate studies, we discovered novel epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven non-coding RNAs in non-small cell lung cancer.
Affiliation(s)
- Yuanning Zheng
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- John Jun
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Kevin Brennan
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
10
Brennan K, Zheng H, Fahrner JA, Shin JH, Gentles AJ, Schaefer B, Sunwoo JB, Bernstein JA, Gevaert O. NSD1 mutations deregulate transcription and DNA methylation of bivalent developmental genes in Sotos syndrome. Hum Mol Genet 2022; 31:2164-2184. [PMID: 35094088 PMCID: PMC9262396 DOI: 10.1093/hmg/ddac026] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/26/2021] [Revised: 01/04/2022] [Accepted: 01/19/2022] [Indexed: 11/13/2022] Open
Abstract
Sotos syndrome (SS), the most common overgrowth with intellectual disability (OGID) disorder, is caused by inactivating germline mutations of NSD1, which encodes a histone H3 lysine 36 methyltransferase. To understand how NSD1 inactivation deregulates transcription and DNA methylation (DNAm), and to explore how these abnormalities affect human development, we profiled transcription and DNAm in SS patients and healthy control individuals. We identified a transcriptional signature that distinguishes individuals with SS from controls and was also deregulated in NSD1-mutated cancers. Most abnormally expressed genes displayed reduced expression in SS; these downregulated genes consisted mostly of bivalent genes and were enriched for regulators of development and neural synapse function. DNA hypomethylation was strongly enriched within promoters of transcriptionally deregulated genes: overexpressed genes displayed hypomethylation at their transcription start sites while underexpressed genes featured hypomethylation at polycomb binding sites within their promoter CpG island shores. SS patients featured accelerated molecular aging at the levels of both transcription and DNAm. Overall, these findings indicate that NSD1-deposited H3K36 methylation regulates transcription by directing promoter DNA methylation, partially by repressing polycomb repressive complex 2 (PRC2) activity. These findings could explain the phenotypic similarity of SS to OGID disorders that are caused by mutations in PRC2 complex-encoding genes.
Affiliation(s)
- Kevin Brennan
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Hong Zheng
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Jill A Fahrner
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
  - Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- June Ho Shin
  - Department of Otolaryngology – Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA 94305, USA
- Andrew J Gentles
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Bradley Schaefer
  - Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- John B Sunwoo
  - Department of Otolaryngology – Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA 94305, USA
- Jonathan A Bernstein
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
11
You Y, Tian L, Su S, Dong X, Jabbari JS, Hickey PF, Ritchie ME. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol 2021; 22:339. [PMID: 34906205 PMCID: PMC8672463 DOI: 10.1186/s13059-021-02552-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/17/2021] [Accepted: 11/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. RESULTS: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. CONCLUSIONS: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Affiliation(s)
- Yue You
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Luyi Tian
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Jafar S. Jabbari
- Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Australia
- Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, The University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Peter F. Hickey
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Single-Cell Open Research Endeavour (SCORE), The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
|
12
|
Pu W, Zhao C, Wazir J, Su Z, Niu M, Song S, Wei L, Li L, Zhang X, Shi X, Wang H. Comparative transcriptomic analysis of THP-1-derived macrophages infected with Mycobacterium tuberculosis H37Rv, H37Ra and BCG. J Cell Mol Med 2021; 25:10504-10520. [PMID: 34632719 PMCID: PMC8581329 DOI: 10.1111/jcmm.16980] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/24/2021] [Revised: 08/31/2021] [Accepted: 09/08/2021] [Indexed: 12/12/2022] Open
Abstract
Tuberculosis (TB) remains a worldwide healthcare concern, and the exploration of the host‐pathogen interaction is essential to develop therapeutic modalities and strategies to control Mycobacterium tuberculosis (M.tb). In this study, RNA sequencing (transcriptome sequencing) was employed to investigate the global transcriptome changes in macrophages during infection with different strains of M.tb. Macrophages derived from THP‐1 cells were exposed to the virulent M.tb strain H37Rv (Rv) or the avirulent M.tb strain H37Ra (Ra), and the M.tb BCG vaccine strain was used as a control. The cDNA libraries were prepared from M.tb‐infected macrophages and then sequenced. To assess the transcriptional differences between the expressed genes, the bioinformatics analysis was performed using a standard pipeline of quality control, reference mapping, differential expression analysis, protein‐protein interaction (PPI) networks, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Q‐PCR and Western blot assays were also performed to validate the data. Compared with BCG or M.tb H37Ra infection, the transcriptome analysis identified 66 differentially expressed genes in the M.tb H37Rv‐infected macrophages, of which 36 were up‐regulated and 30 were down‐regulated. The up‐regulated genes were associated with immune response regulation, chemokine secretion, and leucocyte chemotaxis. In contrast, the down‐regulated genes were associated with amino acid biosynthesis and energy metabolism, connective tissue development and extracellular matrix organization. The Q‐PCR and Western blot assays confirmed increased expression of pro‐inflammatory factors, altered energy metabolic processes, enhanced activation of pro‐inflammatory signalling pathways and increased pyroptosis in H37Rv‐infected macrophages.
Overall, our RNA sequencing‐based transcriptome study successfully identified a comprehensive, in‐depth gene expression/regulation profile in M.tb‐infected macrophages. The results demonstrated that virulent M.tb strain H37Rv infection triggers a more severe inflammatory immune response associated with increased tissue damage, which helps in understanding the host‐pathogen interaction dynamics and pathogenesis features in different strains of M.tb infection.
Affiliation(s)
- Wenyuan Pu
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Chen Zhao
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Junaid Wazir
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Zhonglan Su
- Department of Dermatology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Mengyuan Niu
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Shiyu Song
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Lulu Wei
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Li Li
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
- Xia Zhang
- Nanjing Public Health Clinical Center, the Second hospital of Nanjing, Nanjing University of Chinese Medicine, Nanjing, China
- Xudong Shi
- Nanjing Public Health Clinical Center, the Second hospital of Nanjing, Nanjing University of Chinese Medicine, Nanjing, China
- Hongwei Wang
- State Key Laboratory of Analytical Chemistry for Life Science, Medical School of Nanjing University, Nanjing, China.,Center for Translational Medicine and Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, China
|
13
|
Hamaguchi Y, Zeng C, Hamada M. Impact of human gene annotations on RNA-seq differential expression analysis. BMC Genomics 2021; 22:730. [PMID: 34625021 PMCID: PMC8501603 DOI: 10.1186/s12864-021-08038-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/06/2021] [Accepted: 09/23/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear. RESULTS Using "mappability", a metric of the complexity of gene annotation, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability was significantly different among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically downstream in the DE analysis. CONCLUSIONS We assessed how the complexity of gene annotations affects DE analysis using mappability. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that an approach that excludes unnecessary gene models from gene annotations improves the performance of DE analysis.
Affiliation(s)
- Yu Hamaguchi
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555 Japan
- Chao Zeng
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555 Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1, Okubo Shinjuku-ku, Tokyo, 169-8555 Japan
- Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555 Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1, Okubo Shinjuku-ku, Tokyo, 169-8555 Japan
- Institute for Medical-oriented Structural Biology, Waseda University, 2-2, Wakamatsu-cho Shinjuku-ku, Tokyo, 162-8480 Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo, 113-8602 Japan
|
14
|
Transcriptional Reprogramming and Constitutive PD-L1 Expression in Melanoma Are Associated with Dedifferentiation and Activation of Interferon and Tumour Necrosis Factor Signalling Pathways. Cancers (Basel) 2021; 13:cancers13174250. [PMID: 34503064 PMCID: PMC8428231 DOI: 10.3390/cancers13174250] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/15/2021] [Revised: 08/07/2021] [Accepted: 08/13/2021] [Indexed: 12/13/2022] Open
Abstract
Simple Summary Melanoma, an aggressive form of skin cancer, is frequently associated with drug resistance in the advanced stages. For instance, resistance is frequently observed in the sequential treatment of melanoma with targeted therapy and immunotherapy. In this research, the authors investigated whether potential transcriptional mechanisms and pathways associated with PD-L1 protein expression could underlie targeted therapy drug resistance in melanoma. The authors found that a PD-L1 expression transcriptional pattern underlies resistance to targeted therapy in a subgroup of melanomas. These melanomas were markedly dedifferentiated compared with melanomas that were not drug resistant. Understanding changes in transcription and molecular pathways that lead to drug resistance could allow researchers to develop interventions to prevent drug resistance from occurring in melanoma, which could also be relevant to other cancer types. Abstract Melanoma is the most aggressive type of skin cancer, with increasing incidence worldwide. Advances in targeted therapy and immunotherapy have improved the survival of melanoma patients experiencing recurrent disease, but unfortunately treatment resistance frequently reduces patient survival. Resistance to targeted therapy is associated with transcriptomic changes and has also been shown to be accompanied by increased expression of programmed death ligand 1 (PD-L1), a potent inhibitor of immune response. Intrinsic upregulation of PD-L1 is associated with genome-wide DNA hypomethylation and widespread alterations in gene expression in melanoma cell lines. However, an in-depth analysis of the transcriptomic landscape of melanoma cells with intrinsically upregulated PD-L1 expression is lacking. To determine the transcriptomic landscape of intrinsically upregulated PD-L1 expression in melanoma, we investigated transcriptomes in melanomas with constitutive versus inducible PD-L1 expression (referred to as PD-L1CON and PD-L1IND).
RNA-Seq analysis was performed on seven PD-L1CON melanoma cell lines and ten melanoma cell lines with low inducible PD-L1IND expression. We observed that PD-L1CON melanoma cells had a reprogrammed transcriptome with a characteristic pattern of dedifferentiated gene expression, together with active interferon (IFN) and tumour necrosis factor (TNF) signalling pathways. Furthermore, we identified key transcription factors that were also differentially expressed in PD-L1CON versus PD-L1IND melanoma cell lines. Overall, our studies describe transcriptomic reprogramming of melanomas with PD-L1CON expression.
|
15
|
Krappinger JC, Bonstingl L, Pansy K, Sallinger K, Wreglesworth NI, Grinninger L, Deutsch A, El-Heliebi A, Kroneis T, Mcfarlane RJ, Sensen CW, Feichtinger J. Non-coding Natural Antisense Transcripts: Analysis and Application. J Biotechnol 2021; 340:75-101. [PMID: 34371054 DOI: 10.1016/j.jbiotec.2021.08.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/29/2021] [Revised: 06/30/2021] [Accepted: 08/04/2021] [Indexed: 12/12/2022]
Abstract
Non-coding natural antisense transcripts (ncNATs) are regulatory RNA sequences that are transcribed in the opposite direction to protein-coding or non-coding transcripts. These transcripts are implicated in a broad variety of biological and pathological processes, including tumorigenesis and oncogenic progression. With this complex field still in its infancy, annotations, expression profiling and functional characterisations of ncNATs are far less comprehensive than those for protein-coding genes, pointing out substantial gaps in the analysis and characterisation of these regulatory transcripts. In this review, we discuss ncNATs from an analysis perspective, in particular regarding the use of high-throughput sequencing strategies, such as RNA-sequencing, and summarize the unique challenges of investigating the antisense transcriptome. Finally, we elaborate on their potential as biomarkers and future targets for treatment, focusing on cancer.
Affiliation(s)
- Julian C Krappinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria
- Lilli Bonstingl
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Katrin Pansy
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria
- Katja Sallinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Nick I Wreglesworth
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
- Lukas Grinninger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Austrian Biotech University of Applied Sciences, Konrad Lorenz-Straße 10, 3430 Tulln an der Donau, Austria
- Alexander Deutsch
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria
- Amin El-Heliebi
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Thomas Kroneis
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Ramsay J Mcfarlane
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
- Christoph W Sensen
- BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria; Institute of Computational Biotechnology, Graz University of Technology, Petersgasse 14/V, 8010 Graz, Austria; HCEMM Kft., Római blvd. 21, 6723 Szeged, Hungary
- Julia Feichtinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria.
|
16
|
Singh N. Role of mammalian long non-coding RNAs in normal and neuro oncological disorders. Genomics 2021; 113:3250-3273. [PMID: 34302945 DOI: 10.1016/j.ygeno.2021.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Revised: 07/10/2021] [Accepted: 07/14/2021] [Indexed: 12/09/2022]
Abstract
Long non-coding RNAs (lncRNAs) are expressed at lower levels than protein-coding genes but have a crucial role in gene regulation. LncRNAs are distinct: they are transcribed by RNA polymerase II, and their functionality depends on subcellular localization. Depending on their niche, they specifically interact with DNA, RNA, and proteins to modify chromatin function, regulate transcription at various stages, and contribute to the formation of nuclear condensation bodies and to nucleolar organization. LncRNAs may also change the stability and translation of cytoplasmic mRNAs and hamper signaling pathways. Thus, lncRNAs affect physio-pathological states and lead to the development of various disorders, immune responses, and cancer. To date, ~40% of lncRNAs have been reported in the nervous system (NS), where they are involved in processes ranging from early development and differentiation of the NS to synaptogenesis. LncRNA expression patterns in the most common adult and pediatric tumors suggest them as potential biomarkers and provide a rationale for targeting them pharmaceutically. Here, we discuss the mechanisms of lncRNA synthesis, localization, and functions in transcriptional, post-transcriptional, and other forms of gene regulation, methods of lncRNA identification, and their potential therapeutic applications in neuro oncological disorders as explained by molecular mechanisms in other malignant disorders.
Affiliation(s)
- Neetu Singh
- Molecular Biology Unit, Department of Centre for Advance Research, King George's Medical University, Lucknow, Uttar Pradesh 226 003, India.
|
17
|
Chowdhary A, Satagopam V, Schneider R. Long Non-coding RNAs: Mechanisms, Experimental, and Computational Approaches in Identification, Characterization, and Their Biomarker Potential in Cancer. Front Genet 2021; 12:649619. [PMID: 34276764 PMCID: PMC8281131 DOI: 10.3389/fgene.2021.649619] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/05/2021] [Accepted: 04/20/2021] [Indexed: 01/09/2023] Open
Abstract
Long non-coding RNAs are a diverse class of non-coding RNA molecules >200 base pairs in length, with various functions such as gene regulation, dosage compensation, and epigenetic regulation. Dysregulation and genomic variations of several lncRNAs have been implicated in several diseases. Their tissue- and development-specific expression makes them viable indicators of the physiological states of cells. Here we present a comprehensive review of the molecular mechanisms and functions, state-of-the-art experimental and computational pipelines, and challenges involved in the identification and functional annotation of lncRNAs and their prospects as biomarkers. We also illustrate the application of co-expression networks on the TCGA-LIHC dataset for putative functional predictions of lncRNAs having therapeutic potential in hepatocellular carcinoma (HCC).
Affiliation(s)
- Anshika Chowdhary
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
|
18
|
Sex-Biased lncRNA Signature in Fetal Growth Restriction (FGR). Cells 2021; 10:cells10040921. [PMID: 33923632 PMCID: PMC8072961 DOI: 10.3390/cells10040921] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/25/2021] [Revised: 04/14/2021] [Accepted: 04/14/2021] [Indexed: 12/13/2022] Open
Abstract
Impaired fetal growth is one of the most important causes of prematurity, stillbirth and infant mortality. The pathogenesis of idiopathic fetal growth restriction (FGR) is poorly understood but is thought to be multifactorial and comprise a range of genetic causes. This research aimed to investigate long non-coding RNAs (lncRNAs) in the placentas of male and female fetuses affected by FGR. RNA-Seq data were analyzed to detect lncRNAs, their potential target genes and circular RNAs (circRNAs); a differential analysis was also performed. The multilevel bioinformatic analysis enabled the detection of 23,137 placental lncRNAs, of which 4263 were classified as novel. In FGR-affected female fetuses’ placentas (ff-FGR), among 19 transcriptionally active regions (TARs), five differentially expressed lncRNAs (DELs) and 12 differentially expressed protein-coding genes (DEGs) were identified. Within 232 differentially expressed TARs identified in male fetuses (mf-FGR), 33 encompassed novel and 176 known lncRNAs, and 52 DEGs were upregulated, while 180 revealed decreased expression. In ff-FGR, ACTA2-AS1 lncRNA expression was significantly correlated with five DEGs, and in mf-FGR, 25 TARs were associated with DELs correlated with 157 unique DEGs. Backsplicing circRNA processes were detected in the range of H19 lncRNA, in both ff- and mf-FGR placentas. The global characterization of lncRNAs by fetal sex showed dysregulation of DELs, DEGs and circRNAs that may affect fetal growth and pregnancy outcomes. In female placentas, DELs and DEGs were associated mainly with the vasculature, while in male placentas, disturbed expression predominantly affected immune processes.
|
19
|
Angiogenesis regulation by microRNAs and long non-coding RNAs in human breast cancer. Pathol Res Pract 2021; 219:153326. [PMID: 33601152 DOI: 10.1016/j.prp.2020.153326] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/14/2020] [Revised: 12/18/2020] [Accepted: 12/22/2020] [Indexed: 02/07/2023]
Abstract
MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are capable of regulating gene expression post-transcriptionally. Over the past decade, a number of in vitro, in vivo, and clinical studies have reported the roles of these non-coding RNAs (ncRNAs) in regulating angiogenesis, an important cancer hallmark that is associated with metastases and poor prognosis. This review discusses the specific roles of various miRNAs and lncRNAs in regulating angiogenesis in breast cancer, with particular focus on the downstream targets and signalling pathways regulated by these ncRNAs. In light of the recent trend in exploiting ncRNAs as cancer therapeutics, the potential use of miRNAs and lncRNAs as biomarkers and novel therapeutic agents against angiogenesis is also discussed.
|
20
|
Vivek AT, Kumar S. Computational methods for annotation of plant regulatory non-coding RNAs using RNA-seq. Brief Bioinform 2020; 22:6041165. [PMID: 33333550 DOI: 10.1093/bib/bbaa322] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/14/2020] [Revised: 10/19/2020] [Accepted: 10/20/2020] [Indexed: 12/19/2022] Open
Abstract
The plant transcriptome encompasses numerous endogenous, regulatory non-coding RNAs (ncRNAs) that play a major biological role in regulating key physiological mechanisms. While studies have shown that ncRNAs are extremely diverse and ubiquitous, the functions of the vast majority of ncRNAs are still unknown. With an ever-increasing number of ncRNAs under study, it is essential to identify, categorize and annotate these ncRNAs on a genome-wide scale. The use of high-throughput RNA sequencing (RNA-seq) technologies provides a broader picture of the non-coding component of the transcriptome, enabling the comprehensive identification and annotation of all major ncRNAs across samples. However, the detection of known and emerging classes of ncRNAs from RNA-seq data demands complex computational methods owing to their unique as well as similar characteristics. Here, we discuss the major endogenous, regulatory ncRNAs found in a plant RNA sample, followed by the computational strategies applied to discover each class of ncRNAs using RNA-seq. We also provide a collection of relevant software packages and databases to present a comprehensive bioinformatics toolbox for plant ncRNA researchers. We assume that the discussions in this review will provide a rationale for the discovery of all major categories of plant ncRNAs.
Affiliation(s)
- A T Vivek
- National Institute of Plant Genome Research in New Delhi, India
- Shailesh Kumar
- National Institute of Plant Genome Research in New Delhi
|
21
|
In Silico and In Vitro Analysis of lncRNA XIST Reveals a Panel of Possible Lung Cancer Regulators and a Five-Gene Diagnostic Signature. Cancers (Basel) 2020; 12:cancers12123499. [PMID: 33255394 PMCID: PMC7760781 DOI: 10.3390/cancers12123499] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/30/2020] [Revised: 11/16/2020] [Accepted: 11/19/2020] [Indexed: 11/20/2022] Open
Abstract
Simple Summary Long non-coding RNAs (lncRNAs) have been associated with a number of diseases including cancer. A well-studied lncRNA called XIST (X-inactive specific transcript) acts as a major effector of the X-inactivation process. It is expressed on the inactive X chromosome, providing a dosage equivalence between males and females. Recently XIST has been implicated in the development of lung cancer. Using a bioinformatics approach, we demonstrate that XIST is over-expressed in female patients compared to males. When the XIST gene was silenced in two different cell lines (of male and female origin), a number of genes that play a role in signal transduction pathways, energy balance and metabolism were differentially expressed, thus providing better insight into the role of this lncRNA in cancer. Finally, we showed that expression of XIST together with another 4 genes provided a strong diagnostic potential to discriminate lung cancer from healthy controls. Abstract Long non-coding RNAs (lncRNAs) perform a wide repertoire of functions in cell biology, ranging from RNA editing to gene regulation, as well as tumour genesis and tumour progression. The lncRNA X-inactive specific transcript (XIST) is involved in the aetiopathogenesis of non-small cell lung cancer (NSCLC). However, its role at the molecular level is not fully elucidated. Expression analyses of XIST and the co-regulated genes TSIX, hnRNPu, Bcl-2, and BRCA1 in lung cancer (LC) and controls were performed in silico. Differentially expressed genes (DEGs) were determined using RNA-seq in H1975 and A549 NSCLC cell lines following siRNA silencing of XIST. XIST exhibited sexual dimorphism, being up-regulated in females compared to males in both control and LC patient cohorts. RNA-seq revealed 944 and 751 DEGs for A549 and H1975 cell lines, respectively. These DEGs are involved in signal transduction, cell communication, energy pathways, and nucleic acid metabolism.
XIST expression associated with TSIX, hnRNPu, Bcl-2, and BRCA1 provided a strong collective feature to discriminate between controls and LC, implying a diagnostic potential. There is a much more complex role for XIST in lung cancer. Further studies should concentrate on sex-specific changes and investigate the signalling pathways of the DEGs following silencing of this lncRNA.
|
22
|
Target Enrichment Enables the Discovery of lncRNAs with Somatic Mutations or Altered Expression in Paraffin-Embedded Colorectal Cancer Samples. Cancers (Basel) 2020; 12:cancers12102844. [PMID: 33019720 PMCID: PMC7650602 DOI: 10.3390/cancers12102844] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/24/2020] [Revised: 09/20/2020] [Accepted: 09/23/2020] [Indexed: 12/25/2022] Open
Abstract
Simple Summary Alterations in long noncoding RNAs and their mutations have been increasingly recognized in tumorigenesis and cancer progression, attracting particular interest as potential novel cancer biomarkers and therapeutic targets. The use of adjuvant chemotherapy in stage II colorectal cancer patients is challenging, and new biomarkers are required to identify patients with a high probability of relapse. We focused on the translational potential of non-coding RNAs in colorectal cancer. In this study, we aim to validate a new tool which couples target enrichment and RNAseq for transcriptomics studies of lncRNAs in formalin-fixed paraffin embedded (FFPE) tissue samples. Our results show that this new approach efficiently detects lncRNAs and differences in their expression between healthy and tumor FFPE tissues, as well as somatic mutations in expressed lncRNAs, identifying novel lncRNAs as potential candidates for colorectal cancer. This new approach could represent a promising avenue that would reduce costs and enable more efficient translational research. Abstract Long non-coding RNAs (lncRNAs) play important roles in cancer and are potential new biomarkers or targets for therapy. However, given the low and tissue-specific expression of lncRNAs, linking these molecules to particular cancer types and processes through transcriptional profiling is challenging. Formalin-fixed, paraffin-embedded (FFPE) tissues are abundant resources for research but are prone to nucleic acid degradation, thereby complicating the study of lncRNAs. Here, we designed and validated a probe-based enrichment strategy to efficiently profile lncRNA expression in FFPE samples, and we applied it for the detection of lncRNAs associated with colorectal cancer (CRC). Our approach efficiently enriched targeted lncRNAs from FFPE samples, while preserving their relative abundance, and enabled the detection of tumor-specific mutations.
We identified 379 lncRNAs differentially expressed between CRC tumors and matched healthy tissues and found tumor-specific lncRNA variants. Our results show that numerous lncRNAs are differentially expressed and/or accumulate variants in CRC tumors, thereby suggesting a role in CRC progression. More generally, our approach unlocks the study of lncRNAs in FFPE samples, thus enabling the retrospective use of abundant, well documented material available in hospital biobanks.
|
23
|
Xie X, Liu M, Zhang Y, Wang B, Zhu C, Wang C, Li Q, Huo Y, Guo J, Xu C, Hu L, Pang A, Ma S, Wang L, Cao W, Chen S, Li Q, Zhang S, Zhao X, Zhou W, Luo H, Zheng G, Jiang E, Feng S, Chen L, Shi L, Cheng H, Hao S, Zhu P, Cheng T. Single-cell transcriptomic landscape of human blood cells. Natl Sci Rev 2020; 8:nwaa180. [PMID: 34691592 PMCID: PMC8288407 DOI: 10.1093/nsr/nwaa180] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 05/24/2020] [Revised: 07/16/2020] [Accepted: 07/31/2020] [Indexed: 12/20/2022] Open
Abstract
High-throughput single-cell RNA-seq has been successfully implemented to dissect the cellular and molecular features underlying hematopoiesis. However, an elaborate and comprehensive transcriptome reference of the whole blood system is lacking. Here, we profiled the transcriptomes of 7551 human blood cells representing 32 immunophenotypic cell types, including hematopoietic stem cells, progenitors and mature blood cells derived from 21 healthy donors. With high sequencing depth and coverage, we constructed a single-cell transcriptional atlas of blood cells (ABC) on the basis of both protein-coding genes and long noncoding RNAs (lncRNAs), and showed high consistency between them. Notably, putative lncRNAs and transcription factors regulating hematopoietic cell differentiation were identified. While common transcription factor regulatory networks were activated in neutrophils and monocytes, lymphoid cells dramatically changed their regulatory networks during differentiation. Furthermore, we showed that a subset of nucleated erythrocytes actively expresses immune signals, suggesting the existence of erythroid precursors with immune functions. Finally, a web portal offering transcriptome browsing and blood cell type prediction has been established. Thus, our work provides a transcriptional map of human blood cells at single-cell resolution, thereby offering a comprehensive reference for the exploration of physiological and pathological hematopoiesis.
Affiliation(s)
- Xiaowei Xie
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Mengyao Liu
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Yawen Zhang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Bingrui Wang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Caiying Zhu
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Chenchen Wang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Qing Li
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Yingying Huo
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Jiaojiao Guo
- Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha 410078, China
- Changlu Xu
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Linping Hu
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Aiming Pang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Shihui Ma
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Lina Wang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Wenbin Cao
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Shulian Chen
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Qiuling Li
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Sudong Zhang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Xueying Zhao
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Wen Zhou
- Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha 410078, China
- Hongbo Luo
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Guoguang Zheng
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Erlie Jiang
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Sizhou Feng
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Lixiang Chen
- School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
- Lihong Shi
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Hui Cheng
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Sha Hao
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Ping Zhu
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Tao Cheng
- State Key Laboratory of Experimental Hematology and National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Center for Stem Cell Medicine and Department of Stem Cell & Regenerative Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
|
24
|
Qiu YL, Zheng H, Gevaert O. Genomic data imputation with variational auto-encoders. Gigascience 2020; 9:giaa082. [PMID: 32761097 PMCID: PMC7407276 DOI: 10.1093/gigascience/giaa082] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/27/2020] [Revised: 05/14/2020] [Accepted: 07/03/2020] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets, and it is difficult to modify these algorithms to handle certain cases not missing at random.
RESULTS: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performance than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on imputation performance and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder.
CONCLUSIONS: We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
Affiliation(s)
- Yeping Lina Qiu
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
- Hong Zheng
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
|