1
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Genome4me Inc., Seoul, Republic of Korea
2
Olufs ZPG, Wassarman DA, Perouansky M. Stress Pathways Induced by Volatile Anesthetics and Failure of Preconditioning in a Mitochondrial Complex I Mutant. Anesthesiology 2024; 140:463-482. [PMID: 38118175 DOI: 10.1097/aln.0000000000004874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/22/2023]
Abstract
BACKGROUND: Carriers of mutations in the mitochondrial electron transport chain are at increased risk of anesthetic-induced neurotoxicity. To investigate the neurotoxicity mechanism and to test preconditioning as a protective strategy, this study used a Drosophila melanogaster model of Leigh syndrome. Model flies carried a mutation in ND23 (ND23^60114) that encodes a mitochondrial electron transport chain complex I subunit. This study investigated why ND23^60114 mutants become susceptible to lethal, oxygen-modulated neurotoxicity within 24 h of exposure to isoflurane but not sevoflurane.
METHODS: This study used transcriptomics and quantitative real-time reverse transcription polymerase chain reaction to identify genes that are differentially expressed in ND23^60114 but not wild-type fly heads at 30 min after exposure to high- versus low-toxicity conditions. This study also subjected ND23^60114 flies to diverse stressors before isoflurane exposure to test whether isoflurane toxicity could be diminished by preconditioning.
RESULTS: The ND23^60114 mutation had a greater effect on isoflurane- than sevoflurane-mediated changes in gene expression. Isoflurane and sevoflurane did not affect expression of heat shock protein (Hsp) genes (Hsp22, Hsp27, and Hsp68) in wild-type flies, but isoflurane substantially increased expression of these genes in ND23^60114 mutant flies. Furthermore, isoflurane and sevoflurane induced expression of oxidative (GstD1 and GstD2) and xenobiotic (Cyp6a8 and Cyp6a14) stress genes to a similar extent in wild-type flies, but the effect of isoflurane was largely reduced in ND23^60114 flies. In addition, activating stress response pathways by pre-exposure to anesthetics, heat shock, hyperoxia, hypoxia, or oxidative stress did not suppress isoflurane-induced toxicity in ND23^60114 mutant flies.
CONCLUSIONS: Mutation of a mitochondrial electron transport chain complex I subunit generates differential effects of isoflurane and sevoflurane on gene expression that may underlie their differential effects on neurotoxicity. Additionally, the mutation produces resistance to preconditioning by stresses that protect the brain in other contexts. Therefore, complex I activity modifies molecular and physiologic effects of anesthetics in an anesthetic-specific manner.
Affiliation(s)
- Zachariah P G Olufs
  - Department of Anesthesiology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin
- David A Wassarman
  - Department of Medical Genetics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin
- Misha Perouansky
  - Department of Anesthesiology, School of Medicine and Public Health and Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin
3
Bernasconi A, Canakoglu A, Masseroli M, Ceri S. META-BASE: A Novel Architecture for Large-Scale Genomic Metadata Integration. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:543-557. [PMID: 32750853 DOI: 10.1109/tcbb.2020.2998954] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
The integration of genomic metadata is, at the same time, an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research; combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. It is well-recognized in the bioinformatics community because, in common practice, repositories are accessed one by one, their specific metadata definitions are learned only as a result of long and tedious efforts, and such practice is error-prone. In this paper, we describe META-BASE, an architecture for integrating metadata extracted from a variety of genomic data sources, based upon a structured transformation process. We present a variety of innovative techniques for data extraction, cleaning, normalization and enrichment. We propose a general, open and extensible pipeline that can easily incorporate any number of new data sources, and propose the resulting repository (already integrating several important sources), which is exposed by means of practical user interfaces to respond to biological researchers' needs.
4
Dong K, Shen J, He X, Hu G, Wang L, Osman I, Bunting KM, Dixon-Melvin R, Zheng Z, Xin H, Xiang M, Vazdarjanova A, Fulton DJR, Zhou J. CARMN Is an Evolutionarily Conserved Smooth Muscle Cell-Specific LncRNA That Maintains Contractile Phenotype by Binding Myocardin. Circulation 2021; 144:1856-1875. [PMID: 34694145 PMCID: PMC8726016 DOI: 10.1161/circulationaha.121.055949] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Vascular homeostasis is maintained by the differentiated phenotype of vascular smooth muscle cells (VSMCs). The landscape of protein coding genes comprising the transcriptome of differentiated VSMCs has been intensively investigated but many gaps remain including the emerging roles of noncoding genes.
METHODS: We reanalyzed large-scale, publicly available bulk and single-cell RNA sequencing datasets from multiple tissues and cell types to identify VSMC-enriched long noncoding RNAs. The in vivo expression pattern of a novel smooth muscle cell (SMC)-expressed long noncoding RNA, Carmn (cardiac mesoderm enhancer-associated noncoding RNA), was investigated using a novel Carmn green fluorescent protein knock-in reporter mouse model. Bioinformatics and quantitative real-time polymerase chain reaction analysis were used to assess CARMN expression changes during VSMC phenotypic modulation in human and murine vascular disease models. In vitro, functional assays were performed by knocking down CARMN with antisense oligonucleotides and overexpressing Carmn by adenovirus in human coronary artery SMCs. Carotid artery injury was performed in SMC-specific Carmn knockout mice to assess neointima formation and the therapeutic potential of reversing CARMN loss was tested in a rat carotid artery balloon injury model. The molecular mechanisms underlying CARMN function were investigated using RNA pull-down, RNA immunoprecipitation, and luciferase reporter assays.
RESULTS: We identified CARMN, which was initially annotated as the host gene of the MIR143/145 cluster and recently reported to play a role in cardiac differentiation, as a highly abundant and conserved, SMC-specific long noncoding RNA. Analysis of the Carmn GFP knock-in mouse model confirmed that Carmn is transiently expressed in embryonic cardiomyocytes and thereafter becomes restricted to SMCs. We also found that Carmn is transcribed independently of Mir143/145. CARMN expression is dramatically decreased by vascular disease in humans and murine models and regulates the contractile phenotype of VSMCs in vitro. In vivo, SMC-specific deletion of Carmn significantly exacerbated, whereas overexpression of Carmn markedly attenuated, injury-induced neointima formation in mouse and rat, respectively. Mechanistically, we found that Carmn physically binds to the key transcriptional cofactor myocardin, facilitating its activity and thereby maintaining the contractile phenotype of VSMCs.
CONCLUSIONS: CARMN is an evolutionarily conserved SMC-specific long noncoding RNA with a previously unappreciated role in maintaining the contractile phenotype of VSMCs and is the first noncoding RNA discovered to interact with myocardin.
Affiliation(s)
- Kunzhe Dong
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Jian Shen
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
  - Department of Cardiology, The Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, 310009, China
- Xiangqin He
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Guoqing Hu
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Liang Wang
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
  - Department of Cardiology, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, China
- Islam Osman
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Kristopher M. Bunting
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Rachael Dixon-Melvin
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Zeqi Zheng
  - Department of Cardiology, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, China
- Hongbo Xin
  - The National Engineering Research Center for Bioengineering Drugs and the Technologies, Institute of Translational Medicine, Nanchang University, Nanchang, Jiangxi, 330031, China
  - School of Life Sciences, Nanchang University, Nanchang, Jiangxi, 330031, China
- Meixiang Xiang
  - Department of Cardiology, The Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, 310009, China
- Almira Vazdarjanova
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- David J. R. Fulton
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
  - Vascular Biology Center, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
- Jiliang Zhou
  - Department of Pharmacology and Toxicology, Medical College of Georgia, Augusta University, Augusta, Georgia, 30912, USA
5
Lange M, Begolli R, Giakountis A. Non-Coding Variants in Cancer: Mechanistic Insights and Clinical Potential for Personalized Medicine. Noncoding RNA 2021; 7:47. [PMID: 34449663 PMCID: PMC8395730 DOI: 10.3390/ncrna7030047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/28/2021] [Revised: 07/26/2021] [Accepted: 08/01/2021] [Indexed: 12/11/2022] Open
Abstract
The cancer genome is characterized by extensive variability, in the form of Single Nucleotide Polymorphisms (SNPs) or structural variations such as Copy Number Alterations (CNAs) across wider genomic areas. At the molecular level, most SNPs and/or CNAs reside in non-coding sequences, ultimately affecting the regulation of oncogenes and/or tumor-suppressors in a cancer-specific manner. Notably, inherited non-coding variants can predispose to cancer decades prior to disease onset. Furthermore, accumulation of additional non-coding driver mutations during progression of the disease gives rise to genomic instability, acting as the driving force of neoplastic development and malignant evolution. Therefore, detection and characterization of such mutations can improve risk assessment for healthy carriers and expand the diagnostic and therapeutic toolbox for the patient. This review focuses on functional variants that reside in transcribed or non-transcribed non-coding regions of the cancer genome and presents a collection of appropriate state-of-the-art methodologies to study them.
Affiliation(s)
- Marios Lange
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Rodiola Begolli
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Antonis Giakountis
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
  - Institute for Fundamental Biomedical Research, B.S.R.C “Alexander Fleming”, 34 Fleming Str., 16672 Vari, Greece
6
Wang MFZ, Mantri M, Chou SP, Scuderi GJ, McKellar DW, Butcher JT, Danko CG, De Vlaminck I. Uncovering transcriptional dark matter via gene annotation independent single-cell RNA sequencing analysis. Nat Commun 2021; 12:2158. [PMID: 33846360 PMCID: PMC8042062 DOI: 10.1038/s41467-021-22496-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/15/2020] [Accepted: 03/17/2021] [Indexed: 11/09/2022] Open
Abstract
Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark. Conventional single-cell RNA sequencing analyses rely on genome annotations that may be incomplete or inaccurate, especially for understudied organisms. Here the authors present a bioinformatic tool that leverages single-cell data to uncover biologically relevant transcripts beyond the best available genome annotation.
Affiliation(s)
- Michael F Z Wang
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14853, USA
- Madhav Mantri
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14853, USA
- Shao-Pei Chou
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, 14853, USA
- Gaetano J Scuderi
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14853, USA
- David W McKellar
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14853, USA
- Jonathan T Butcher
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14853, USA
- Charles G Danko
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, 14853, USA
- Iwijn De Vlaminck
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14853, USA
7
Jung YL, Kirli K, Alver BH, Park PJ. Resources and challenges for integrative analysis of nuclear architecture data. Curr Opin Genet Dev 2021; 67:103-110. [PMID: 33450522 PMCID: PMC8084903 DOI: 10.1016/j.gde.2020.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/02/2020] [Revised: 12/09/2020] [Accepted: 12/13/2020] [Indexed: 11/22/2022]
Abstract
A large amount of genomic data for profiling three-dimensional genome architecture have accumulated from large-scale consortium projects as well as from individual laboratories. In this review, we summarize recent landmark datasets and collections in the field. We describe the challenges in collection, annotation, and analysis of these data, particularly for integration of sequencing and microscopy data. We introduce efforts from consortia and independent groups to harmonize diverse datasets. As the resolution and throughput of sequencing and imaging technologies continue to increase, more efficient utilization and integration of collected data will be critical for a better understanding of nuclear architecture.
Affiliation(s)
- Youngsook L Jung
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Koray Kirli
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Burak H Alver
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Peter J Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
8
Stevens I, Mukarram AK, Hörtenhuber M, Meehan TF, Rung J, Daub CO. Ten simple rules for annotating sequencing experiments. PLoS Comput Biol 2020; 16:e1008260. [PMID: 33017400 PMCID: PMC7535046 DOI: 10.1371/journal.pcbi.1008260] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022] Open
Affiliation(s)
- Irene Stevens
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
  - Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Abdul Kadir Mukarram
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Matthias Hörtenhuber
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Terrence F. Meehan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Johan Rung
  - Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
  - Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Carsten O. Daub
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
  - Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
9
Jou J, Gabdank I, Luo Y, Lin K, Sud P, Myers Z, Hilton JA, Kagda MS, Lam B, O'Neill E, Adenekan P, Graham K, Baymuradov UK, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Litton C, Tanaka FY, Hitz BC, Cherry JM. The ENCODE Portal as an Epigenomics Resource. Curr Protoc Bioinformatics 2020; 68:e89. [PMID: 31751002 PMCID: PMC7307447 DOI: 10.1002/cpbi.89] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/15/2022]
Abstract
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human- and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors.
Basic Protocol: Query the portal
Support Protocol 1: Batch downloading
Support Protocol 2: Using the cart to download files
Support Protocol 3: Visualize data
Alternate Protocol: Query building and programmatic access
Affiliation(s)
- Jennifer Jou
  - Department of Genetics, Stanford University, Stanford, California
- Idan Gabdank
  - Department of Genetics, Stanford University, Stanford, California
- Yunhai Luo
  - Department of Genetics, Stanford University, Stanford, California
- Khine Lin
  - Department of Genetics, Stanford University, Stanford, California
- Paul Sud
  - Department of Genetics, Stanford University, Stanford, California
- Zachary Myers
  - Department of Genetics, Stanford University, Stanford, California
- Jason A Hilton
  - Department of Genetics, Stanford University, Stanford, California
- Bonita Lam
  - Department of Genetics, Stanford University, Stanford, California
- Emma O'Neill
  - Department of Genetics, Stanford University, Stanford, California
- Philip Adenekan
  - Department of Genetics, Stanford University, Stanford, California
- Keenan Graham
  - Department of Genetics, Stanford University, Stanford, California
- J Seth Strattan
  - Department of Genetics, Stanford University, Stanford, California
- Otto Jolanki
  - Department of Genetics, Stanford University, Stanford, California
- Jin-Wook Lee
  - Department of Genetics, Stanford University, Stanford, California
- Casey Litton
  - Department of Genetics, Stanford University, Stanford, California
- Forrest Y Tanaka
  - Department of Genetics, Stanford University, Stanford, California
- Benjamin C Hitz
  - Department of Genetics, Stanford University, Stanford, California
- J Michael Cherry
  - Department of Genetics, Stanford University, Stanford, California
10
Bernasconi A, Canakoglu A, Masseroli M, Ceri S. The road towards data integration in human genomics: players, steps and interactions. Brief Bioinform 2020; 22:30-44. [PMID: 32496509 DOI: 10.1093/bib/bbaa080] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/21/2019] [Revised: 03/09/2020] [Accepted: 04/18/2020] [Indexed: 12/15/2022] Open
Abstract
Thousands of new experimental datasets are becoming available every day; in many cases, they are produced within the scope of large cooperative efforts, involving a variety of laboratories spread all over the world, and typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats, concerning both experimental values and metadata. Thus, data integration is becoming a fundamental activity, to be performed prior to data analysis and biological knowledge discovery, consisting of subsequent steps of data extraction, normalization, matching and enrichment; once applied to heterogeneous data sources, it builds multiple perspectives over the genome, leading to the identification of meaningful relationships that could not be perceived by using incompatible data formats. In this paper, we first describe a technological pipeline from data production to data integration; we then propose a taxonomy of genomic data players (based on the distinction between contributors, repository hosts, consortia, integrators and consumers) and apply the taxonomy to describe about 30 important players in genomic data management. We specifically focus on the integrator players and analyse the issues in solving the genomic data integration challenges, as well as evaluate the computational environments that they provide to follow up data integration by means of visualization and analysis tools.
11
Luo Y, Hitz BC, Gabdank I, Hilton JA, Kagda MS, Lam B, Myers Z, Sud P, Jou J, Lin K, Baymuradov UK, Graham K, Litton C, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Tanaka FY, Adenekan P, O'Neill E, Cherry JM. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res 2020; 48:D882-D889. [PMID: 31713622 PMCID: PMC7061942 DOI: 10.1093/nar/gkz1062] [Citation(s) in RCA: 355] [Impact Index Per Article: 88.8] [Received: 09/13/2019] [Revised: 10/18/2019] [Accepted: 10/25/2019] [Indexed: 02/06/2023] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
Affiliation(s)
- Yunhai Luo
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Benjamin C Hitz
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Idan Gabdank
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Jason A Hilton
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Meenakshi S Kagda
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Bonita Lam
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Zachary Myers
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Paul Sud
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Jennifer Jou
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Khine Lin
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Keenan Graham
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Casey Litton
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Stuart R Miyasato
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- J Seth Strattan
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Otto Jolanki
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Jin-Wook Lee
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Forrest Y Tanaka
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Philip Adenekan
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- Emma O'Neill
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
- J Michael Cherry
  - Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
12
Kumuthini J, Chimenti M, Nahnsen S, Peltzer A, Meraba R, McFadyen R, Wells G, Taylor D, Maienschein-Cline M, Li JL, Thimmapuram J, Murthy-Karuturi R, Zass L. Ten simple rules for providing effective bioinformatics research support. PLoS Comput Biol 2020; 16:e1007531. [PMID: 32214318 PMCID: PMC7098546 DOI: 10.1371/journal.pcbi.1007531] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022] Open
Abstract
Life scientists are increasingly turning to high-throughput sequencing technologies in their research programs, owing to the enormous potential of these methods. In a parallel manner, the number of core facilities that provide bioinformatics support are also increasing. Notably, the generation of complex large datasets has necessitated the development of bioinformatics support core facilities that aid laboratory scientists with cost-effective and efficient data management, analysis, and interpretation. In this article, we address the challenges-related to communication, good laboratory practice, and data handling-that may be encountered in core support facilities when providing bioinformatics support, drawing on our own experiences working as support bioinformaticians on multidisciplinary research projects. Most importantly, the article proposes a list of guidelines that outline how these challenges can be preemptively avoided and effectively managed to increase the value of outputs to the end user, covering the entire research project lifecycle, including experimental design, data analysis, and management (i.e., sharing and storage). In addition, we highlight the importance of clear and transparent communication, comprehensive preparation, appropriate handling of samples and data using monitoring systems, and the employment of appropriate tools and standard operating procedures to provide effective bioinformatics support.
Affiliation(s)
- Judit Kumuthini
- H3ABioNet, Centre for Proteomic and Genomic Research, Cape Town, South Africa
| | - Michael Chimenti
- Iowa Institute of Human Genetics, Bioinformatics Division, Carver College of Medicine, University of Iowa, Iowa City, United States of America
| | - Sven Nahnsen
- Quantitative Biology Centre, Eberhard Karls University of Tübingen, Tübingen, Baden-Württemberg, Germany
| | - Alexander Peltzer
- Quantitative Biology Centre, Eberhard Karls University of Tübingen, Tübingen, Baden-Württemberg, Germany
| | - Rebone Meraba
- H3ABioNet, Centre for Proteomic and Genomic Research, Cape Town, South Africa
| | - Ross McFadyen
- H3ABioNet, Centre for Proteomic and Genomic Research, Cape Town, South Africa
| | - Gordon Wells
- H3ABioNet, Centre for Proteomic and Genomic Research, Cape Town, South Africa
| | - Deanne Taylor
- Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Mark Maienschein-Cline
- Research Informatics Core, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Jian-Liang Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Jyothi Thimmapuram
- Bioinformatics Core, Purdue University, West Lafayette, Indiana, United States of America
| | - Radha Murthy-Karuturi
- Department of Computational Sciences, The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
| | - Lyndon Zass
- H3ABioNet, Centre for Proteomic and Genomic Research, Cape Town, South Africa
|
13
|
Snyder MP, Lin S, Posgai A, Atkinson M, Regev A, Rood J, Rozenblatt-Rosen O, Gaffney L, Hupalowska A, Satija R, Gehlenborg N, Shendure J, Laskin J, Harbury P, Nystrom NA, Silverstein JC, Bar-Joseph Z, Zhang K, Börner K, Lin Y, Conroy R, Procaccini D, Roy AL, Pillai A, Brown M, Galis ZS, Cai L, Shendure J, Trapnell C, Lin S, Jackson D, Snyder MP, Nolan G, Greenleaf WJ, Lin Y, Plevritis S, Ahadi S, Nevins SA, Lee H, Schuerch CM, Black S, Venkataraaman VG, Esplin E, Horning A, Bahmani A, Zhang K, Sun X, Jain S, Hagood J, Pryhuber G, Kharchenko P, Atkinson M, Bodenmiller B, Brusko T, Clare-Salzler M, Nick H, Otto K, Posgai A, Wasserfall C, Jorgensen M, Brusko M, Maffioletti S, Caprioli RM, Spraggins JM, Gutierrez D, Patterson NH, Neumann EK, Harris R, deCaestecker M, Fogo AB, van de Plas R, Lau K, Cai L, Yuan GC, Zhu Q, Dries R, Yin P, Saka SK, Kishi JY, Wang Y, Goldaracena I, Laskin J, Ye D, Burnum-Johnson KE, Piehowski PD, Ansong C, Zhu Y, Harbury P, Desai T, Mulye J, Chou P, Nagendran M, Bar-Joseph Z, Teichmann SA, Paten B, Murphy RF, Ma J, Kiselev VY, Kingsford C, Ricarte A, Keays M, Akoju SA, Ruffalo M, Gehlenborg N, Kharchenko P, Vella M, McCallum C, Börner K, Cross LE, Friedman SH, Heiland R, Herr B, Macklin P, Quardokus EM, Record L, Sluka JP, Weber GM, Nystrom NA, Silverstein JC, Blood PD, Ropelewski AJ, Shirey WE, Scibek RM, Mabee P, Lenhardt WC, Robasky K, Michailidis S, Satija R, Marioni J, Regev A, Butler A, Stuart T, Fisher E, Ghazanfar S, Rood J, Gaffney L, Eraslan G, Biancalani T, Vaishnav ED, Conroy R, Procaccini D, Roy A, Pillai A, Brown M, Galis Z, Srinivas P, Pawlyk A, Sechi S, Wilder E, Anderson J. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 2019; 574:187-192. 
[PMID: 31597973 PMCID: PMC6800388 DOI: 10.1038/s41586-019-1629-x] [Citation(s) in RCA: 290] [Impact Index Per Article: 58.0] [Received: 02/22/2019] [Accepted: 09/09/2019] [Indexed: 12/12/2022]
Abstract
Transformative technologies are enabling the construction of three-dimensional maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible three-dimensional molecular and cellular atlas of the human body, in health and under various disease conditions.
|
14
|
Zhu L, Hofestadt R, Ester M. Tissue-Specific Subcellular Localization Prediction Using Multi-Label Markov Random Fields. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1471-1482. [PMID: 30736003 DOI: 10.1109/tcbb.2019.2897683] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
Understanding the subcellular localization (SCL) of proteins and the variation of the proteome across the different tissues and organs of the human body are two crucial aspects for increasing our knowledge of the dynamic rules of proteins, cell biology, and the mechanisms of disease. Although there have been tremendous contributions to these two fields independently, knowledge of how the spatial distribution of proteins varies across tissues is still lacking. Here, we propose an approach that predicts tissue-specific protein SCL through the use of tissue-specific functional associations and physical protein-protein interactions (PPIs). We applied our previously developed Bayesian collective Markov random fields (BCMRFs) to tissue-specific protein-protein interaction networks (PPI networks) for nine types of tissues, focusing on eight high-level SCLs. The evaluation results demonstrate the strength of our approach in predicting tissue-specific SCL. We identified 1,314 proteins whose SCL had previously been shown to be cell-line dependent. We predicted 549 novel tissue-specific localized candidate proteins, some of which were validated via text-mining.
|
15
|
Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, Hilton JA, Jain K, Baymuradov UK, Narayanan AK, Onate KC, Graham K, Miyasato SR, Dreszer TR, Strattan JS, Jolanki O, Tanaka FY, Cherry JM. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 2019; 46:D794-D801. [PMID: 29126249 PMCID: PMC5753278 DOI: 10.1093/nar/gkx1081] [Citation(s) in RCA: 1095] [Impact Index Per Article: 219.0] [Received: 09/15/2017] [Accepted: 10/19/2017] [Indexed: 12/30/2022] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to (meta)data from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.
Affiliation(s)
- Carrie A Davis
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Esther T Chan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Jason A Hilton
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Kriti Jain
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | | | - Aditi K Narayanan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Kathrina C Onate
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Keenan Graham
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Timothy R Dreszer
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
|
16
|
Holding AN, Giorgi FM, Donnelly A, Cullen AE, Nagarajan S, Selth LA, Markowetz F. VULCAN integrates ChIP-seq with patient-derived co-expression networks to identify GRHL2 as a key co-regulator of ERa at enhancers in breast cancer. Genome Biol 2019; 20:91. [PMID: 31084623 PMCID: PMC6515683 DOI: 10.1186/s13059-019-1698-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/07/2019] [Accepted: 04/23/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: VirtUaL ChIP-seq Analysis through Networks (VULCAN) infers regulatory interactions of transcription factors by overlaying networks generated from publicly available tumor expression data onto ChIP-seq data. We apply our method to dissect the regulation of estrogen receptor-alpha activation in breast cancer to identify potential co-regulators of the estrogen receptor's transcriptional response. RESULTS: VULCAN analysis of estrogen receptor activation in breast cancer highlights the key components of the estrogen receptor complex alongside a novel interaction with GRHL2. We demonstrate that GRHL2 is recruited to a subset of estrogen receptor binding sites and regulates transcriptional output, as evidenced by changes in estrogen receptor-associated eRNA expression and stronger estrogen receptor binding at active enhancers after GRHL2 knockdown. CONCLUSIONS: Our findings provide new insight into the role of GRHL2 in regulating eRNA transcription as part of estrogen receptor signaling. These results demonstrate VULCAN, available from Bioconductor, as a powerful predictive tool.
Affiliation(s)
- Andrew N Holding
- CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK.
- The Alan Turing Institute, 96 Euston Road, Kings Cross, London, NW1 2DB, UK.
| | - Federico M Giorgi
- CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, Bologna, Italy
| | - Amanda Donnelly
- CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
| | - Amy E Cullen
- CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
| | - Sankari Nagarajan
- CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
| | - Luke A Selth
- Dame Roma Mitchell Cancer Research Laboratories and Freemasons Foundation Centre for Men's Health, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
| | - Florian Markowetz
- CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
|
17
|
Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics 2018; 34:3983-3989. [PMID: 29931280 PMCID: PMC6247925 DOI: 10.1093/bioinformatics/bty476] [Citation(s) in RCA: 105] [Impact Index Per Article: 17.5] [Received: 01/09/2018] [Accepted: 06/18/2018] [Indexed: 12/11/2022] Open
Abstract
Motivation: DNA methylation datasets are growing ever larger both in sample size and genome coverage. Novel computational solutions are required to efficiently handle these data. Results: We have developed meffil, an R package designed for efficient quality control, normalization and epigenome-wide association studies of large samples of Illumina Methylation BeadChip microarrays. A complete re-implementation of functional normalization minimizes computational memory without increasing running time. Incorporating fixed and random effects within functional normalization, and automated estimation of functional normalization parameters reduces technical variation in DNA methylation levels, thus reducing false positive rates and improving power. Support for normalization of datasets distributed across physically different locations without needing to share biologically-based individual-level data means that meffil can be used to reduce heterogeneity in meta-analyses of epigenome-wide association studies. Availability and implementation: https://github.com/perishky/meffil/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- J L Min
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Bristol Medical School, University of Bristol, Bristol, UK
| | - G Hemani
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Bristol Medical School, University of Bristol, Bristol, UK
| | - G Davey Smith
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Bristol Medical School, University of Bristol, Bristol, UK
| | - C Relton
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Bristol Medical School, University of Bristol, Bristol, UK
| | - M Suderman
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Bristol Medical School, University of Bristol, Bristol, UK
|
18
|
Wu SM, Liu H, Huang PJ, Chang IYF, Lee CC, Yang CY, Tsai WS, Tan BCM. circlncRNAnet: an integrated web-based resource for mapping functional networks of long or circular forms of noncoding RNAs. Gigascience 2018; 7:1-10. [PMID: 29194536 PMCID: PMC5765557 DOI: 10.1093/gigascience/gix118] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 06/04/2017] [Accepted: 11/22/2017] [Indexed: 12/26/2022] Open
Abstract
Background: Despite their lack of protein-coding potential, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) have emerged as key determinants in gene regulation, acting to fine-tune transcriptional and signaling output. These noncoding RNA transcripts are known to affect expression of messenger RNAs (mRNAs) via epigenetic and post-transcriptional regulation. Given their widespread target spectrum, as well as extensive modes of action, a complete understanding of their biological relevance will depend on integrative analyses of systems data at various levels. Findings: While a handful of publicly available databases have been reported, existing tools do not fully capture, from a network perspective, the functional implications of lncRNAs or circRNAs of interest. Through an integrated and streamlined design, circlncRNAnet aims to broaden the understanding of ncRNA candidates by testing in silico several hypotheses of ncRNA-based functions, on the basis of large-scale RNA-seq data. This web server is implemented with several features that represent advances in the bioinformatics of ncRNAs: (1) a flexible framework that accepts and processes user-defined next-generation sequencing–based expression data; (2) multiple analytic modules that assign and productively assess the regulatory networks of user-selected ncRNAs by cross-referencing extensively curated databases; (3) an all-purpose, information-rich workflow design that is tailored to all types of ncRNAs. Outputs on expression profiles, co-expression networks and pathways, and molecular interactomes, are dynamically and interactively displayed according to user-defined criteria. Conclusions: In short, users may apply circlncRNAnet to obtain, in real time, multiple lines of functionally relevant information on circRNAs/lncRNAs of their interest. In summary, circlncRNAnet provides a “one-stop” resource for in-depth analyses of ncRNA biology. circlncRNAnet is freely available at http://app.cgu.edu.tw/circlnc/.
Affiliation(s)
- Shao-Min Wu
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
| | - Hsuan Liu
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
- Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
- Molecular Medicine Research Center, Chang Gung University, Guishan, Taoyuan, Taiwan
- Division of Colon and Rectal Surgery, Department of Surgery, Chang Gung Memorial Hospital, Linkou, Taiwan
| | - Po-Jung Huang
- Molecular Medicine Research Center, Chang Gung University, Guishan, Taoyuan, Taiwan
- Department of Biomedical Sciences, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
- Genomic Medicine Research Core Laboratory, Chang Gung Memorial Hospital, Linkou, Taiwan
| | - Ian Yi-Feng Chang
- Molecular Medicine Research Center, Chang Gung University, Guishan, Taoyuan, Taiwan
| | - Chi-Ching Lee
- Department of Computer Science and Information Engineering, College of Engineering, Chang Gung University, Guishan, Taoyuan, Taiwan
| | - Chia-Yu Yang
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
- Molecular Medicine Research Center, Chang Gung University, Guishan, Taoyuan, Taiwan
- Division of Colon and Rectal Surgery, Department of Surgery, Chang Gung Memorial Hospital, Linkou, Taiwan
- Department of Microbiology and Immunology, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
| | - Wen-Sy Tsai
- Division of Colon and Rectal Surgery, Department of Surgery, Chang Gung Memorial Hospital, Linkou, Taiwan
| | - Bertrand Chin-Ming Tan
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
- Molecular Medicine Research Center, Chang Gung University, Guishan, Taoyuan, Taiwan
- Department of Biomedical Sciences, College of Medicine, Chang Gung University, Guishan, Taoyuan, Taiwan
- Department of Neurosurgery, Linkou Medical Center, Chang Gung Memorial Hospital, Linkou, Taiwan
|
19
|
Alfimova MV, Kondratiev NV, Golov AK, Golimbet VE. Methylation of the Reelin Gene Promoter in Peripheral Blood and Its Relationship with the Cognitive Function of Schizophrenia Patients. Mol Biol 2018. [DOI: 10.1134/s0026893318050023] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/16/2023]
|
20
|
Gabdank I, Chan ET, Davidson JM, Hilton JA, Davis CA, Baymuradov UK, Narayanan A, Onate KC, Graham K, Miyasato SR, Dreszer TR, Strattan JS, Jolanki O, Tanaka FY, Hitz BC, Sloan CA, Cherry JM. Prevention of data duplication for high throughput sequencing repositories. Database: The Journal of Biological Databases and Curation 2018; 2018:4913687. [PMID: 29688363 PMCID: PMC5829560 DOI: 10.1093/database/bay008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/20/2017] [Accepted: 01/10/2018] [Indexed: 01/01/2023]
Abstract
Database URL https://www.encodeproject.org/.
Affiliation(s)
- Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Esther T Chan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Jason A Hilton
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Carrie A Davis
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | | | - Aditi Narayanan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Kathrina C Onate
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Keenan Graham
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Timothy R Dreszer
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
|
21
|
Heuston EF, Keller CA, Lichtenberg J, Giardine B, Anderson SM, Hardison RC, Bodine DM. Establishment of regulatory elements during erythro-megakaryopoiesis identifies hematopoietic lineage-commitment points. Epigenetics Chromatin 2018; 11:22. [PMID: 29807547 PMCID: PMC5971425 DOI: 10.1186/s13072-018-0195-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 04/14/2018] [Accepted: 05/21/2018] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Enhancers and promoters are cis-acting regulatory elements associated with lineage-specific gene expression. Previous studies showed that different categories of active regulatory elements are in regions of open chromatin, and each category is associated with a specific subset of post-translationally marked histones. These regulatory elements are systematically activated and repressed to promote commitment of hematopoietic stem cells along separate differentiation paths, including the closely related erythrocyte (ERY) and megakaryocyte (MK) lineages. However, the order in which these decisions are made remains unclear. RESULTS: To characterize the order of cell fate decisions during hematopoiesis, we collected primary cells from mouse bone marrow and isolated 10 hematopoietic populations to generate transcriptomes and genome-wide maps of chromatin accessibility and histone H3 acetylated at lysine 27 binding (H3K27ac). Principal component analysis of transcriptional and open chromatin profiles demonstrated that cells of the megakaryocyte lineage group closely with multipotent progenitor populations, whereas erythroid cells form a separate group distinct from other populations. Using H3K27ac and open chromatin profiles, we showed that 89% of immature MK (iMK)-specific active regulatory regions are present in the most primitive hematopoietic cells, 46% of which contain active enhancer marks. These candidate active enhancers are enriched for transcription factor binding site motifs for megakaryopoiesis-essential proteins, including ERG and ETS1. In comparison, only 64% of ERY-specific active regulatory regions are present in the most primitive hematopoietic cells, 20% of which contain active enhancer marks. These regions were not enriched for any transcription factor consensus sequences. Incorporation of genome-wide DNA methylation identified significant levels of de novo methylation in iMK, but not ERY.
CONCLUSIONS: Our results demonstrate that megakaryopoietic profiles are established early in hematopoiesis and are present in the majority of the hematopoietic progenitor population. However, megakaryopoiesis does not constitute a "default" differentiation pathway, as extensive de novo DNA methylation accompanies megakaryopoietic commitment. In contrast, erythropoietic profiles are not established until a later stage of hematopoiesis, and require more dramatic changes to the transcriptional and epigenetic programs. These data provide important insights into lineage commitment and can contribute to ongoing studies related to diseases associated with differentiation defects.
|
22
|
Fort RS, Mathó C, Oliveira-Rizzo C, Garat B, Sotelo-Silveira JR, Duhagon MA. An integrated view of the role of miR-130b/301b miRNA cluster in prostate cancer. Exp Hematol Oncol 2018; 7:10. [PMID: 29744254 PMCID: PMC5930504 DOI: 10.1186/s40164-018-0102-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 02/16/2018] [Accepted: 04/20/2018] [Indexed: 12/26/2022] Open
Abstract
Prostate cancer is a major health problem worldwide due to its high incidence, morbidity, and mortality. There is currently a need for improved biomarkers capable of distinguishing mild from aggressive forms of the disease and thus guiding therapeutic decisions. Although miRNAs deregulated in cancer represent exciting candidates as biomarkers, the scientific literature on them is frequently fragmented across dispersed studies. This problem is aggravated for miRNAs belonging to miRNA gene clusters with shared target genes. The miRNA cluster composed of the hsa-mir-130b and hsa-mir-301b precursors was recently implicated in prostate cancer pathogenesis, yet different studies assigned it opposite effects on the disease. We sought to elucidate the role of the human miR-130b/301b miRNA cluster in prostate cancer through a comprehensive data analysis of most published clinical cohorts. We interrogated methylomes, transcriptomes and patient clinical data, unifying previous reports and adding original analysis using the largest available cohort (TCGA-PRAD). We found that hsa-miR-130b-3p and hsa-miR-301b-3p are upregulated in neoplastic vs normal prostate tissue, as well as in metastatic vs primary sites. However, this increase in expression is not due to a decrease of the global DNA methylation of the genes in prostate tissues, as the promoter of the gene remains lowly methylated in normal and neoplastic tissue. A comparison of the levels of human miR-130b/301b with all the clinical variables reported for the major available cohorts yielded positive correlations with malignancy, specifically significant for T-stage, residual tumor status and primary therapy outcome. The assessment of the correlations between hsa-miR-130b-3p and hsa-miR-301b-3p and candidate target genes in clinical samples supports their repression of tumor suppressor genes in prostate cancer. Altogether, these results favor an oncogenic role of the miR-130b/301b cluster in prostate cancer.
Affiliation(s)
- Rafael Sebastián Fort
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Depto. de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
| | - Cecilia Mathó
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Depto. de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
| | - Carolina Oliveira-Rizzo
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Depto. de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
| | - Beatriz Garat
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
| | - José Roberto Sotelo-Silveira
- Depto. de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay
- Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
| | - María Ana Duhagon
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Depto. de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
|
23
|
Mazin PV, Jiang X, Fu N, Han D, Guo M, Gelfand MS, Khaitovich P. Conservation, evolution, and regulation of splicing during prefrontal cortex development in humans, chimpanzees, and macaques. RNA (New York, N.Y.) 2018; 24:585-596. [PMID: 29363555 PMCID: PMC5855957 DOI: 10.1261/rna.064931.117] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/19/2017] [Accepted: 01/10/2018] [Indexed: 05/03/2023]
Abstract
Changes in splicing are known to affect the function and regulation of genes. We analyzed splicing events that take place during the postnatal development of the prefrontal cortex in humans, chimpanzees, and rhesus macaques based on data obtained from 168 individuals. Our study revealed that among the 38,822 quantified alternative exons, 15% are differentially spliced among species, and more than 6% splice differently at different ages. Mutations in splicing acceptor and/or donor sites might explain more than 14% of all splicing differences among species and up to 64% of high-amplitude differences. A reconstructed trans-regulatory network containing 21 RNA-binding proteins explains a further 4% of splicing variations within species. While most age-dependent splicing patterns are conserved among the three species, developmental changes in intron retention are substantially more pronounced in humans.
Affiliation(s)
- Pavel V Mazin
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow 127051, Russia
- Faculty of Computer Science, Higher School of Economics, Moscow 125319, Russia
| | - Xi Jiang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai 200031, China
| | - Ning Fu
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russia
| | - Dingding Han
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai 200031, China
| | - Meng Guo
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
| | - Mikhail S Gelfand
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow 127051, Russia
- Faculty of Computer Science, Higher School of Economics, Moscow 125319, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119992, Russia
| | - Philipp Khaitovich
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russia
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
| |
|
24
|
Horie M, Kaczkowski B, Ohshima M, Matsuzaki H, Noguchi S, Mikami Y, Lizio M, Itoh M, Kawaji H, Lassmann T, Carninci P, Hayashizaki Y, Forrest ARR, Takai D, Yamaguchi Y, Micke P, Saito A, Nagase T. Integrative CAGE and DNA Methylation Profiling Identify Epigenetically Regulated Genes in NSCLC. Mol Cancer Res 2017; 15:1354-1365. [PMID: 28698358 DOI: 10.1158/1541-7786.mcr-17-0191] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/11/2017] [Revised: 06/12/2017] [Accepted: 06/28/2017] [Indexed: 11/16/2022]
Abstract
Lung cancer is the leading cause of cancer-related deaths worldwide. The majority of cancer driver mutations have been identified; however, relevant epigenetic regulation involved in tumorigenesis has only been fragmentarily analyzed. Epigenetically regulated genes have a great theranostic potential, especially in tumors with no apparent driver mutations. Here, epigenetically regulated genes were identified in lung cancer by an integrative analysis of promoter-level expression profiles from Cap Analysis of Gene Expression (CAGE) of 16 non-small cell lung cancer (NSCLC) cell lines and 16 normal lung primary cell specimens with DNA methylation data of 69 NSCLC cell lines and 6 normal lung epithelial cells. A core set of 49 coding genes and 10 long noncoding RNAs (lncRNA), which are upregulated in NSCLC cell lines due to promoter hypomethylation, was uncovered. Twenty-two epigenetically regulated genes were validated (upregulated genes with hypomethylated promoters) in the adenocarcinoma and squamous cell cancer subtypes of lung cancer using The Cancer Genome Atlas data. Furthermore, it was demonstrated that multiple copies of the REP522 DNA repeat family are prominently upregulated due to hypomethylation in NSCLC cell lines, which leads to cancer-specific expression of lncRNAs, such as RP1-90G24.10, AL022344.4, and PCAT7. Finally, Myeloma Overexpressed (MYEOV) was identified as the most promising candidate. Functional studies demonstrated that MYEOV promotes cell proliferation, survival, and invasion. Moreover, high MYEOV expression levels were associated with poor prognosis. Implications: This report identifies a robust list of 22 candidate driver genes that are epigenetically regulated in lung cancer; such genes may complement the known mutational drivers. Visual Overview: http://mcr.aacrjournals.org/content/molcanres/15/10/1354/F1.large.jpg Mol Cancer Res; 15(10); 1354-65. ©2017 AACR.
Affiliation(s)
- Masafumi Horie
- Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Division for Health Service Promotion, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
| | - Bogumil Kaczkowski
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan.
| | - Mitsuhiro Ohshima
- Department of Biochemistry, Ohu University School of Pharmaceutical Sciences, Koriyama, Fukushima, Japan
| | - Hirotaka Matsuzaki
- Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Satoshi Noguchi
- Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Yu Mikami
- Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Department of Clinical Laboratory, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Marina Lizio
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
| | - Masayoshi Itoh
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, Japan
| | - Hideya Kawaji
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, Japan
| | - Timo Lassmann
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
- Telethon Kids Institute, the University of Western Australia, Perth, Western Australia, Australia
| | - Piero Carninci
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
| | | | - Alistair R R Forrest
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
- Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, Nedlands, Western Australia, Australia
| | - Daiya Takai
- Department of Clinical Laboratory, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Yoko Yamaguchi
- Department of Biochemistry, Nihon University School of Dentistry, Chiyoda-ku, Tokyo, Japan
- Division of Functional Morphology Dental Research Center Nihon University School of Dentistry, Chiyoda-ku, Tokyo, Japan
| | - Patrick Micke
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Akira Saito
- Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
- Division for Health Service Promotion, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Takahide Nagase
- Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| |
|
25
|
Whole-genome sequencing of chronic lymphocytic leukaemia reveals distinct differences in the mutational landscape between IgHVmut and IgHVunmut subgroups. Leukemia 2017; 32:332-342. [PMID: 28584254 PMCID: PMC5808074 DOI: 10.1038/leu.2017.177] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 01/09/2017] [Revised: 04/18/2017] [Accepted: 05/17/2017] [Indexed: 01/02/2023]
Abstract
Chronic lymphocytic leukaemia (CLL) consists of two biologically and clinically distinct subtypes defined by the abundance of somatic hypermutation (SHM) affecting the Ig variable heavy-chain locus (IgHV). The molecular mechanisms underlying these subtypes are incompletely understood. Here, we present a comprehensive whole-genome sequencing analysis of somatically acquired genetic events from 46 CLL patients, including a systematic comparison of coding and non-coding single-nucleotide variants, copy number variants and structural variants, regions of kataegis and mutation signatures between IgHVmut and IgHVunmut subtypes. We demonstrate that one-quarter of non-coding mutations in regions of kataegis outside the Ig loci are located in genes relevant to CLL. We show that non-coding mutations in ATM may negatively impact on ATM expression and find non-coding and regulatory region mutations in TCL1A, and in IgHVunmut CLL in IKZF3, SAMHD1, PAX5 and BIRC3. Finally, we show that IgHVunmut CLL is dominated by coding mutations in driver genes and an aging signature, whereas IgHVmut CLL has a high incidence of promoter and enhancer mutations caused by aberrant activation-induced cytidine deaminase activity. Taken together, our data support the hypothesis that differences in clinical outcome and biological characteristics between the two subgroups might reflect differences in mutation distribution, incidence and distinct underlying mutagenic mechanisms.
|
26
|
SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata. PLoS One 2017; 12:e0175310. [PMID: 28403240 PMCID: PMC5389787 DOI: 10.1371/journal.pone.0175310] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 10/15/2016] [Accepted: 03/23/2017] [Indexed: 12/16/2022] Open
Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.
|
27
|
Hatakeyama M, Opitz L, Russo G, Qi W, Schlapbach R, Rehrauer H. SUSHI: an exquisite recipe for fully documented, reproducible and reusable NGS data analysis. BMC Bioinformatics 2016; 17:228. [PMID: 27255077 PMCID: PMC4890512 DOI: 10.1186/s12859-016-1104-8] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 02/20/2016] [Accepted: 05/26/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Next generation sequencing (NGS) produces massive datasets consisting of billions of reads and up to thousands of samples. Subsequent bioinformatic analysis is typically done with the help of open source tools, where each application performs a single step towards the final result. This situation leaves the bioinformaticians with the tasks to combine the tools, manage the data files and meta-information, document the analysis, and ensure reproducibility. RESULTS: We present SUSHI, an agile data analysis framework that relieves bioinformaticians from the administrative challenges of their data analysis. SUSHI lets users build reproducible data analysis workflows from individual applications and manages the input data, the parameters, meta-information with user-driven semantics, and the job scripts. As distinguishing features, SUSHI provides an expert command line interface as well as a convenient web interface to run bioinformatics tools. SUSHI datasets are self-contained and self-documented on the file system. This makes them fully reproducible and ready to be shared. With the associated meta-information being formatted as plain text tables, the datasets can be readily further analyzed and interpreted outside SUSHI. CONCLUSION: SUSHI provides an exquisite recipe for analysing NGS data. By following the SUSHI recipe, SUSHI makes data analysis straightforward and takes care of documentation and administration tasks. Thus, the user can fully dedicate his time to the analysis itself. SUSHI is suitable for use by bioinformaticians as well as life science researchers. It is targeted for, but by no means constrained to, NGS data analysis. Our SUSHI instance is in productive use and has served as data analysis interface for more than 1000 data analysis projects. SUSHI source code as well as a demo server are freely available.
Affiliation(s)
- Masaomi Hatakeyama
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland; Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland
| | - Lennart Opitz
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland
| | - Giancarlo Russo
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland
| | - Weihong Qi
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland
| | - Ralph Schlapbach
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland
| | - Hubert Rehrauer
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Winterthurerstrasse. 190, 8057, Zurich, Switzerland.
| |
|