1
The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res 2024:gkae410. [PMID: 38769056 DOI: 10.1093/nar/gkae410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/18/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024] Open
Abstract
Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.
2
Catching the Wave: Detecting Strain-Specific SARS-CoV-2 Peptides in Clinical Samples Collected during Infection Waves from Diverse Geographical Locations. Viruses 2022; 14:2205. [PMID: 36298760 PMCID: PMC9609567 DOI: 10.3390/v14102205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 11/05/2022] Open
Abstract
The Coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) resulted in a major health crisis worldwide with its continuously emerging new strains, resulting in new viral variants that drive "waves" of infection. PCR or antigen detection assays have been routinely used to detect clinical infections; however, the emergence of these newer strains has presented challenges in detection. One of the alternatives has been to detect and characterize variant-specific peptide sequences from viral proteins using mass spectrometry (MS)-based methods. MS methods can potentially help in both diagnostics and vaccine development by understanding the dynamic changes in the viral proteome associated with specific strains and infection waves. In this study, we developed an accessible, flexible, and shareable bioinformatics workflow that was implemented in the Galaxy Platform to detect variant-specific peptide sequences from MS data derived from the clinical samples. We demonstrated the utility of the workflow by characterizing published clinical data from across the world during various pandemic waves. Our analysis identified six SARS-CoV-2 variant-specific peptides suitable for confident detection by MS in commonly collected clinical samples.
3
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 2022; 50:W345-W351. [PMID: 35446428 PMCID: PMC9252830 DOI: 10.1093/nar/gkac247] [Citation(s) in RCA: 250] [Impact Index Per Article: 125.0] [Received: 02/03/2022] [Revised: 03/17/2022] [Accepted: 03/30/2022] [Indexed: 01/19/2023] Open
Abstract
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
4
Impact of high platelet turnover on the platelet transcriptome: Results from platelet RNA-sequencing in patients with sepsis. PLoS One 2022; 17:e0260222. [PMID: 35085240 PMCID: PMC8794123 DOI: 10.1371/journal.pone.0260222] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2021] [Accepted: 11/04/2021] [Indexed: 12/13/2022] Open
Abstract
Background
Sepsis is associated with high platelet turnover and elevated levels of immature platelets. Changes in the platelet transcriptome and the specific impact of immature platelets on the platelet transcriptome remain unclear. Thus, this study sought to address whether and how elevated levels of immature platelets affect the platelet transcriptome in patients with sepsis.
Methods
Blood samples were obtained from patients with sepsis requiring vasopressor therapy (n = 8) and from a control group of patients with stable coronary artery disease and otherwise similar demographic characteristics (n = 8). Immature platelet fraction (IPF) was determined on a Sysmex XE 2100 analyser and platelet function was tested by impedance aggregometry. RNA from leukocyte-depleted platelets was used for transcriptome analysis by Next Generation Sequencing integrating the use of unique molecular identifiers.
Results
IPF (median [interquartile range]) was significantly elevated in sepsis patients (6.4 [5.3–8.7] % vs. 3.6 [2.6–4.6] %, p = 0.005). Platelet function testing revealed no differences in adenosine diphosphate- or thrombin receptor activating peptide-induced platelet aggregation between control and sepsis patients. Putative circular RNA transcripts were decreased in platelets from septic patients. Leukocyte contamination defined by CD45 abundance levels in RNA-sequencing was absent in both groups. Principal component analysis of transcripts showed only partial overlap of clustering with IPF levels. RNA sequencing showed up-regulation of 524 and down-regulation of 118 genes in platelets from sepsis patients compared to controls. Upregulated genes were mostly related to catabolic processes and protein translation. Comparison to published platelet transcriptomes showed a large overlap of changes observed in sepsis and COVID-19 but not with reticulated platelets from healthy donors.
Conclusions
Patients with sepsis appear to have a less degraded platelet transcriptome, as indicated by increased levels of immature platelets and decreased levels of putative circular RNA transcripts. The present data suggest that increased protein translation is a characteristic mechanism of systemic inflammation.
5
Abstract
The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.
6
A rigorous evaluation of optimal peptide targets for MS-based clinical diagnostics of Coronavirus Disease 2019 (COVID-19). MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.02.09.21251427. [PMID: 33688669 PMCID: PMC7941646 DOI: 10.1101/2021.02.09.21251427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
The Coronavirus Disease 2019 (COVID-19) global pandemic has had a profound, lasting impact on the world's population. A key aspect to providing care for those with COVID-19 and checking its further spread is early and accurate diagnosis of infection, which has been generally done via methods for amplifying and detecting viral RNA molecules. Detection and quantitation of peptides using targeted mass spectrometry-based strategies has been proposed as an alternative diagnostic tool due to direct detection of molecular indicators from non-invasively collected samples as well as the potential for high-throughput analysis in a clinical setting; many studies have revealed the presence of viral peptides within easily accessed patient samples. However, evidence suggests that some viral peptides could serve as better indicators of COVID-19 infection status than others, due to potential misidentification of peptides derived from human host proteins, poor spectral quality, and high limits of detection, among other factors. In this study we have compiled a list of 639 peptides identified from Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) samples, including from in vitro and clinical sources. These datasets were rigorously analyzed using automated, Galaxy-based workflows containing tools such as PepQuery, BLAST-P, and the Multi-omic Visualization Platform as well as the open-source tools MetaTryp and Proteomics Data Viewer (PDV). Using PepQuery for confirming peptide spectrum matches, we were able to narrow down the 639 peptide possibilities to 87 peptides which were most robustly detected and specific to the SARS-CoV-2 virus. The specificity of these sequences to coronavirus taxa was confirmed using Unipept and BLAST-P.
Applying stringent statistical scoring thresholds, combined with manual verification of peptide spectrum match quality, 4 peptides derived from the nucleocapsid phosphoprotein and membrane protein were found to be most robustly detected across all cell culture and clinical samples, including those collected non-invasively. We propose that these peptides would be of the most value for clinical proteomics applications seeking to detect COVID-19 from a variety of sample types. We also contend that samples taken from the upper respiratory tract and oral cavity have the highest potential for diagnosis of SARS-CoV-2 infection from easily collected patient samples using mass spectrometry-based proteomics assays.
7
Decitabine Induces Gene Derepression on Monosomic Chromosomes: In Vitro and In Vivo Effects in Adverse-Risk Cytogenetics AML. Cancer Res 2020; 81:834-846. [PMID: 33203699 DOI: 10.1158/0008-5472.can-20-1430] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/28/2020] [Revised: 08/21/2020] [Accepted: 11/12/2020] [Indexed: 11/16/2022]
Abstract
Hypomethylating agents (HMA) have become the backbone of nonintensive acute myeloid leukemia/myelodysplastic syndrome (AML/MDS) treatment, also by virtue of their activity in patients with adverse genetics, for example, monosomal karyotypes, often with losses on chromosomes 7, 5, or 17. No comparable activity is observed with cytarabine, a cytidine analogue without DNA-hypomethylating properties. As evidence exists for compounding hypermethylation and gene silencing of hemizygous tumor suppressor genes (TSG), we hypothesized that this effect may preferentially be reversed by the HMAs decitabine and azacitidine. An unbiased RNA-sequencing approach was developed to interrogate decitabine-induced transcriptome changes in AML cell lines with or without a deletion of chromosomes 7q, 5q or 17p. HMA treatment preferentially upregulated several hemizygous TSG in this genomic region, significantly derepressing endogenous retrovirus (ERV)3-1, with promoter demethylation, enhanced chromatin accessibility, and increased H3K4me3 levels. Decitabine globally reactivated multiple transposable elements, with activation of the dsRNA sensor RIG-I and interferon regulatory factor (IRF)7. Induction of ERV3-1 and RIG-I mRNA was also observed during decitabine treatment in vivo in serially sorted peripheral blood AML blasts. In patient-derived monosomal karyotype AML murine xenografts, decitabine treatment resulted in superior survival rates compared with cytarabine. Collectively, these data demonstrate preferential gene derepression and ERV reactivation in AML with chromosomal deletions, providing a mechanistic explanation that supports the clinical observation of superiority of HMA over cytarabine in this difficult-to-treat patient group. SIGNIFICANCE: These findings unravel the molecular mechanism underlying the intriguing clinical activity of HMAs in AML/MDS patients with chromosome 7 deletions and other monosomal karyotypes. See related commentary by O'Hagan et al., p. 813.
8
Intuitive, reproducible high-throughput molecular dynamics in Galaxy: a tutorial. J Cheminform 2020; 12:54. [PMID: 33431030 PMCID: PMC7488338 DOI: 10.1186/s13321-020-00451-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/17/2020] [Accepted: 07/27/2020] [Indexed: 11/10/2022] Open
Abstract
This paper is a tutorial developed for the data analysis platform Galaxy. The purpose of Galaxy is to make high-throughput computational data analysis, such as molecular dynamics, a structured, reproducible and transparent process. In this tutorial we focus on 3 questions: How are protein-ligand systems parameterized for molecular dynamics simulation? What kind of analysis can be carried out on molecular trajectories? How can high-throughput MD be used to study multiple ligands? After finishing you will have learned about force-fields and MD parameterization, how to conduct MD simulation and analysis for a protein-ligand system, and understand how different molecular interactions contribute to the binding affinity of ligands to the Hsp90 protein.
9
GLASSgo in Galaxy: high-throughput, reproducible and easy-to-integrate prediction of sRNA homologs. Bioinformatics 2020; 36:4357-4359. [PMID: 32492127 PMCID: PMC7520042 DOI: 10.1093/bioinformatics/btaa556] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/21/2020] [Revised: 05/13/2020] [Accepted: 05/29/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The correct prediction of bacterial sRNA homologs is a prerequisite for many downstream analyses based on comparative genomics, but it is frequently challenging due to the short length and distinct heterogeneity of such homologs. GLobal Automatic Small RNA Search go (GLASSgo) is an efficient tool for the prediction of sRNA homologs from a single input query. To make the algorithm available to a broader community, we offer a Docker container along with a free-access web service. For non-computer scientists, the web service provides a user-friendly interface. However, capabilities were lacking so far for batch processing, version control and direct interaction with compatible software applications as a workflow management system can provide. RESULTS Here, we present GLASSgo 1.5.2, an updated version that is fully incorporated into the workflow management system Galaxy. The improved version contains a new feature for extracting the upstream regions, allowing the search for conserved promoter elements. Additionally, it supports the use of accession numbers instead of the outdated GI numbers, which widens the applicability of the tool. AVAILABILITY AND IMPLEMENTATION GLASSgo is available at https://github.com/lotts/GLASSgo/ under the MIT license and is accompanied by instruction and application data. Furthermore, it can be installed into any Galaxy instance using the Galaxy ToolShed.
10
Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res 2020; 48:W177-W184. [PMID: 32301980 PMCID: PMC7319437 DOI: 10.1093/nar/gkaa220] [Citation(s) in RCA: 132] [Impact Index Per Article: 33.0] [Received: 02/04/2020] [Revised: 03/11/2020] [Accepted: 03/24/2020] [Indexed: 12/20/2022] Open
Abstract
The Galaxy HiCExplorer provides a web service at https://hicexplorer.usegalaxy.eu. It enables the integrative analysis of chromosome conformation by providing tools and computational resources to pre-process, analyse and visualize Hi-C, Capture Hi-C (cHi-C) and single-cell Hi-C (scHi-C) data. Since the last publication, Galaxy HiCExplorer has been expanded considerably with new tools to facilitate the analysis of cHi-C and to provide an in-depth analysis of Hi-C data. Moreover, it supports the analysis of scHi-C data by offering a broad range of tools. With the help of the standard graphical user interface of Galaxy, presented workflows, extensive documentation and tutorials, novices as well as Hi-C experts are supported in their Hi-C data analysis with Galaxy HiCExplorer.
11
The ChemicalToolbox: reproducible, user-friendly cheminformatics analysis on the Galaxy platform. J Cheminform 2020; 12:40. [PMID: 33431029 PMCID: PMC7268608 DOI: 10.1186/s13321-020-00442-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/04/2020] [Accepted: 05/16/2020] [Indexed: 01/14/2023] Open
Abstract
Here, we introduce the ChemicalToolbox, a publicly available web server for performing cheminformatics analysis. The ChemicalToolbox provides an intuitive, graphical interface for common tools for downloading, filtering, visualizing and simulating small molecules and proteins. The ChemicalToolbox is based on Galaxy, an open-source web-based platform which enables accessible and reproducible data analysis. There is already an active Galaxy cheminformatics community using and developing tools. Based on their work, we provide four example workflows which illustrate the capabilities of the ChemicalToolbox, covering assembly of a compound library, hole filling, protein-ligand docking, and construction of a quantitative structure-activity relationship (QSAR) model. These workflows may be modified and combined flexibly, together with the many other tools available, to fit the needs of a particular project. The ChemicalToolbox is hosted on the European Galaxy server and may be accessed via https://cheminformatics.usegalaxy.eu.
12
Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization. Nucleic Acids Res 2019; 46:W11-W16. [PMID: 29901812 PMCID: PMC6031062 DOI: 10.1093/nar/gky504] [Citation(s) in RCA: 123] [Impact Index Per Article: 24.6] [Received: 02/23/2018] [Accepted: 05/22/2018] [Indexed: 11/13/2022] Open
Abstract
Galaxy HiCExplorer is a web server that facilitates the study of the 3D conformation of chromatin by allowing Hi-C data processing, analysis and visualization. With the Galaxy HiCExplorer web server, users with little bioinformatic background can perform every step of the analysis in one workflow: mapping of the raw sequence data, creation of Hi-C contact matrices, quality assessment, correction of contact matrices and identification of topological associated domains (TADs) and A/B compartments. Users can create publication ready plots of the contact matrix, A/B compartments, and TADs on a selected genomic locus, along with additional information like gene tracks or ChIP-seq signals. Galaxy HiCExplorer is freely usable at: https://hicexplorer.usegalaxy.eu and is available as a Docker container: https://github.com/deeptools/docker-galaxy-hicexplorer.
13
The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy. Nucleic Acids Res 2019; 45:W560-W566. [PMID: 28582575 PMCID: PMC5570170 DOI: 10.1093/nar/gkx409] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 03/02/2017] [Accepted: 05/31/2017] [Indexed: 01/23/2023] Open
Abstract
RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important but not yet standard to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework the workbench guarantees simple access, easy extension, flexible adaption to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools that are dedicated to different research areas of RNA biology including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keep the workbench up-to-date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis. AVAILABILITY The RNA workbench is available at https://github.com/bgruening/galaxy-rna-workbench.
14
Software engineering for scientific big data analysis. Gigascience 2019; 8:giz054. [PMID: 31121028 PMCID: PMC6532757 DOI: 10.1093/gigascience/giz054] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/16/2018] [Revised: 01/20/2019] [Accepted: 04/18/2019] [Indexed: 11/14/2022] Open
Abstract
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.
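As a rough illustration of the conventions this abstract names (data input, output, logging, and flow control), the sketch below shows one possible shape for a small command-line tool. It is not taken from the paper and does not reproduce its 10 guidelines; the tool itself, the function names (`count_records`, `read_lines`, `write_text`, `main`) and the flags are hypothetical choices made only for this example.

```python
import argparse
import logging
import sys


def count_records(lines):
    """Count non-empty lines that are not '#' comments (stand-in for real analysis)."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def read_lines(path):
    """Read all lines from a file, or from stdin when path is '-'."""
    if path == "-":
        return sys.stdin.read().splitlines()
    with open(path) as handle:
        return handle.read().splitlines()


def write_text(path, text):
    """Write text to a file, or to stdout when path is '-'."""
    if path == "-":
        sys.stdout.write(text)
    else:
        with open(path, "w") as handle:
            handle.write(text)


def main(argv=None):
    """Parse arguments, run the analysis, and return a process exit code."""
    parser = argparse.ArgumentParser(description="Count records in a text file.")
    parser.add_argument("-i", "--input", default="-", help="input file, or '-' for stdin")
    parser.add_argument("-o", "--output", default="-", help="output file, or '-' for stdout")
    parser.add_argument("-v", "--verbose", action="store_true", help="log progress to stderr")
    args = parser.parse_args(argv)

    # Diagnostics go to stderr so that stdout carries only the result.
    logging.basicConfig(
        stream=sys.stderr,
        level=logging.INFO if args.verbose else logging.WARNING,
        format="%(levelname)s: %(message)s",
    )

    try:
        lines = read_lines(args.input)
    except OSError as exc:
        # A non-zero exit code lets a workflow system detect the failure.
        logging.error("cannot read input: %s", exc)
        return 1

    total = count_records(lines)
    logging.info("counted %d records", total)
    write_text(args.output, f"{total}\n")
    return 0
```

In a real script, the entry point would typically end with `sys.exit(main())` so that the return value becomes the process exit status, which is what a workflow manager inspects to decide whether a step succeeded.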
15
Parkour LIMS: high-quality sample preparation in next generation sequencing. Bioinformatics 2019; 35:1422-1424. [PMID: 30239601 DOI: 10.1093/bioinformatics/bty820] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/12/2018] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION This paper presents Parkour, a software package for sample processing and quality management of next generation sequencing data and samples. RESULTS Starting with user requests, Parkour allows tracking and assessing samples based on predefined quality criteria through different stages of the sample preparation workflow. Ideally suited for academic core laboratories, the software aims to maximize efficiency and reduce turnaround time by intelligent sample grouping and a clear assignment of staff to work units. Tools for automated invoicing, interactive statistics on facility usage and simple report generation minimize administrative tasks. Provided as a web application, Parkour is a convenient tool for both deep sequencing service users and laboratory personnel. A set of web APIs allows coordinated information sharing with local and remote bioinformaticians. The flexible structure allows workflow customization and simple addition of new features as well as the expansion to other domains. AVAILABILITY AND IMPLEMENTATION The code and documentation are available at https://github.com/maxplanck-ie/parkour. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
16
Uncontrolled Diabetes Mellitus Has No Major Influence on the Platelet Transcriptome. BIOMED RESEARCH INTERNATIONAL 2018; 2018:8989252. [PMID: 30519591 PMCID: PMC6241365 DOI: 10.1155/2018/8989252] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/27/2018] [Revised: 09/26/2018] [Accepted: 10/11/2018] [Indexed: 12/18/2022]
Abstract
BACKGROUND Diabetes mellitus (DM) has been associated with increased platelet reactivity as well as increased levels of platelet RNAs in plasma. Here, we sought to evaluate whether the platelet transcriptome is altered in the presence of uncontrolled DM. METHODS Next-generation sequencing (NGS) was performed on platelet RNA for 5 patients with uncontrolled DM (HbA1c 9.0%) and 5 control patients (HbA1c 5.5%) with otherwise similar clinical characteristics. RNA was isolated from leucocyte-depleted platelet-rich plasma. Libraries of platelet RNAs were created separately for long RNAs after ribosomal depletion and for small RNAs from total RNA, followed by next-generation sequencing. RESULTS Platelets in both groups demonstrated RNA expression profiles characterized by absence of leukocyte-specific transcripts, high expression of well-known platelet transcripts, and in total 6,343 consistently detectable transcripts. Extensive statistical bioinformatic analysis yielded 12 genes with consistently differential expression at a lenient FDR < 0.1, of which 8 were protein-coding genes and 2 had known expression in platelets (MACF1 and ITGB3BP). Three of the four differentially expressed noncoding genes were YRNAs (RNY1, RNY3, and RNY4), which were all downregulated in DM. 23 miRNAs were differentially expressed between the two groups. Of the 13 miRNAs with decreased expression in the diabetic group, 8 belonged to the DLK1-DIO3 gene region on chromosome 14q32.2. CONCLUSIONS In this study, uncontrolled DM had a remote impact on different components of the platelet transcriptome. Increased expression of MACF1, together with supporting predicted mRNA-miRNA interactions as well as reduced expression of RNYs in platelets, may reflect subclinical platelet activation in uncontrolled DM.
17
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 2018; 46:W537-W544. [PMID: 29790989 PMCID: PMC6030816 DOI: 10.1093/nar/gky379] [Citation(s) in RCA: 2148] [Impact Index Per Article: 358.0] [Received: 02/01/2018] [Revised: 04/25/2018] [Accepted: 05/02/2018] [Indexed: 02/06/2023] Open
Abstract
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
18
BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics 2018; 33:2580-2582. [PMID: 28379341 PMCID: PMC5870671 DOI: 10.1093/bioinformatics/btx192] [Citation(s) in RCA: 133] [Impact Index Per Article: 22.2] [Received: 11/30/2016] [Accepted: 03/29/2017] [Indexed: 12/16/2022] Open
Abstract
Motivation BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source frameworks Docker and rkt, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation The software is freely available at github.com/BioContainers/.
Collapse
|
19
|
Distinct epigenetic programs regulate cardiac myocyte development and disease in the human heart in vivo. Nat Commun 2018; 9:391. [PMID: 29374152 PMCID: PMC5786002 DOI: 10.1038/s41467-017-02762-z] [Citation(s) in RCA: 153] [Impact Index Per Article: 25.5] [Received: 02/13/2017] [Accepted: 12/22/2017] [Indexed: 02/04/2023] Open
Abstract
Epigenetic mechanisms and transcription factor networks essential for differentiation of cardiac myocytes have been uncovered. However, reshaping of the epigenome of these terminally differentiated cells during fetal development, postnatal maturation, and in disease remains unknown. Here, we investigate the dynamics of the cardiac myocyte epigenome during development and in chronic heart failure. We find that prenatal development and postnatal maturation are characterized by a cooperation of active CpG methylation and histone marks at cis-regulatory and genic regions to shape the cardiac myocyte transcriptome. In contrast, pathological gene expression in terminal heart failure is accompanied by changes in active histone marks without major alterations in CpG methylation and repressive chromatin marks. Notably, cis-regulatory regions in cardiac myocytes are significantly enriched for cardiovascular disease-associated variants. This study uncovers distinct layers of epigenetic regulation not only during prenatal development and postnatal maturation but also in diseased human cardiac myocytes.
|
20
|
High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat Commun 2018; 9:189. [PMID: 29335486 PMCID: PMC5768762 DOI: 10.1038/s41467-017-02525-w] [Citation(s) in RCA: 453] [Impact Index Per Article: 75.5] [Received: 06/26/2017] [Accepted: 12/06/2017] [Indexed: 11/09/2022] Open
Abstract
Despite an abundance of new studies about topologically associating domains (TADs), the role of genetic information in TAD formation is still not fully understood. Here we use our software, HiCExplorer (hicexplorer.readthedocs.io), to annotate >2800 high-resolution (570 bp) TAD boundaries in Drosophila melanogaster. We identify eight DNA motifs enriched at boundaries, including a motif bound by the M1BP protein, and two new boundary motifs. In contrast to mammals, the CTCF motif is only enriched on a small fraction of boundaries flanking inactive chromatin while most active boundaries contain the motifs bound by the M1BP or Beaf-32 proteins. We demonstrate that boundaries can be accurately predicted using only the motif sequences at open chromatin sites. We propose that DNA sequence guides the genome architecture by allocation of boundary proteins in the genome. Finally, we present an interactive online database to access and explore the spatial organization of fly, mouse and human genomes, available at http://chorogenome.ie-freiburg.mpg.de.
|
21
|
DNA methylation signatures follow preformed chromatin compartments in cardiac myocytes. Nat Commun 2017; 8:1667. [PMID: 29162810 PMCID: PMC5698409 DOI: 10.1038/s41467-017-01724-9] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 06/15/2017] [Accepted: 10/10/2017] [Indexed: 12/29/2022] Open
Abstract
Storage of chromatin in restricted nuclear space requires dense packing while ensuring DNA accessibility. Thus, different layers of chromatin organization and epigenetic control mechanisms exist. Genome-wide chromatin interaction maps revealed large interaction domains (TADs) and higher order A and B compartments, reflecting active and inactive chromatin, respectively. The mutual dependencies between chromatin organization and patterns of epigenetic marks, including DNA methylation, remain poorly understood. Here, we demonstrate that establishment of A/B compartments precedes and defines DNA methylation signatures during differentiation and maturation of cardiac myocytes. Remarkably, dynamic CpG and non-CpG methylation in cardiac myocytes is confined to A compartments. Furthermore, genetic ablation or reduction of DNA methylation in embryonic stem cells or cardiac myocytes, respectively, does not alter genome-wide chromatin organization. Thus, DNA methylation appears to be established in preformed chromatin compartments and may be dispensable for the formation of higher order chromatin organization. Chromatin is organized in higher order A and B compartments, reflecting active and inactive chromatin. Here, the authors provide evidence that in cardiac myocytes DNA methylation is established in preformed chromatin compartments and may be dispensable for higher order chromatin organization.
|
22
|
5'-Hydroxymethylcytosine Precedes Loss of CpG Methylation in Enhancers and Genes Undergoing Activation in Cardiomyocyte Maturation. PLoS One 2016; 11:e0166575. [PMID: 27851806 PMCID: PMC5112848 DOI: 10.1371/journal.pone.0166575] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/21/2016] [Accepted: 10/31/2016] [Indexed: 01/17/2023] Open
Abstract
Background: Cardiomyocytes undergo major changes in DNA methylation during maturation and transition to a non-proliferative state after birth. 5'-hydroxylation of methylated cytosines (5hmC) is not only involved in DNA loss of CpG methylation but is also thought to be an epigenetic mark with unique distribution and functions. Here, we sought to gain insight into the dynamics of 5'-hydroxymethylcytosine in newborn and adult cardiomyocytes. Methods: Cardiomyocyte nuclei from newborn and adult C57BL/6 mice were purified by flow cytometric sorting. 5hmC-containing DNA was captured by selective chemical labeling, followed by deep sequencing. Sequencing reads of library replicates were mapped independently (n = 3 for newborn, n = 2 for adult mice) and merged for further analysis steps. 5hmC coverage was normalized to read length and the total number of mapped reads (RPKM). MethylC-Seq, ChIP-Seq and RNA-Seq data sets of newborn and adult cardiomyocytes served to elucidate specific features of 5hmC at gene bodies and around low methylated regions (LMRs) representing regulatory genomic regions with enhancer function. Results: 163,544 and 315,220 5hmC peaks were identified in P1 and adult cardiomyocytes, respectively. Of these peaks, 66,641 were common between P1 and adult cardiomyocytes with more than 50% reciprocal overlap. P1 and adult 5hmC peaks were overrepresented in genic features such as exons, introns, 3'- and 5'-untranslated regions (UTRs), promoters and transcription end sites (TES). During cardiomyocyte maturation, 5hmC was found to be enriched at sites of subsequent DNA loss of CpG methylation such as gene bodies of upregulated genes (i.e. Atp2a2, Tnni3, Mb, Pdk4). Additionally, centers of postnatally established enhancers were premarked by 5hmC before DNA loss of CpG methylation. Conclusions: Simultaneous analysis of 5hmC-Seq, MethylC-Seq, RNA-Seq and ChIP-Seq data at two defined time points of cardiomyocyte maturation demonstrates that 5hmC is positively associated with gene expression and decorates sites of subsequent DNA loss of CpG methylation.
|
23
|
PubMedPortable: A Framework for Supporting the Development of Text Mining Applications. PLoS One 2016; 11:e0163794. [PMID: 27706202 PMCID: PMC5051953 DOI: 10.1371/journal.pone.0163794] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/22/2016] [Accepted: 09/14/2016] [Indexed: 11/18/2022] Open
Abstract
Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
|
24
|
Echinocandin B biosynthesis: a biosynthetic cluster from Aspergillus nidulans NRRL 8112 and reassembly of the subclusters Ecd and Hty from Aspergillus pachycristatus NRRL 11440 reveals a single coherent gene cluster. BMC Genomics 2016; 17:570. [PMID: 27502607 PMCID: PMC4977696 DOI: 10.1186/s12864-016-2885-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 10/30/2015] [Accepted: 07/06/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Echinocandins are nonribosomal lipopeptides produced by ascomycete fungi. Due to their strong inhibitory effect on fungal cell wall biosynthesis and lack of human toxicity, they have been developed into an important class of antifungal drugs. Since 2012, the biosynthetic gene clusters of most of the main echinocandin variants have been characterized. In particular, comparison of the clusters allows deeper insight into the biosynthesis of these complex structures. Results: In the genome of the echinocandin B producer Aspergillus nidulans NRRL 8112 we have identified a gene cluster (Ani) that encodes echinocandin biosynthesis. Sequence analyses showed that Ani is clearly delimited from the genomic context and forms a monophyletic lineage with the other echinocandin gene clusters. Importantly, we found that the disjunct genomic location of the echinocandin B gene cluster in A. pachycristatus NRRL 11440 on two separate subclusters, Ecd and Hty, at two loci was likely an artifact of genome misassembly in the absence of a reference sequence. We show that both sequences can be aligned, resulting in a single cluster with a gene arrangement collinear with other clusters of Aspergillus section Nidulantes. The reassembled gene cluster (Ecd/Hty) is identical to a putative gene cluster (AE) that was previously deposited at the NCBI as a sequence from A. delacroxii NRRL 3860. PCR amplification of a part of the gene cluster resulted in a sequence that was very similar (97% identity), but not identical, to that of AE. Conclusions: The Echinocandin B biosynthetic cluster from A. nidulans NRRL 8112 (Ani) is particularly similar to that of A. pachycristatus NRRL 11440 (Ecd/Hty). Ecd/Hty was originally reported as two disjunct sub-clusters Ecd and Hty, but is in fact a continuous sequence with the same gene order as in Ani. According to sequences of PCR products amplified from genomic DNA, the echinocandin B producer A. delacroxii NRRL 3860 is closely related to A. pachycristatus NRRL 11440. A PCR product from the gene cluster was very similar to, but clearly distinct from, the sequence published for A. delacroxii NRRL 3860 at the NCBI (No. AB720074). As the NCBI entry is virtually identical with the re-assembled Ecd/Hty cluster, it is likely that it originates from A. pachycristatus NRRL 11440 rather than A. delacroxii NRRL 3860. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2885-x) contains supplementary material, which is available to authorized users.
|
25
|
DOT1L Activity Promotes Proliferation and Protects Cortical Neural Stem Cells from Activation of ATF4-DDIT3-Mediated ER Stress In Vitro. Stem Cells 2015; 34:233-45. [PMID: 26299268 DOI: 10.1002/stem.2187] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 04/23/2015] [Revised: 07/30/2015] [Accepted: 08/07/2015] [Indexed: 12/28/2022]
Abstract
Growing evidence suggests that the lysine methyltransferase DOT1L/KMT4 has important roles in proliferation, survival, and differentiation of stem cells in development and in disease. We investigated the function of DOT1L in neural stem cells (NSCs) of the cerebral cortex. The pharmacological inhibition and shRNA-mediated knockdown of DOT1L impaired proliferation and survival of NSCs. DOT1L inhibition specifically induced genes that are activated during the unfolded protein response (UPR) in the endoplasmic reticulum (ER). Chromatin-immunoprecipitation analyses revealed that two genes encoding central molecules involved in the ER stress response, Atf4 and Ddit3 (Chop), are marked with H3K79 methylation. Interference with DOT1L activity resulted in transcriptional activation of both genes accompanied by decreased levels of H3K79 dimethylation. Although downstream effectors of the UPR, such as Ppp1r15a/Gadd34, Atf3, and Tnfrsf10b/Dr5, were also transcriptionally activated, this most likely occurred in response to increased ATF4 expression rather than as a direct consequence of altered H3K79 methylation. While stem cells are particularly vulnerable to stress, the UPR and ER stress have not yet been extensively studied in these cells. Since activation of the ER stress program is also implicated in directing stem cells into differentiation or maintaining a proliferative status, the UPR must be tightly regulated. Our and published data suggest that histone modifications, including H3K4me3, H3K14ac, and H3K79me2, are implicated in the control of transcriptional activation of ER stress genes. In this context, the loss of H3K79me2 at the Atf4 and Ddit3 promoters appears to mark a point-of-no-return that activates the death program in NSCs.
|
26
|
Abstract
BioJS is an open source software project that develops visualization tools for different types of biological data. Here we report on the factors that influenced the growth of the BioJS user and developer community, and outline our strategy for building on this growth. The lessons we have learned on BioJS may also be relevant to other open source software projects. DOI:http://dx.doi.org/10.7554/eLife.07009.001
|
27
|
Abstract
RATIONALE: Epigenetic mechanisms are crucial for cell identity and transcriptional control. The heart consists of different cell types, including cardiac myocytes, endothelial cells, fibroblasts, and others. Therefore, cell type-specific analysis is needed to gain mechanistic insight into the regulation of gene expression in cardiac myocytes. Although cytosolic mRNA represents steady-state levels, nuclear mRNA more closely reflects transcriptional activity. To unravel epigenetic mechanisms of transcriptional control, cell type-specific analysis of nuclear mRNA and epigenetic modifications is crucial. OBJECTIVE: The aim was to purify cardiac myocyte nuclei from hearts of different species by magnetic- or fluorescence-assisted sorting and to determine the nuclear and cellular RNA expression profiles and epigenetic marks in a cardiac myocyte-specific manner. METHODS AND RESULTS: Frozen cardiac tissue samples were used to isolate cardiac myocyte nuclei. High sorting purity was confirmed for cardiac myocyte nuclei isolated from mice, rats, and humans. Deep sequencing of nuclear RNA revealed a major fraction of nascent, unspliced RNA in contrast to results obtained from purified cardiac myocytes. Cardiac myocyte nuclear and cellular RNA expression profiles showed differences, especially for metabolic genes. Genome-wide maps of the transcriptional elongation mark H3K36me3 were generated by chromatin-immunoprecipitation. Transcriptome and epigenetic data confirmed the high degree of cardiac myocyte-specificity of our protocol. An integrative analysis of nuclear mRNA and histone mark occurrence indicated a major impact of the chromatin state on transcriptional activity in cardiac myocytes. CONCLUSIONS: This study establishes cardiac myocyte-specific sorting of nuclei as a universal method to investigate epigenetic and transcriptional processes in cardiac myocytes of different origins. These data sets provide novel insight into cardiac myocyte transcription.
|
28
|
Abstract
The screening of a reduced yet diverse and synthesizable region of the chemical space is a critical step in drug discovery. The ZINC database is nowadays routinely used to freely access and screen millions of commercially available compounds. We collected ∼125 million compounds from chemical catalogs and the ZINC database, yielding more than 68 million unique molecules, including a large portion of described natural products (NPs) and drugs. The data set was filtered using advanced medicinal chemistry rules to remove potentially toxic, promiscuous, metabolically labile, or reactive compounds. We studied the physicochemical properties of this compilation and identified millions of NP-like, fragment-like, inhibitors of protein-protein interactions (i-PPIs)-like, and drug-like compounds. The related focused libraries were subjected to a detailed scaffold diversity analysis and compared to reference NPs and marketed drugs. This study revealed thousands of diverse chemotypes with distinct representations of building block combinations among the data sets. An analysis of the stereogenic and shape complexity properties of the libraries also showed that they present well-defined levels of complexity, following the tendency: i-PPIs-like < drug-like < fragment-like < NP-like. As the collected compounds are of great interest for drug discovery, particularly virtual screening and library design, we offer a freely available collection comprising over 37 million molecules at http://pbox.pharmaceutical-bioinformatics.org, as well as the filtering rules used to build the focused libraries described herein.
|
29
|
Autosomal dominant immune dysregulation syndrome in humans with CTLA4 mutations. Nat Med 2014; 20:1410-1416. [PMID: 25329329 PMCID: PMC4668597 DOI: 10.1038/nm.3746] [Citation(s) in RCA: 600] [Impact Index Per Article: 60.0] [Received: 09/01/2014] [Accepted: 10/14/2014] [Indexed: 12/14/2022]
Abstract
The protein cytotoxic T lymphocyte antigen-4 (CTLA-4) is an essential negative regulator of immune responses, and its loss causes fatal autoimmunity in mice. We studied a large family in which five individuals presented with a complex, autosomal dominant immune dysregulation syndrome characterized by hypogammaglobulinemia, recurrent infections and multiple autoimmune clinical features. We identified a heterozygous nonsense mutation in exon 1 of CTLA4. Screening of 71 unrelated patients with comparable clinical phenotypes identified five additional families (nine individuals) with previously undescribed splice site and missense mutations in CTLA4. Clinical penetrance was incomplete (eight adults of a total of 19 genetically proven CTLA4 mutation carriers were considered unaffected). However, CTLA-4 protein expression was decreased in regulatory T cells (Treg cells) in both patients and carriers with CTLA4 mutations. Whereas Treg cells were generally present at elevated numbers in these individuals, their suppressive function, CTLA-4 ligand binding and transendocytosis of CD80 were impaired. Mutations in CTLA4 were also associated with decreased circulating B cell numbers. Taken together, mutations in CTLA4 resulting in CTLA-4 haploinsufficiency or impaired ligand binding result in disrupted T and B cell homeostasis and a complex immune dysregulation syndrome.
MESH Headings
- Adolescent
- Adult
- Agammaglobulinemia/genetics
- Agammaglobulinemia/immunology
- Anemia, Hemolytic, Autoimmune/genetics
- Anemia, Hemolytic, Autoimmune/immunology
- Animals
- Autoimmune Diseases/genetics
- Autoimmune Diseases/immunology
- B-Lymphocytes/immunology
- B7-1 Antigen/metabolism
- CTLA-4 Antigen/genetics
- CTLA-4 Antigen/immunology
- Child
- Codon, Nonsense
- Endocytosis/genetics
- Endocytosis/immunology
- Exons
- Female
- Granuloma/genetics
- Granuloma/immunology
- Heterozygote
- Humans
- Immune System Diseases/genetics
- Lung Diseases, Interstitial/genetics
- Lung Diseases, Interstitial/immunology
- Male
- Mice
- Middle Aged
- Mutation, Missense
- Pedigree
- Polyendocrinopathies, Autoimmune/genetics
- Polyendocrinopathies, Autoimmune/immunology
- Purpura, Thrombocytopenic, Idiopathic/genetics
- Purpura, Thrombocytopenic, Idiopathic/immunology
- Recurrence
- Respiratory Tract Infections/genetics
- Respiratory Tract Infections/immunology
- Syndrome
- T-Lymphocytes, Regulatory/immunology
- Young Adult
|
30
|
Dynamic DNA methylation orchestrates cardiomyocyte development, maturation and disease. Nat Commun 2014; 5:5288. [PMID: 25335909 PMCID: PMC4220495 DOI: 10.1038/ncomms6288] [Citation(s) in RCA: 213] [Impact Index Per Article: 21.3] [Received: 05/15/2014] [Accepted: 09/17/2014] [Indexed: 01/20/2023] Open
Abstract
The heart is a highly specialized organ with essential function for the organism throughout life. The significance of DNA methylation in shaping the phenotype of the heart remains only partially known. Here we generate and analyse DNA methylomes from highly purified cardiomyocytes of neonatal, adult healthy and adult failing hearts. We identify large genomic regions that are differentially methylated during cardiomyocyte development and maturation. Demethylation of cardiomyocyte gene bodies correlates strongly with increased gene expression. Silencing of demethylated genes is characterized by the polycomb mark H3K27me3 or by DNA methylation. De novo methylation by DNA methyltransferases 3A/B causes repression of fetal cardiac genes, including essential components of the cardiac sarcomere. Failing cardiomyocytes partially resemble neonatal methylation patterns. This study establishes DNA methylation as a highly dynamic process during postnatal growth of cardiomyocytes and their adaptation to pathological stress in a process tightly linked to gene regulation and activity. DNA methylation is essential for proper gene expression, development and genome stability. Here the authors present whole-genome DNA methylation analyses of purified mouse cardiomyocytes from newborn, adult and failing hearts and find highly dynamic patterns between the three phenotypes of cardiomyocytes.
|
31
|
Abstract
We present a Galaxy-based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straightforward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools web server is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy.
|
32
|
Regio- and Stereoselective Intermolecular Oxidative Phenol Coupling in Streptomyces. J Am Chem Soc 2014; 136:6195-8. [DOI: 10.1021/ja501630w] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 01/19/2023]
|
33
|
|
34
|
Dynamic information system for small molecules. J Cheminform 2014. [PMCID: PMC3980058 DOI: 10.1186/1758-2946-6-s1-p28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
35
|
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 2013; 1:e167. [PMID: 24109552 PMCID: PMC3792188 DOI: 10.7717/peerj.167] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 07/17/2013] [Accepted: 08/30/2013] [Indexed: 12/28/2022] Open
Abstract
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
|
36
|
StreptomeDB: a resource for natural compounds isolated from Streptomyces species. Nucleic Acids Res 2013; 41:D1130-6. [PMID: 23193280 PMCID: PMC3531085 DOI: 10.1093/nar/gks1253] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 08/15/2012] [Revised: 10/10/2012] [Accepted: 11/04/2012] [Indexed: 11/12/2022] Open
Abstract
Bacteria from the genus Streptomyces are very important for the production of natural bioactive compounds such as antibiotic, antitumour or immunosuppressant drugs. Around two-thirds of all known natural antibiotics are produced by these bacteria. An enormous quantity of crucial data related to this genus has been generated and published, but so far no freely available and comprehensive database exists. Here, we present StreptomeDB (http://www.pharmaceutical-bioinformatics.de/streptomedb/). To the best of our knowledge, this is the largest database of natural products isolated from Streptomyces. It contains >2400 unique and diverse compounds from >1900 different Streptomyces strains and substrains. In addition to names and molecular structures of the compounds, information about source organisms, references, biological role, activities and synthesis routes (e.g. polyketide synthase derived and non-ribosomal peptides derived) is included. Data can be accessed through queries on compound names, chemical structures or organisms. Extraction from the literature was performed through automatic text mining of thousands of articles from PubMed, followed by manual curation. All annotated compound structures can be downloaded from the website and applied for in silico screenings for identifying new active molecules with undiscovered properties.
|
37
|
|
38
|
Small-molecule conversion of toxic oligomers to nontoxic β-sheet–rich amyloid fibrils. Nat Chem Biol 2011; 8:93-101. [DOI: 10.1038/nchembio.719] [Citation(s) in RCA: 355] [Impact Index Per Article: 27.3] [Received: 04/21/2011] [Accepted: 09/02/2011] [Indexed: 11/09/2022]
|
39
|
Abstract
SUMMARY: Searching for certain compounds in literature can be an elaborate task, with many compounds having several different synonyms. Often, only the structure is known but not its name. Furthermore, rarely investigated compounds may not be described in the available literature at all. In such cases, preceding searches for described similar compounds facilitate literature mining. Highlighted names of proteins in selected texts may further accelerate the time-consuming process of literature research. Compounds In Literature (CIL) provides a web interface to automatically find names, structures, and similar structures among over 28 million PubChem compounds and more than 18 million citations provided by the PubMed service. CIL's pre-calculated database contains more than 56 million parent compound-abstract relations. Found compounds, relatives and abstracts are related to proteins in a concise 'heat map'-like overview. Compounds and proteins are highlighted in their respective abstracts, and are provided with links to PubChem and UniProt. AVAILABILITY: An easy-to-use web interface with detailed descriptions, help and statistics is available from http://cil.pharmaceutical-bioinformatics.de. CONTACT: stefan.guenther@pharmazie.uni-freiburg.de.
|