1
Rouhiainen A, Zhao X, Vanttola P, Qian K, Kulesskiy E, Kuja-Panula J, Gransalke K, Grönholm M, Unni E, Meistrich M, Tian L, Auvinen P, Rauvala H. HMGB4 is expressed by neuronal cells and affects the expression of genes involved in neural differentiation. Sci Rep 2016; 6:32960. [PMID: 27608812 PMCID: PMC5036535 DOI: 10.1038/srep32960] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/14/2015] [Accepted: 08/18/2016] [Indexed: 12/21/2022]
Abstract
HMGB4 is a new member of the family of HMGB proteins that has been characterized in sperm cells, but little is known about its functions in somatic cells. Here we show that HMGB4 and the highly similar rat Transition Protein 4 (HMGB4L1) are expressed in neuronal cells. Both proteins had slow mobility in the nucleus of living NIH-3T3 cells. They interacted with histones, and their differential expression in transformed cells of the nervous system altered the post-translational modification statuses of histones in vitro. Overexpression of HMGB4 in HEK 293T cells made the cells more susceptible to cell death induced by topoisomerase inhibitors in an oncology drug screening array and altered the variant composition of histone H3. HMGB4 regulated over 800 genes in HEK 293T cells with a p-value ≤0.013 (n = 3) in a microarray analysis and displayed the strongest association with adhesion and histone H2A processes. In neuronal and transformed cells, HMGB4 regulated the expression of the oligodendrocyte marker gene PPP1R14a and other neuronal differentiation marker genes. In conclusion, our data suggest that HMGB4 is a factor that regulates chromatin and the expression of neuronal differentiation markers.
Affiliation(s)
- Ari Rouhiainen
- Neuroscience Center, University of Helsinki, Finland; Department of Biosciences, University of Helsinki, Finland
- Xiang Zhao
- Neuroscience Center, University of Helsinki, Finland; Schools of Pharmacy and Medicine, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
- Kui Qian
- Institute of Biotechnology, University of Helsinki, Finland
- Evgeny Kulesskiy
- Neuroscience Center, University of Helsinki, Finland; Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Finland
- Emmanual Unni
- Department of Biochemistry, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Marvin Meistrich
- Department of Experimental Radiation Oncology, Division of Radiation Oncology, MD Anderson Cancer Center, Houston, Texas, USA
- Li Tian
- Neuroscience Center, University of Helsinki, Finland; Psychiatry Research Center, Beijing Hui Long Guan Hospital, Peking University, Beijing, China
- Petri Auvinen
- Institute of Biotechnology, University of Helsinki, Finland
2
Bevilacqua V, Pietroleonardo N, Giannino E, Stroppa F, Simone D, Pesole G, Picardi E. EasyCluster2: an improved tool for clustering and assembling long transcriptome reads. BMC Bioinformatics 2014; 15 Suppl 15:S7. [PMID: 25474441 PMCID: PMC4271567 DOI: 10.1186/1471-2105-15-s15-s7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Expressed sequences (e.g. ESTs) are a strong source of evidence to improve gene structures and predict reliable alternative splicing events. When a genome assembly is available, ESTs are suitable to generate gene-oriented clusters through the well-established EasyCluster software. Nowadays, EST-like sequences can be massively produced using Next Generation Sequencing (NGS) technologies. In order to handle genome-scale transcriptome data, we present here EasyCluster2, a reimplementation of EasyCluster able to speed up the creation of gene-oriented clusters and facilitate downstream analyses as the assembly of full-length transcripts and the detection of splicing isoforms. RESULTS EasyCluster2 has been developed to facilitate the genome-based clustering of EST-like sequences generated through the NGS 454 technology. Reads mapped onto the reference genome can be uploaded using the standard GFF3 file format. Alignment parsing is initially performed to produce a first collection of pseudo-clusters by grouping reads according to the overlap of their genomic coordinates on the same strand. EasyCluster2 then refines read grouping by including in each cluster only reads sharing at least one splice site and optionally performs a Smith-Waterman alignment in the region surrounding splice sites in order to correct for potential alignment errors. In addition, EasyCluster2 can include unspliced reads, which generally account for >50% of 454 datasets, and collapses overlapping clusters. Finally, EasyCluster2 can assemble full-length transcripts using a Directed-Acyclic-Graph-based strategy, simplifying the identification of alternative splicing isoforms, thanks also to the implementation of the widespread AStalavista methodology. Accuracy and performances have been tested on real as well as simulated datasets. 
CONCLUSIONS EasyCluster2 represents a unique tool to cluster and assemble transcriptome reads produced with 454 technology, as well as ESTs and full-length transcripts. The clustering procedure is enhanced with the employment of genome annotations and unspliced reads. Overall, EasyCluster2 is able to perform an effective detection of splicing isoforms, since it can refine exon-exon junctions and explore alternative splicing without known reference transcripts. Results in GFF3 format can be browsed in the UCSC Genome Browser. Therefore, EasyCluster2 is a powerful tool to generate reliable clusters for gene expression studies, facilitating the analysis also to researchers not skilled in bioinformatics.
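The first clustering step described in this abstract (grouping reads by overlapping genomic coordinates on the same strand) can be sketched roughly as follows. This is a minimal illustration, not EasyCluster2's code: the function name `pseudo_clusters` and the `(read_id, chrom, strand, start, end)` tuple layout are assumptions made for the example, and the later splice-site refinement is not shown.

```python
from collections import defaultdict


def pseudo_clusters(reads):
    """Group reads whose intervals overlap on the same chromosome and strand.

    `reads` is an iterable of (read_id, chrom, strand, start, end) tuples with
    inclusive coordinates. This layout is hypothetical, chosen for illustration.
    """
    by_locus = defaultdict(list)
    for read_id, chrom, strand, start, end in reads:
        by_locus[(chrom, strand)].append((start, end, read_id))

    clusters = []
    for (chrom, strand), items in sorted(by_locus.items()):
        items.sort()  # sweep left to right along the chromosome
        current, current_end = [], None
        for start, end, read_id in items:
            # A read starting past the furthest end seen so far opens a new cluster.
            if current and start > current_end:
                clusters.append((chrom, strand, current))
                current, current_end = [], None
            current.append(read_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append((chrom, strand, current))
    return clusters
```

Reads on opposite strands never share a cluster here, even when their coordinates overlap, which mirrors the same-strand condition stated in the abstract.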
3
Sturgeon XH, Gardiner KJ. RCDA: a highly sensitive and specific alternatively spliced transcript assembly tool featuring upstream consecutive exon structures. Genomics 2012; 100:357-62. [PMID: 22971325 PMCID: PMC5470730 DOI: 10.1016/j.ygeno.2012.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/09/2012] [Revised: 08/14/2012] [Accepted: 08/14/2012] [Indexed: 01/21/2023]
Abstract
When applied to complex transcript datasets, current tools for automated assembly of mRNA sequences require long run times and produce exponentially increasing numbers of splice variants. Here, we describe RCDA, a genome-based transcript assembly tool comprising RCluster, which recursively clusters transcripts, and DAssemble, which generates composite transcript sequences through path-finding using a directed acyclic graph. Each exon included in a final transcript is associated with an array of all upstream consecutive exon structures obtained from original transcripts. When a depth-first-search path reaches an exon, the path is retained only if it contains a structure from that exon's array. RCDA assemblies, therefore, include only those transcripts with experimentally supported exon patterns. When applied to >23,000 transcripts from human chromosome 21, using biologically reasonable filters, RCDA execution time was approximately 4 h. RCDA outperformed ECgene in reconstructing RefSeq transcripts and in limiting the total number of transcripts and transcripts per gene.
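One possible reading of the path-retention rule in this abstract is sketched below. It is an illustration under stated assumptions, not the DAssemble implementation: transcripts are given as lists of exon identifiers, an exon's "upstream structure" is taken to be the tuple of all exons preceding it in one original transcript, and a path is extended to an exon only if the path ends with one of that exon's recorded structures. The function name `assemble` is hypothetical, and the exon graph is assumed to be acyclic.

```python
from collections import defaultdict


def assemble(transcripts):
    """Enumerate exon paths whose every step is backed by an observed upstream structure."""
    successors = defaultdict(set)  # exon -> exons observed directly downstream
    upstream = defaultdict(set)    # exon -> tuples of exons observed upstream of it
    has_predecessor = set()
    for transcript in transcripts:
        for i, exon in enumerate(transcript):
            upstream[exon].add(tuple(transcript[:i]))
            if i > 0:
                successors[transcript[i - 1]].add(exon)
                has_predecessor.add(exon)

    def supported(path, exon):
        # The step is kept only if the path ends with a structure recorded for `exon`.
        for structure in upstream[exon]:
            k = len(structure)
            if k <= len(path) and tuple(path[len(path) - k:]) == structure:
                return True
        return False

    results = []

    def dfs(path):
        extended = False
        for nxt in sorted(successors[path[-1]]):
            if supported(path, nxt):
                extended = True
                dfs(path + [nxt])
        if not extended:
            results.append(path)  # nothing further is supported: emit the path

    for start in sorted(e for e in upstream if e not in has_predecessor):
        dfs([start])
    return results
```

With the fragments `A-B-C`, `B-C-D`, and `A-X-C`, the sketch produces the composite `A-B-C-D` but not `A-X-C-D`, because no original transcript shows `D` downstream of `X-C`.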
Affiliation(s)
- Xiaolu H Sturgeon
- Department of Pediatrics, Linda Crnic Institute for Down Syndrome, University of Colorado Denver, Mail Stop 8608, 12700 E. 19th Avenue, Aurora, CO 80045, USA.
4
Ng KH, Ho CK, Phon-Amnuaisuk S. A hybrid distance measure for clustering expressed sequence tags originating from the same gene family. PLoS One 2012; 7:e47216. [PMID: 23071763 PMCID: PMC3469558 DOI: 10.1371/journal.pone.0047216] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/15/2012] [Accepted: 09/10/2012] [Indexed: 01/22/2023]
Abstract
BACKGROUND Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt the alignment-free distance measures, where they tend to yield acceptable clustering accuracies with reasonable computational time. Despite the fact that these clustering methods work satisfactorily on a majority of the EST datasets, they have a common weakness. They are prone to deliver unsatisfactory clustering results when dealing with ESTs from the genes derived from the same family. The root cause is the distance measures applied on them are not sensitive enough to separate these closely related genes. METHODOLOGY/PRINCIPAL FINDINGS We propose a hybrid distance measure that combines the global and local features extracted from ESTs, with the aim to address the clustering problem faced by ESTs derived from the same gene family. The clustering process is implemented using the DBSCAN algorithm. We test the hybrid distance measure on the ten EST datasets, and the clustering results are compared with the two alignment-free EST clustering tools, i.e. wcd and PEACE. The clustering results indicate that the proposed hybrid distance measure performs relatively better (in terms of clustering accuracy) than both EST clustering tools. CONCLUSIONS/SIGNIFICANCE The clustering results provide support for the effectiveness of the proposed hybrid distance measure in solving the clustering problem for ESTs that originate from the same gene family. The improvement of clustering accuracies on the experimental datasets has supported the claim that the sensitivity of the hybrid distance measure is sufficient to solve the clustering problem.
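The abstract names DBSCAN as the clustering procedure but does not define the hybrid distance itself. The sketch below therefore pairs a plain DBSCAN with a stand-in distance (one minus the Jaccard index of k-mer sets), which is explicitly not the authors' measure. The names `kmer_set`, `jaccard_distance`, and `dbscan`, and the parameter values in the usage note, are assumptions for illustration.

```python
def kmer_set(seq, k=4):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """Stand-in distance (NOT the paper's hybrid measure): 1 - Jaccard index of k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def dbscan(items, distance, eps, min_pts):
    """Plain DBSCAN over arbitrary items; returns one label per item, -1 meaning noise."""
    n = len(items)
    # Each point counts as its own neighbour, as in the usual DBSCAN definition.
    neighbours = [
        [j for j in range(n) if distance(items[i], items[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the cluster
    return labels
```

For example, `dbscan(seqs, jaccard_distance, eps=0.3, min_pts=2)` groups sequences that share most of their 4-mers and labels isolated sequences as noise. Any other distance function with the same two-argument signature can be substituted.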
Affiliation(s)
- Keng-Hoong Ng
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia.
5
Wei D, Jiang Q, Wei Y, Wang S. A novel hierarchical clustering algorithm for gene sequences. BMC Bioinformatics 2012; 13:174. [PMID: 22823405 PMCID: PMC3443659 DOI: 10.1186/1471-2105-13-174] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 11/05/2011] [Accepted: 06/30/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into the feature vectors which contain the occurrence, location and order relation of k-tuples in DNA sequence. Afterwards, a hierarchical procedure is applied to clustering DNA sequences based on the feature vectors. RESULTS The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. This method is also compared with BlastClust, CD-HIT-EST and some others. The experimental results show our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationship among the sequences. CONCLUSIONS We introduced a novel clustering algorithm which is based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationship among the sequences.
Affiliation(s)
- Dan Wei
- Cognitive Science Department & Fujian Key Laboratory of the Brain-like Intelligent Systems, Xiamen University, Xiamen, China
6
Hazelhurst S, Lipták Z. KABOOM! A new suffix array based algorithm for clustering expression data. Bioinformatics 2011; 27:3348-55. [PMID: 21984769 DOI: 10.1093/bioinformatics/btr560] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION Second-generation sequencing technology has reinvigorated research using expression data, and clustering such data remains a significant challenge, with much larger datasets and with different error profiles. Algorithms that rely on all-versus-all comparison of sequences are not practical for large datasets. RESULTS We introduce a new filter for string similarity which has the potential to eliminate the need for all-versus-all comparison in clustering of expression data and other similar tasks. Our filter is based on multiple long exact matches between the two strings, with the additional constraint that these matches must be sufficiently far apart. We give details of its efficient implementation using modified suffix arrays. We demonstrate its efficiency by presenting our new expression clustering tool, wcd-express, which uses this heuristic. We compare it to other current tools and show that it is very competitive both with respect to quality and run time. AVAILABILITY Source code and binaries available under GPL at http://code.google.com/p/wcdest. Runs on Linux and MacOS X. CONTACT scott.hazelhurst@wits.ac.za; zsuzsa@cebitec.uni-bielefeld.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
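The filter idea in this abstract (several long exact matches between two strings, with the matches sufficiently far apart) can be illustrated with a much simpler stand-in. The sketch below uses a plain set of fixed-length words instead of the modified suffix arrays the authors describe, so it shows the criterion rather than their efficient implementation. The function name `passes_filter` and the parameters `match_len`, `min_matches`, and `min_gap` are hypothetical, and distance between matches is measured here by start position in the first string only.

```python
def passes_filter(a, b, match_len=8, min_matches=2, min_gap=8):
    """Return True if `a` shares at least `min_matches` exact words with `b`
    whose start positions in `a` are at least `min_gap` apart.
    """
    # Index every word of length `match_len` occurring in b (a stand-in for a suffix array).
    b_words = {b[i:i + match_len] for i in range(len(b) - match_len + 1)}
    count, last = 0, None
    for i in range(len(a) - match_len + 1):
        if a[i:i + match_len] in b_words and (last is None or i - last >= min_gap):
            count += 1
            last = i  # scanning left to right and taking the earliest hit maximises the count
            if count >= min_matches:
                return True
    return False
```

Only pairs that pass such a cheap test would then go on to a full similarity comparison, which is how a filter of this kind avoids all-versus-all alignment.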
Affiliation(s)
- Scott Hazelhurst
- Wits Bioinformatics, School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, Private Bag 3, 2050 Wits, South Africa.
7
Abstract
Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem to study the population sizes of DNA/RNA molecules and to reduce the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads. Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with a linear time and memory performance. When using SEED as a preprocessing tool on genome/transcriptome assembly data, it was able to reduce the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60–85% and 21–41%, respectively. In addition, the assemblies contained longer contigs than non-preprocessed data, as indicated by 12–27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms. Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/seed. Contact: thomas.girke@ucr.edu Supplementary information: Supplementary data are available at Bioinformatics online.
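The membership criterion stated in this abstract (up to three mismatches and three overhanging residues relative to a center sequence) can be checked by brute force as sketched below. This is only an illustration of the criterion: it does not use block spaced seeds or hash tables, and the way overhang is counted (read residues that fall outside the center after shifting) is an assumption, as is the function name `matches_center`.

```python
def matches_center(read, center, max_mismatches=3, max_overhang=3):
    """Return True if `read` aligns to `center` at some shift with few
    mismatches in the overlap and few read residues hanging over either end.
    """
    for offset in range(-max_overhang, max_overhang + 1):
        mismatches = overhang = 0
        for i, base in enumerate(read):
            j = i + offset  # position in the center that read[i] is aligned to
            if 0 <= j < len(center):
                mismatches += base != center[j]
            else:
                overhang += 1  # read residue extends beyond the center
        if mismatches <= max_mismatches and overhang <= max_overhang:
            return True
    return False
```

Testing every read against every center this way is quadratic, which is why a tool aimed at very large read sets needs an indexed approach instead.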
Affiliation(s)
- Ergude Bao
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
8
Rao DM, Moler JC, Ozden M, Zhang Y, Liang C, Karro JE. PEACE: Parallel Environment for Assembly and Clustering of Gene Expression. Nucleic Acids Res 2010; 38:W737-42. [PMID: 20522511 PMCID: PMC2896108 DOI: 10.1093/nar/gkq470] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022]
Abstract
We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger Sequencing technologies. It is freely available from www.peace-tools.org. Installed and managed through a downloadable user-friendly graphical user interface (GUI), PEACE can process large data sets of transcript fragments of length 50 bases or greater, grouping the fragments by gene associations with a sensitivity comparable to leading clustering tools. Once clustered, the user can employ the GUI's analysis functions, facilitating the easy collection of statistics and allowing them to single out specific clusters for more comprehensive study or assembly. Using a novel minimum spanning tree-based clustering method, PEACE is the equal of leading tools in the literature, with an interface making it accessible to any user. It produces results of quality virtually identical to those of the WCD tool when applied to Sanger sequences, significantly improved results over WCD and TGICL when applied to the products of Next Generation Sequencing Technology and significantly improved results over Cap3 in both cases. In short, PEACE provides an intuitive GUI and a feature-rich, parallel clustering engine that proves to be a valuable addition to the leading cDNA clustering tools.
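The general idea behind minimum spanning tree-based clustering, which this abstract mentions, is to build a spanning tree over pairwise distances and then cut its heavy edges so that the remaining connected components form the clusters. The sketch below shows that generic technique with Kruskal's algorithm and a union-find structure. It is not PEACE's engine: the function name `mst_clusters`, the pluggable `distance` argument, and the single `threshold` cut are assumptions for illustration.

```python
from itertools import combinations


def mst_clusters(items, distance, threshold):
    """Cluster `items` by cutting minimum-spanning-tree edges heavier than `threshold`."""
    n = len(items)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: scan edges by increasing weight, keeping those that join two components.
    edges = sorted(
        (distance(items[i], items[j]), i, j) for i, j in combinations(range(n), 2)
    )
    mst = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            mst.append((weight, i, j))

    # Rebuild components using only the tree edges that survive the cut.
    parent = list(range(n))
    for weight, i, j in mst:
        if weight <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(items[i])
    return sorted(groups.values())
```

The example computes all pairwise distances up front, which is only practical for small inputs. For sequence data, `distance` would be a sequence dissimilarity measure rather than the numeric difference used in a toy call such as `mst_clusters([1, 2, 3, 10, 11, 30], lambda a, b: abs(a - b), 2)`.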
Affiliation(s)
- D M Rao
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio 45056, USA
9
D'Elia D, Gisel A, Eriksson NE, Kossida S, Mattila K, Klucar L, Bongcam-Rudloff E. The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community. BMC Bioinformatics 2009; 10 Suppl 6:S1. [PMID: 19534734 PMCID: PMC2697632 DOI: 10.1186/1471-2105-10-s6-s1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
Abstract
The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in.
Affiliation(s)
- Domenica D'Elia
- Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy
- Andreas Gisel
- Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy
- Nils-Einar Eriksson
- Uppsala Biomedical Centre (BMC), Computing Department, University of Uppsala, Box 570 SE-751 23 Uppsala, Sweden
- Sophia Kossida
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Kimmo Mattila
- CSC – IT Center for Science Ltd., Keilaranta 14, 02100 Espoo, Finland
- Lubos Klucar
- Institute of Molecular Biology, Slovak Academy of Sciences, Dubravska cesta 21, 84551 Bratislava, Slovakia
- Erik Bongcam-Rudloff
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 75024 Uppsala, Sweden