1
Boccaletto P, Machnicka MA, Purta E, Piatkowski P, Baginski B, Wirecki TK, de Crécy-Lagard V, Ross R, Limbach PA, Kotter A, Helm M, Bujnicki JM. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res 2019; 46:D303-D307. [PMID: 29106616 PMCID: PMC5753262 DOI: 10.1093/nar/gkx1030] [Citation(s) in RCA: 1303] [Impact Index Per Article: 260.6] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 12/13/2022]
Abstract
MODOMICS is a database of RNA modifications that provides comprehensive information concerning the chemical structures of modified ribonucleosides, their biosynthetic pathways, the location of modified residues in RNA sequences, and RNA-modifying enzymes. In the current database version, we included the following new features and data: extended mass spectrometry and liquid chromatography data for modified nucleosides; links between human tRNA sequences and MINTbase - a framework for the interactive exploration of mitochondrial and nuclear tRNA fragments; new, machine-friendly system of unified abbreviations for modified nucleoside names; sets of modified tRNA sequences for two bacterial species, updated collection of mammalian tRNA modifications, 19 newly identified modified ribonucleosides and 66 functionally characterized proteins involved in RNA modification. Data from MODOMICS have been linked to the RNAcentral database of RNA sequences. MODOMICS is available at http://modomics.genesilico.pl.
Affiliation(s)
- Pietro Boccaletto
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Magdalena A Machnicka
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
  - Institute of Informatics, University of Warsaw, Banacha 2, PL-02-097 Warsaw, Poland
- Elzbieta Purta
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Pawel Piatkowski
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Blazej Baginski
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Tomasz K Wirecki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Robert Ross
  - Department of Chemistry, Rieveschl Laboratories for Mass Spectrometry, University of Cincinnati, Cincinnati, OH 45221, USA
- Patrick A Limbach
  - Department of Chemistry, Rieveschl Laboratories for Mass Spectrometry, University of Cincinnati, Cincinnati, OH 45221, USA
- Annika Kotter
  - Institut für Pharmazie und Biochemie, Johannes Gutenberg-Universität, Staudinger Weg 5, D-55128 Mainz, Germany
- Mark Helm
  - Institut für Pharmazie und Biochemie, Johannes Gutenberg-Universität, Staudinger Weg 5, D-55128 Mainz, Germany
- Janusz M Bujnicki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
  - Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, PL-61-614 Poznan, Poland
2
Abstract
Background: One of the pivotal challenges in genomic research today is the fast processing of voluminous data, such as the data generated by high-throughput Next-Generation Sequencing technologies. However, BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be very slow in this regard.
Objective: To improve the performance of BLAST on voluminous data, we have applied a novel memory-aware technique to BLAST for faster parallel processing.
Method: We have used a master-worker model alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches every split one by one against a list of queries.
Results: We have chosen a list of queries with different lengths to run insensitive searches in a huge database called UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used our proposed memory-aware technique compared to when they were not memory aware. Experiments show an even higher performance improvement, approximately 50 percent, when we applied our memory-aware technique to mpiBLAST.
Conclusion: We have shown that memory-awareness in formatting a bulky database when running BLAST can improve performance significantly while preventing unexpected crashes in low-memory environments. Even though distributed computing attempts to reduce search time by partitioning and distributing database portions, our memory-aware technique alleviates the negative effects of page faults on performance.
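The two-level partitioning described in the Method paragraph can be illustrated with a minimal Python sketch. All names here (`equal_chunks`, `memory_aware_splits`, `search_worker`, `mem_budget`) are hypothetical, the memory budget is approximated by the total length of the records in a split, and a plain substring test stands in for a real BLAST search; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def equal_chunks(records: Sequence[str], n_workers: int) -> List[List[str]]:
    """Master step: split the database into n_workers near-equal chunks."""
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


def memory_aware_splits(chunk: Sequence[str], mem_budget: int) -> List[List[str]]:
    """Worker step: cut a chunk into splits whose total length fits mem_budget."""
    splits, current, used = [], [], 0
    for rec in chunk:
        # Close the current split before it would exceed the budget.
        if current and used + len(rec) > mem_budget:
            splits.append(current)
            current, used = [], 0
        current.append(rec)
        used += len(rec)
    if current:
        splits.append(current)
    return splits


def search_worker(chunk: Sequence[str], queries: Sequence[str],
                  mem_budget: int) -> List[Tuple[str, str]]:
    """Search every split in turn against the full query list.

    Assumption: substring matching is only a placeholder for BLAST.
    """
    hits = []
    for split in memory_aware_splits(chunk, mem_budget):
        for query in queries:
            hits.extend((query, rec) for rec in split if query in rec)
    return hits
```

In this sketch only one split is held at a time by a worker, which is the property the abstract credits with avoiding page faults and crashes on low-memory machines.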
Affiliation(s)
- Majid Hajibaba
  - Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology, Tehran, Iran
- Mohsen Sharifi
  - School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Saeid Gorgin
  - Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology, Tehran, Iran
3
Mondal S, Maji RK, Ghosh Z, Khatua S. ParStream-seq: An improved method of handling next generation sequence data. Genomics 2018; 111:1641-1650. [PMID: 30448525 DOI: 10.1016/j.ygeno.2018.11.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/31/2018] [Revised: 11/11/2018] [Accepted: 11/12/2018] [Indexed: 10/27/2022]
Abstract
The exponential growth of next generation sequencing (NGS) data has posed the challenge of storing it and of analyzing it efficiently and quickly. Storing the entire amount of data for a particular experiment and aligning it to the reference genome is an essential step for any quantitative analysis of NGS data. Here, we introduce a streaming access technique, 'ParStream-seq', that splits the bulk sequence data, accessed from a remote repository, into short manageable packets and then executes their alignment in parallel on each of the compute cores. The optimal packet size, with a fixed number of reads, is determined in the stream so as to maximize system utilization. Results show a reduction in execution time and an improvement in memory footprint. Overall, this streaming access technique provides a means to overcome the hurdle of storing the entire volume of sequence data corresponding to a particular experiment prior to its analysis.
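The packet idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `packets`, `align_packet` and `process_stream` are hypothetical names, the packet size is passed in rather than tuned from the stream, and `align_packet` merely counts reads where a real pipeline would call an aligner.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def packets(reads: Iterable[str], packet_size: int) -> Iterator[List[str]]:
    """Cut a stream of reads into packets holding a fixed number of reads."""
    it = iter(reads)
    while True:
        packet = list(islice(it, packet_size))
        if not packet:
            return
        yield packet


def align_packet(packet: List[str]) -> int:
    """Placeholder for an aligner call (assumption): returns the read count."""
    return len(packet)


def process_stream(reads: Iterable[str], packet_size: int,
                   workers: int = 4) -> List[int]:
    """Hand packets to a pool of workers and collect results in stream order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(align_packet, packets(reads, packet_size)))
```

For example, `process_stream((f"read{i}" for i in range(10)), 4)` processes three packets. Note that `pool.map` submits all packets up front, so a production version would need a bounded queue to keep the memory footprint small.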
Affiliation(s)
- Sudip Mondal
  - Department of Computer Science and Engineering, University of Calcutta, Kolkata, India
- Zhumur Ghosh
  - Bioinformatics Center, Bose Institute, Kolkata, India
- Sunirmal Khatua
  - Department of Computer Science and Engineering, University of Calcutta, Kolkata, India
4
Gálvez S, Ferusic A, Esteban FJ, Hernández P, Caballero JA, Dorado G. Speeding-up Bioinformatics Algorithms with Heterogeneous Architectures: Highly Heterogeneous Smith-Waterman (HHeterSW). J Comput Biol 2016; 23:801-9. [PMID: 27104636 DOI: 10.1089/cmb.2015.0237] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
The Smith-Waterman algorithm has a great sensitivity when used for biological sequence-database searches, but at the expense of high computing-power requirements. To overcome this problem, there are implementations in literature that exploit the different hardware-architectures available in a standard PC, such as GPU, CPU, and coprocessors. We introduce an application that splits the original database-search problem into smaller parts, resolves each of them by executing the most efficient implementations of the Smith-Waterman algorithms in different hardware architectures, and finally unifies the generated results. Using non-overlapping hardware allows simultaneous execution, and up to 2.58-fold performance gain, when compared with any other algorithm to search sequence databases. Even the performance of the popular BLAST heuristic is exceeded in 78% of the tests. The application has been tested with standard hardware: Intel i7-4820K CPU, Intel Xeon Phi 31S1P coprocessors, and nVidia GeForce GTX 960 graphics cards. An important increase in performance has been obtained in a wide range of situations, effectively exploiting the available hardware.
Affiliation(s)
- Sergio Gálvez
  - Dep. Lenguajes y Ciencias de la Computación, ETSI Informática, Campus de Teatinos, Universidad de Málaga, Málaga, Spain
- Adis Ferusic
  - Dep. Lenguajes y Ciencias de la Computación, ETSI Informática, Campus de Teatinos, Universidad de Málaga, Málaga, Spain
- Francisco J Esteban
  - Servicio de Informática, Campus Rabanales C6-1-E17, Campus de Excelencia Internacional Agroalimentario (ceiA3), Universidad de Córdoba, 14071 Córdoba, Spain
- Pilar Hernández
  - Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo s/n, Córdoba, Spain
- Juan A Caballero
  - Dep. Estadística, Campus Rabanales C6-1-E17, Campus de Excelencia Internacional Agroalimentario (ceiA3), Universidad de Córdoba, 14071 Córdoba, Spain
- Gabriel Dorado
  - Dep. Bioquímica y Biología Molecular, Campus Rabanales C6-1-E17, Campus de Excelencia Internacional Agroalimentario (ceiA3), Universidad de Córdoba, 14071 Córdoba, Spain
5
Gonen S, Bishop SC, Houston RD. Exploring the utility of cross-laboratory RAD-sequencing datasets for phylogenetic analysis. BMC Res Notes 2015; 8:299. [PMID: 26152111 PMCID: PMC4495686 DOI: 10.1186/s13104-015-1261-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/06/2015] [Accepted: 06/25/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Restriction site-Associated DNA sequencing (RAD-Seq) is widely applied to generate genome-wide sequence and genetic marker datasets. RAD-Seq has been extensively utilised, both at the population level and across species, for example in the construction of phylogenetic trees. However, the consistency of RAD-Seq data generated in different laboratories, and the potential use of cross-species orthologous RAD loci in the estimation of genetic relationships, have not been widely investigated. This study describes the use of SbfI RAD-Seq data for the estimation of evolutionary relationships amongst ten teleost fish species, using previously established phylogeny as a benchmark.
RESULTS: The number of orthologous SbfI RAD loci identified decreased with increasing evolutionary distance between the species, with several thousand loci conserved across five salmonid species (divergence ~50 MY), and several hundred conserved across the more distantly related teleost species (divergence ~100-360 MY). The majority (>70%) of loci identified between the more distantly related species were genic in origin, suggesting that the bias of SbfI towards genic regions is useful for identifying distant orthologs. Interspecific single nucleotide variants at each orthologous RAD locus were identified. Evolutionary relationships estimated using concatenated sequences of interspecific variants were congruent with previously published phylogenies, even for distantly (divergence up to ~360 MY) related species.
CONCLUSION: Overall, this study has demonstrated that orthologous SbfI RAD loci can be identified across closely and distantly related species. This has positive implications for the repeatability of SbfI RAD-Seq and its potential to address research questions beyond the scope of the original studies. Furthermore, the concordance in tree topologies and relationships estimated in this study with published teleost phylogenies suggests that similar meta-datasets could be utilised in the prediction of evolutionary relationships across populations and species with readily available RAD-Seq datasets, but for which relationships remain uncharacterised.
Affiliation(s)
- Serap Gonen
  - The Roslin Institute, University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
- Stephen C Bishop
  - The Roslin Institute, University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
- Ross D Houston
  - The Roslin Institute, University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
6
Lee HP, Sheu TF. An algorithm of discovering signatures from DNA databases on a computer cluster. BMC Bioinformatics 2014; 15:339. [PMID: 25282047 PMCID: PMC4286918 DOI: 10.1186/1471-2105-15-339] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/22/2014] [Accepted: 09/29/2014] [Indexed: 11/18/2022]
Abstract
Background: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved.
Results: In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms.
Conclusions: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
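A much-simplified Python sketch of the divide-and-conquer idea follows. It is an assumption-laden illustration, not the published algorithm: signatures are reduced to k-mers that occur in exactly one sequence, the "not similar to any other sequence" criterion (mismatch tolerance) is ignored, and the function names are hypothetical. Each partition is counted on its own, so only one partition needs to be in memory at a time, and the per-partition counts are then merged.

```python
from collections import Counter
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct substrings of length k in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_partition(seqs: List[str], k: int) -> Counter:
    """Divide step: for one partition, count how many sequences hold each k-mer."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return counts


def unique_kmers(partitions: List[List[str]], k: int) -> Set[str]:
    """Conquer step: merge partition counts, keep k-mers seen in one sequence."""
    total: Counter = Counter()
    for part in partitions:
        total.update(count_partition(part, k))
    return {km for km, n in total.items() if n == 1}
```

On a cluster, `count_partition` is the part that could run on separate nodes, with only the merge done centrally.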
Affiliation(s)
- Tzu-Fang Sheu
  - Department of Computer Science and Communication Engineering, Providence University, 200, Sec. 7, Taiwan Boulevard, 43301 Shalu Dist., Taichung, Taiwan
7
Kuhn M, Hyman AA, Beyer A. Coiled-coil proteins facilitated the functional expansion of the centrosome. PLoS Comput Biol 2014; 10:e1003657. [PMID: 24901223 PMCID: PMC4046923 DOI: 10.1371/journal.pcbi.1003657] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 01/06/2014] [Accepted: 04/15/2014] [Indexed: 12/16/2022]
Abstract
Repurposing existing proteins for new cellular functions is recognized as a main mechanism of evolutionary innovation, but its role in organelle evolution is unclear. Here, we explore the mechanisms that led to the evolution of the centrosome, an ancestral eukaryotic organelle that expanded its functional repertoire through the course of evolution. We developed a refined sequence alignment technique that is more sensitive to coiled coil proteins, which are abundant in the centrosome. For proteins with high coiled-coil content, our algorithm identified 17% more reciprocal best hits than BLAST. Analyzing 108 eukaryotic genomes, we traced the evolutionary history of centrosome proteins. In order to assess how these proteins formed the centrosome and adopted new functions, we computationally emulated evolution by iteratively removing the most recently evolved proteins from the centrosomal protein interaction network. Coiled-coil proteins that first appeared in the animal–fungi ancestor act as scaffolds and recruit ancestral eukaryotic proteins such as kinases and phosphatases to the centrosome. This process created a signaling hub that is crucial for multicellular development. Our results demonstrate how ancient proteins can be co-opted to different cellular localizations, thereby becoming involved in novel functions. The centrosome helps cells to divide, and is important for the development of animals. It has its evolutionary origins in the basal body, which was present in the last common ancestor of all eukaryotes. Here, we study how the evolution of novel proteins helped the formation of the centrosome. Coiled-coil proteins are important for the function of the centrosome. But, they have repeating patterns that can confuse existing methods for finding related proteins. We refined these methods by adjusting for the special properties of the coiled-coil regions. This enabled us to find more distant relatives of centrosomal proteins. 
We then tested how novel proteins affect the protein interaction network of the centrosome. We did this by removing the most novel proteins step by step. At each stage, we observed how the remaining proteins are connected to the centriole, the core of the centrosome. We found that coiled-coil proteins that first occurred in the ancestor of fungi and animals help to recruit older proteins. By being recruited to the centrosome, these older proteins acquired new functions. We thus now have a clearer picture of how the centrosome became such an important part of animal cells.
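Reciprocal best hits, the criterion the abstract uses to compare the refined alignment method with BLAST, can be illustrated with a minimal Python sketch. The nested score dictionaries and the function names are hypothetical; the scores would come from whichever alignment method is being evaluated, and the coiled-coil adjustment itself is not modeled here.

```python
from typing import Dict, List, Tuple

# scores[query][subject] = alignment score of query against subject
Scores = Dict[str, Dict[str, float]]


def best_hit(scores: Scores) -> Dict[str, str]:
    """Map every query to its highest-scoring subject."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A method that scores true homologs higher in both search directions yields more such pairs, which is how a percentage gain in reciprocal best hits can be measured.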
Affiliation(s)
- Michael Kuhn
  - Biotechnology Center, TU Dresden, Dresden, Germany
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Anthony A. Hyman
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - * E-mail: (AAH); (AB)
- Andreas Beyer
  - Biotechnology Center, TU Dresden, Dresden, Germany
  - University of Cologne, Cologne, Germany
  - * E-mail: (AAH); (AB)
8
Computational modeling of protein-RNA complex structures. Methods 2013; 65:310-9. [PMID: 24083976 DOI: 10.1016/j.ymeth.2013.09.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 07/16/2013] [Revised: 09/17/2013] [Accepted: 09/19/2013] [Indexed: 12/26/2022]
Abstract
Protein-RNA interactions play fundamental roles in many biological processes, such as regulation of gene expression, RNA splicing, and protein synthesis. The understanding of these processes improves as new structures of protein-RNA complexes are solved and the molecular details of interactions analyzed. However, experimental determination of protein-RNA complex structures by high-resolution methods is tedious and difficult. Therefore, studies on protein-RNA recognition and complex formation present major technical challenges for macromolecular structural biology. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental measurements, theoretical models of macromolecular structures can be sufficiently accurate to prompt functional hypotheses and guide e.g. identification of important amino acid or nucleotide residues. In this article we present an overview of strategies and methods for computational modeling of protein-RNA complexes, including software developed in our laboratory, and illustrate it with practical examples of structural predictions.
9
Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D. PSimScan: algorithm and utility for fast protein similarity search. PLoS One 2013; 8:e58505. [PMID: 23505522 PMCID: PMC3591303 DOI: 10.1371/journal.pone.0058505] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/27/2012] [Accepted: 02/07/2013] [Indexed: 01/19/2023]
Abstract
In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower hardware requirements of bioinformatics software. The optimization starts at the lookup table construction, then the initial lookup table–based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects ‘similarity zones’ aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to the NCBI BLASTP’s and decrease with the increase of speed, yet stay at the levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI’s non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with the NCBI BLASTP takes over 66 hours. We describe innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.
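The general seed-and-diagonal idea behind lookup-table searches can be sketched in Python as below. This is a generic illustration only and not PSimScan's 'similarity zones' algorithm: it uses exact k-mer matches, scores a diagonal simply by counting seeds on it, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_lookup(subject: str, k: int) -> Dict[str, List[int]]:
    """Lookup table: every k-mer of the subject mapped to its start positions."""
    table: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(subject) - k + 1):
        table[subject[pos:pos + k]].append(pos)
    return table


def diagonal_hits(query: str, subject: str, k: int) -> Counter:
    """Count seed matches per diagonal (subject position minus query position)."""
    table = build_lookup(subject, k)
    hits: Counter = Counter()
    for q_pos in range(len(query) - k + 1):
        for s_pos in table.get(query[q_pos:q_pos + k], ()):
            hits[s_pos - q_pos] += 1
    return hits
```

Seeds that belong to the same ungapped similarity fall on one diagonal, so a diagonal with many hits is a cheap signal that a region deserves a more expensive alignment step.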
Affiliation(s)
- Anna Kaznadzey
  - Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Natalia Alexandrova
  - Genome Designs, Inc., Walnut Creek, California, United States of America
  - * E-mail:
- Denis Kaznadzey
  - DOE Joint Genome Institute, Walnut Creek, California, United States of America
10
Abstract
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
Affiliation(s)
- Sean R Eddy
  - HHMI Janelia Farm Research Campus, Ashburn, Virginia, United States of America
11
Rother M, Milanowska K, Puton T, Jeleniewicz J, Rother K, Bujnicki JM. ModeRNA server: an online tool for modeling RNA 3D structures. Bioinformatics 2011; 27:2441-2. [DOI: 10.1093/bioinformatics/btr400] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/15/2022]
12
Thomassen GOS, Røsok Ø, Rognes T. Computational prediction of microRNAs encoded in viral and other genomes. J Biomed Biotechnol 2010; 2006:95270. [PMID: 17057374 PMCID: PMC1559940 DOI: 10.1155/jbb/2006/95270] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
We present an overview of selected computational methods for microRNA prediction. It is especially aimed at viral miRNA detection. As the number of microRNAs increases and the range of genomes encoding miRNAs expands, it seems that these small regulators have a more important role than has been previously thought. Most microRNAs have been detected by cloning and Northern blotting, but experimental methods are biased towards abundant microRNAs as well as being time-consuming. Computational detection methods must therefore be refined to serve as a faster, better, and more affordable method for microRNA detection. We also present data from a small study investigating the problems of computational miRNA prediction. Our findings suggest that the prediction of microRNA precursor candidates is fairly easy, while excluding false positives as well as exact prediction of the mature microRNA is hard. Finally, we discuss possible improvements to computational microRNA detection.
Affiliation(s)
- Gard O. S. Thomassen
  - Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, Rikshospitalet-Radiumhospitalet Medical Centre, 0027 Oslo, Norway
- Øystein Røsok
  - Department of Immunology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Centre, 0310 Oslo, Norway
- Torbjørn Rognes
  - Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, Rikshospitalet-Radiumhospitalet Medical Centre, 0027 Oslo, Norway
  - Department of Informatics, University of Oslo, PO Box 1080 Blindern, 0316 Oslo, Norway
  - *Torbjørn Rognes:
13
Gálvez S, Díaz D, Hernández P, Esteban FJ, Caballero JA, Dorado G. Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment. Bioinformatics 2010; 26:683-6. [DOI: 10.1093/bioinformatics/btq017] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Affiliation(s)
- Sergio Gálvez
  - Department Lenguajes y Ciencias de la Computación, Universidad de Málaga 29071 Málaga, 2 Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo, s/n, 14080 Córdoba, 3 Computer Services, 4 Department Estadística and 5 Department Bioquímica y Biología Molecular, Universidad de Córdoba 14071 Córdoba, Spain
- David Díaz
  - Department Lenguajes y Ciencias de la Computación, Universidad de Málaga 29071 Málaga, 2 Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo, s/n, 14080 Córdoba, 3 Computer Services, 4 Department Estadística and 5 Department Bioquímica y Biología Molecular, Universidad de Córdoba 14071 Córdoba, Spain
- Pilar Hernández
  - Department Lenguajes y Ciencias de la Computación, Universidad de Málaga 29071 Málaga, 2 Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo, s/n, 14080 Córdoba, 3 Computer Services, 4 Department Estadística and 5 Department Bioquímica y Biología Molecular, Universidad de Córdoba 14071 Córdoba, Spain
- Francisco J. Esteban
  - Department Lenguajes y Ciencias de la Computación, Universidad de Málaga 29071 Málaga, 2 Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo, s/n, 14080 Córdoba, 3 Computer Services, 4 Department Estadística and 5 Department Bioquímica y Biología Molecular, Universidad de Córdoba 14071 Córdoba, Spain
- Juan A. Caballero
  - Department Lenguajes y Ciencias de la Computación, Universidad de Málaga 29071 Málaga, 2 Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo, s/n, 14080 Córdoba, 3 Computer Services, 4 Department Estadística and 5 Department Bioquímica y Biología Molecular, Universidad de Córdoba 14071 Córdoba, Spain
- Gabriel Dorado
  - Department Lenguajes y Ciencias de la Computación, Universidad de Málaga 29071 Málaga, 2 Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo, s/n, 14080 Córdoba, 3 Computer Services, 4 Department Estadística and 5 Department Bioquímica y Biología Molecular, Universidad de Córdoba 14071 Córdoba, Spain
14
Affiliation(s)
- Joel T Dudley
  - Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, USA
15
16
Bandyopadhyay S, Mitra R. A parallel pairwise local sequence alignment algorithm. IEEE Trans Nanobioscience 2009; 8:139-46. [PMID: 19366648 DOI: 10.1109/tnb.2009.2019642] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Researchers are compelled to use heuristic-based pairwise sequence alignment tools instead of the Smith-Waterman (SW) algorithm due to space and time constraints, thereby losing a significant amount of sensitivity. Parallelization is a possible solution, though, to date, parallelization has been restricted to database searching through database fragmentation. In this paper, the power of a cluster computer is utilized for developing a parallel algorithm, RPAlign, involving, first, the detection of regions that are potentially alignable, followed by their actual alignment. RPAlign is found to reduce the timing requirement by a factor of up to 9 and 99 when used with the basic local alignment search tool (BLAST) and SW, respectively, while keeping the sensitivity similar to the corresponding method. For distantly related sequences, which remain undetected by BLAST, RPAlign with SW can be used. Again, for megabase-scale sequences, when SW becomes computationally intractable, the proposed method can still align them reasonably fast with high sensitivity.
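For reference, the Smith-Waterman recurrence that RPAlign is compared against can be written as a short Python sketch. The scoring values (`match=2`, `mismatch=-1`, linear `gap=-2`) are illustrative assumptions, only the best local score is returned rather than the alignment, and nothing here reproduces RPAlign's region detection.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The table has len(a) x len(b) cells, so time grows with the product of the sequence lengths; this is the cost that makes exact SW impractical for megabase-scale sequences and motivates restricting it to pre-selected regions.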
17
Fokkens L, Snel B. Cohesive versus flexible evolution of functional modules in eukaryotes. PLoS Comput Biol 2009; 5:e1000276. [PMID: 19180181 PMCID: PMC2615111 DOI: 10.1371/journal.pcbi.1000276] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 09/12/2008] [Accepted: 12/16/2008] [Indexed: 12/02/2022]
Abstract
Although functionally related proteins can be reliably predicted from phylogenetic profiles, many functional modules do not seem to evolve cohesively according to case studies and systematic analyses in prokaryotes. In this study we quantify the extent of evolutionary cohesiveness of functional modules in eukaryotes and probe the biological and methodological factors influencing our estimates. We have collected various datasets of protein complexes and pathways in Saccharomyces cerevisiae. We define orthologous groups on 34 eukaryotic genomes and measure the extent of cohesive evolution of sets of orthologous groups of which members constitute a known complex or pathway. Within this framework it appears that most functional modules evolve flexibly rather than cohesively. Even after correcting for uncertain module definitions and potentially problematic orthologous groups, only 46% of pathways and complexes evolve more cohesively than random modules. This flexibility seems partly coupled to the nature of the functional module because biochemical pathways are generally more cohesively evolving than complexes.
Components of a protein complex or a metabolic pathway strongly cooperate to perform a specific function. Because of this functional interdependence, proteins that form a complex or pathway are expected to be present and absent together in different species. Phylogenetic profiling methods, in which proteins with similar presence and absence patterns are inferred to be functionally linked, are based on this assumption. In this report, we quantify to what extent proteins that together constitute a complex or pathway (a functional module) in yeast are present and absent together (evolve cohesively) in other eukaryotic species. We find that more than half of all complexes and pathways are only partially present in a number of species. It appears that evolution of functional modules is very flexible; components are not indispensable; they can be replaced or reused in a different functional context. This places a limit on how well phylogenetic profiling methods can detect functionally related proteins. Functional modules that evolve cohesively are typically involved in biological processes such as translation and amino acid metabolism.
Affiliation(s)
- Like Fokkens
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, The Netherlands.
|
18
|
Weel-Sneve R, Bjørås M, Kristiansen KI. Overexpression of the LexA-regulated tisAB RNA in E. coli inhibits SOS functions; implications for regulation of the SOS response. Nucleic Acids Res 2008; 36:6249-59. [PMID: 18832374 PMCID: PMC2577331 DOI: 10.1093/nar/gkn633] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
The DNA damage induced SOS response in Escherichia coli is initiated by cleavage of the LexA repressor through activation of RecA. Here we demonstrate that overexpression of the SOS-inducible tisAB gene inhibits several SOS functions in vivo. Wild-type E. coli overexpressing tisAB showed the same UV sensitivity as a lexA mutant carrying a noncleavable version of the LexA protein unable to induce the SOS response. Immunoblotting confirmed that tisAB overexpression leads to higher levels of LexA repressor and northern experiments demonstrated delayed and reduced induction of recA mRNA. In addition, induction of prophage λ and UV-induced filamentation was inhibited by tisAB overexpression. The tisAB gene contains antisense sequences to the SOS-inducible dinD gene (16 nt) and the uxaA gene (20 nt), the latter encoding a dehydratase essential for galacturonate catabolism. Cleavage of uxaA mRNA at the antisense sequence was dependent on tisAB RNA expression. We showed that overexpression of tisAB is less able to confer UV sensitivity to the uxaA dinD double mutant as compared to wild-type, indicating that the dinD and uxaA transcripts modulate the anti-SOS response of tisAB. These data shed new light on the complexity of SOS regulation in which the uxaA gene could link sugar metabolism to the SOS response via antisense regulation of the tisAB gene.
Affiliation(s)
- Ragnhild Weel-Sneve
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Rikshospitalet University Hospital, NO-0027 Oslo, Norway
|
19
|
Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007; 35:3100-8. [PMID: 17452365 PMCID: PMC1888812 DOI: 10.1093/nar/gkm160] [Citation(s) in RCA: 4655] [Impact Index Per Article: 273.8] [Indexed: 11/14/2022] Open
Abstract
The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.
Affiliation(s)
- Karin Lagesen
- Centre for Molecular Biology and Neuroscience and Institute of Medical Microbiology, University of Oslo, NO-0027 Oslo, Norway.
|
20
|
Nakken S, Alseth I, Rognes T. Computational prediction of the effects of non-synonymous single nucleotide polymorphisms in human DNA repair genes. Neuroscience 2006; 145:1273-9. [PMID: 17055652 DOI: 10.1016/j.neuroscience.2006.09.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 05/15/2006] [Revised: 09/08/2006] [Accepted: 09/12/2006] [Indexed: 10/24/2022]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) represent common genetic variation that alters encoded amino acids in proteins. All nsSNPs may potentially affect the structure or function of expressed proteins and could therefore have an impact on complex diseases. In an effort to evaluate the phenotypic effect of all known nsSNPs in human DNA repair genes, we have characterized each polymorphism in terms of different functional properties. The properties are computed based on amino acid characteristics (e.g. residue volume change); position-specific phylogenetic information from multiple sequence alignments and from prediction programs such as SIFT (Sorting Intolerant From Tolerant) and PolyPhen (Polymorphism Phenotyping). We provide a comprehensive, updated list of all validated nsSNPs from dbSNP (public database of human single nucleotide polymorphisms at National Center for Biotechnology Information, USA) located in human DNA repair genes. The list includes repair enzymes, genes associated with response to DNA damage as well as genes implicated with genetic instability or sensitivity to DNA damaging agents. Out of a total of 152 genes involved in DNA repair, 95 had validated nsSNPs in them. The fraction of nsSNPs that had high probability of being functionally significant was predicted to be 29.6% and 30.9%, by SIFT and PolyPhen respectively. The resulting list of annotated nsSNPs is available online (http://dna.uio.no/repairSNP), and is an ongoing project that will continue assessing the function of coding SNPs in human DNA repair genes.
Affiliation(s)
- S Nakken
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Rikshospitalet-Radiumhospitalet Medical Centre, NO-0027 Oslo, Norway
|
21
|
Hulsen T, de Vlieg J, Leunissen JAM, Groenen PMA. Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics 2006; 7:444. [PMID: 17038163 PMCID: PMC1618413 DOI: 10.1186/1471-2105-7-444] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/10/2006] [Accepted: 10/12/2006] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested if this could be validated when applied to existing, evolutionarily related protein sequences.
RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores.
CONCLUSION: The compute intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
Affiliation(s)
- Tim Hulsen
- Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre for Molecular Life Sciences (NCMLS), Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Jacob de Vlieg
- Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre for Molecular Life Sciences (NCMLS), Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Molecular Design and Informatics, NV Organon, Oss, The Netherlands
- Jack AM Leunissen
- Laboratory of Bioinformatics, Wageningen University and Research Centre, Wageningen, The Netherlands
- Peter MA Groenen
- Molecular Design and Informatics, NV Organon, Oss, The Netherlands
|
22
|
Piehler AP, Wenzel JJ, Olstad OK, Haug KBF, Kierulf P, Kaminski WE. The human ortholog of the rodent testis-specific ABC transporter Abca17 is a ubiquitously expressed pseudogene (ABCA17P) and shares a common 5' end with ABCA3. BMC Mol Biol 2006; 7:28. [PMID: 16968533 PMCID: PMC1579226 DOI: 10.1186/1471-2199-7-28] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 06/14/2006] [Accepted: 09/12/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: During the past years, we and others discovered a series of human ATP-binding cassette (ABC) transporters, now referred to as ABC A-subfamily transporters. Recently, a novel testis-specific ABC A transporter, Abca17, has been cloned in rodent. In this study, we report the identification and characterization of the human ortholog of rodent Abca17.
RESULTS: The novel human ABC A-transporter gene on chromosome 16p13.3 is ubiquitously expressed with highest expression in glandular tissues and the heart. The new ABC transporter gene exhibits striking nucleotide sequence homology with the recently cloned mouse (58%) and rat Abca17 (51%), respectively, and is located in the syntenic region of mouse Abca17 indicating that it represents the human ortholog of rodent Abca17. However, unlike in the mouse, the full-length ABCA17 transcript (4.3 kb) contains numerous mutations that preclude its translation into a bona fide ABC transporter protein strongly suggesting that the human ABCA17 gene is a transcribed pseudogene (ABCA17P). We identified numerous alternative ABCA17P splice variants which are transcribed from two distinct transcription initiation sites. Genomic analysis revealed that ABCA17P borders on another ABC A-subfamily transporter - the lung surfactant deficiency gene ABCA3. Surprisingly, we found that both genes overlap at their first exons and are transcribed from opposite strands. This genomic colocalization and the observation that the ABCA17P and ABCA3 genes share significant homologies in several exons (up to 98%) suggest that both genes have evolved by gene duplication.
CONCLUSION: Our results demonstrate that ABCA17P and ABCA3 form a complex of overlapping genes in the human genome from which both non-coding and protein-coding ABC A-transporter RNAs are expressed. The fact that both genes overlap at their 5' ends suggests interdependencies in their regulation and may have important implications for the functional analysis of the disease gene ABCA3. Moreover, this is the first demonstration of the expression of a pseudogene and its parent gene from a common overlapping DNA region in the human genome.
Affiliation(s)
- Armin P Piehler
- R&D Group, Department of Clinical Chemistry, Ulleval University Hospital, 0407 Oslo, Norway
- Jürgen J Wenzel
- Department of Clinical Chemistry and Laboratory Medicine, Johannes Gutenberg University Hospital, 55101 Mainz, Germany
- Ole K Olstad
- R&D Group, Department of Clinical Chemistry, Ulleval University Hospital, 0407 Oslo, Norway
- Kari Bente Foss Haug
- R&D Group, Department of Clinical Chemistry, Ulleval University Hospital, 0407 Oslo, Norway
- Peter Kierulf
- R&D Group, Department of Clinical Chemistry, Ulleval University Hospital, 0407 Oslo, Norway
- Wolfgang E Kaminski
- Institute for Clinical Chemistry, University of Heidelberg, 68167 Mannheim, Germany
|
23
|
Alseth I, Rognes T, Lindbäck T, Solberg I, Robertsen K, Kristiansen KI, Mainieri D, Lillehagen L, Kolstø AB, Bjørås M. A new protein superfamily includes two novel 3-methyladenine DNA glycosylases from Bacillus cereus, AlkC and AlkD. Mol Microbiol 2006; 59:1602-9. [PMID: 16468998 PMCID: PMC1413580 DOI: 10.1111/j.1365-2958.2006.05044.x] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Indexed: 11/26/2022]
Abstract
Soil bacteria are heavily exposed to environmental methylating agents such as methylchloride and may have special requirements for repair of alkylation damage on DNA. We have used functional complementation of an Escherichia coli tag alkA mutant to screen for 3-methyladenine DNA glycosylase genes in genomic libraries of the soil bacterium Bacillus cereus. Three genes were recovered: alkC, alkD and alkE. The amino acid sequence of AlkE is homologous to the E. coli AlkA sequence. AlkC and AlkD represent novel proteins without sequence similarity to any protein of known function. However, iterative and indirect sequence similarity searches revealed that AlkC and AlkD are distant homologues of each other within a new protein superfamily that is ubiquitous in the prokaryotic kingdom. Homologues of AlkC and AlkD were also identified in the amoebas Entamoeba histolytica and Dictyostelium discoideum, but no other eukaryotic counterparts of the superfamily were found. The alkC and alkD genes were expressed in E. coli and the proteins were purified to homogeneity. Both proteins were found to be specific for removal of N-alkylated bases, and showed no activity on oxidized or deaminated base lesions in DNA. B. cereus AlkC and AlkD thus define novel families of alkylbase DNA glycosylases within a new protein superfamily.
Affiliation(s)
- Ingrun Alseth
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- Torbjørn Rognes
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- Department of Informatics, University of Oslo, PO Box 1080 Blindern, N-0316 Oslo, Norway.
- Toril Lindbäck
- Department of Food Safety and Infection Biology, Norwegian School of Veterinary Science, N-0033 Oslo, Norway
- Biotechnology Centre of Oslo and Department of Pharmaceutical Biosciences, University of Oslo, PO Box 1125 Blindern, N-0317 Oslo, Norway
- Inger Solberg
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- Kristin Robertsen
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- Knut Ivan Kristiansen
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- Davide Mainieri
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- Lucy Lillehagen
- Biotechnology Centre of Oslo and Department of Pharmaceutical Biosciences, University of Oslo, PO Box 1125 Blindern, N-0317 Oslo, Norway
- Anne-Brit Kolstø
- Biotechnology Centre of Oslo and Department of Pharmaceutical Biosciences, University of Oslo, PO Box 1125 Blindern, N-0317 Oslo, Norway
- Magnar Bjørås
- Department of Molecular Biology, Institute of Medical Microbiology and Centre of Molecular Biology and Neuroscience, University of Oslo, Rikshospitalet-Radiumhospitalet HF, N-0027 Oslo, Norway
- *For correspondence. E-mail ; Tel. (+47) 23074061; Fax (+47) 23074060
|
24
|
Sæbø PE, Andersen SM, Myrseth J, Laerdahl JK, Rognes T. PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology. Nucleic Acids Res 2005; 33:W535-9. [PMID: 15980529 PMCID: PMC1160184 DOI: 10.1093/nar/gki423] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
PARALIGN is a rapid and sensitive similarity search tool for the identification of distantly related sequences in both nucleotide and amino acid sequence databases. Two algorithms are implemented, accelerated Smith-Waterman and ParAlign. The ParAlign algorithm is similar to Smith-Waterman in sensitivity, while as quick as BLAST for protein searches. A form of parallel computing technology known as multimedia technology that is available in modern processors, but rarely used by other bioinformatics software, has been exploited to achieve the high speed. The software is also designed to run efficiently on computer clusters using the message-passing interface standard. A public search service powered by a large computer cluster has been set-up and is freely available at www.paralign.org, where the major public databases can be searched. The software can also be downloaded free of charge for academic use.
Affiliation(s)
- Per Eystein Sæbø
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, University of Oslo and Rikshospitalet-Radiumhospitalet HF, NO-0027 Oslo, Norway
- Jon Myrseth
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, University of Oslo and Rikshospitalet-Radiumhospitalet HF, NO-0027 Oslo, Norway
- Jon K. Laerdahl
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, University of Oslo and Rikshospitalet-Radiumhospitalet HF, NO-0027 Oslo, Norway
- Torbjørn Rognes
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, University of Oslo and Rikshospitalet-Radiumhospitalet HF, NO-0027 Oslo, Norway
- Sencel Bioinformatics AS, Motzfeldts gate 16, NO-0187 Oslo, Norway
- Department of Informatics, University of Oslo, PO Box 1080, NO-0316, Oslo, Norway
- To whom correspondence should be addressed. Tel: +47 22844787; Fax: +47 22844782;
|
25
|
Snøve O, Nedland M, Fjeldstad SH, Humberset H, Birkeland OR, Grünfeld T, Saetrom P. Designing effective siRNAs with off-target control. Biochem Biophys Res Commun 2004; 325:769-73. [PMID: 15541356 DOI: 10.1016/j.bbrc.2004.10.097] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 10/07/2004] [Indexed: 10/26/2022]
Abstract
Successful gene silencing by RNA interference requires a potent and specific depletion of the target mRNA. Target candidates must be chosen so that their corresponding short interfering RNAs are likely to be effective against that target and unlikely to accidentally silence other transcripts due to sequence similarity. We show that both effective and unique targets exist in mouse, fruit fly, and worm, and present a new design tool that enables users to make the trade-off between efficacy and uniqueness. The tool lists all targets with partial sequence similarity to the primary target to highlight candidates for negative controls.
Affiliation(s)
- Ola Snøve
- Interagon AS, Medisinsk teknisk senter, NO-7489 Trondheim, Norway.
|
26
|
Sunyaev SR, Bogopolsky GA, Oleynikova NV, Vlasov PK, Finkelstein AV, Roytberg MA. From analysis of protein structural alignments toward a novel approach to align protein sequences. Proteins 2003; 54:569-82. [PMID: 14748004 DOI: 10.1002/prot.10503] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D)-structure. We investigated correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency.
Affiliation(s)
- Shamil R Sunyaev
- Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
|
27
|
A Procedure for Biological Sensitive Pattern Matching in Protein Sequences. PATTERN RECOGNITION AND IMAGE ANALYSIS 2003. [DOI: 10.1007/978-3-540-44871-6_64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
28
|
Morland I, Rolseth V, Luna L, Rognes T, Bjørås M, Seeberg E. Human DNA glycosylases of the bacterial Fpg/MutM superfamily: an alternative pathway for the repair of 8-oxoguanine and other oxidation products in DNA. Nucleic Acids Res 2002; 30:4926-36. [PMID: 12433996 PMCID: PMC137166 DOI: 10.1093/nar/gkf618] [Citation(s) in RCA: 216] [Impact Index Per Article: 9.8] [Indexed: 11/12/2022] Open
Abstract
The mild phenotype associated with targeted disruption of the mouse OGG1 and NTH1 genes has been attributed to the existence of back-up activities and/or alternative pathways for the removal of oxidised DNA bases. We have characterised two new genes in human cells that encode DNA glycosylases, homologous to the bacterial Fpg (MutM)/Nei class of enzymes, capable of removing lesions that are substrates for both hOGG1 and hNTH1. One gene, designated HFPG1, showed ubiquitous expression in all tissues examined whereas the second gene, HFPG2, was only expressed at detectable levels in the thymus and testis. Transient transfections of HeLa cells with fusions of the cDNAs to EGFP revealed intracellular sorting to the nucleus with accumulation in the nucleoli for hFPG1, while hFPG2 co-localised with the 30 kDa subunit of RPA. hFPG1 was purified and shown to act on DNA substrates containing 8-oxoguanine, 5-hydroxycytosine and abasic sites. Removal of 8-oxoguanine, but not cleavage at abasic sites, was opposite base-dependent, with 8-oxoG:C being the preferred substrate and negligible activity towards 8-oxoG:A. It thus appears that hFPG1 has properties similar to mammalian OGG1 in preventing mutations arising from misincorporation of A across 8-oxoG and could function as a back-up repair activity for OGG1 in ogg1(-/-) mice.
Affiliation(s)
- Ingrid Morland
- Department of Molecular Biology, Institute of Medical Microbiology, University of Oslo, Rikshospitalet, 0027 Oslo, Norway
|
29
|
Thallinger GG, Trajanoski S, Stocker G, Trajanoski Z. Information management systems for pharmacogenomics. Pharmacogenomics 2002; 3:651-67. [PMID: 12223050 DOI: 10.1517/14622416.3.5.651] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022] Open
Abstract
The value of high-throughput genomic research is dramatically enhanced by association with key patient data. These data are generally available but of disparate quality and not typically directly associated. A system that could bring these disparate data sources into a common resource connected with functional genomic data would be tremendously advantageous. However, the integration of clinical and accurate interpretation of the generated functional genomic data requires the development of information management systems capable of effectively capturing the data as well as tools to make that data accessible to the laboratory scientist or to the clinician. In this review these challenges and current information technology solutions associated with the management, storage and analysis of high-throughput data are highlighted. It is suggested that the development of a pharmacogenomic data management system which integrates public and proprietary databases, clinical datasets, and data mining tools embedded in a high-performance computing environment should include the following components: parallel processing systems, storage technologies, network technologies, databases and database management systems (DBMS), and application services.
Affiliation(s)
- Gerhard G Thallinger
- Institute of Biomedical Engineering, Graz University of Technology, Krenngasse 37, A-8010 Graz, Austria.
|
30
|
David RB, Lim GB, Moritz KM, Koukoulas I, Wintour EM. Quantitation of the mRNA levels of Epo and EpoR in various tissues in the ovine fetus. Mol Cell Endocrinol 2002; 188:207-18. [PMID: 11911958 DOI: 10.1016/s0303-7207(01)00718-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 01/08/2023]
Abstract
A partial cDNA of the sheep erythropoietin receptor (EpoR) was obtained and used in real-time PCR to quantitate mRNA levels in placenta, liver and kidney throughout development (term=150 days). This was compared with Epo mRNA levels in the same tissues. Both Epo and EpoR mRNA were present in the placenta throughout gestation at low levels from 66 days onwards and these did not vary throughout gestation. Compared with the expression levels in the placenta, the levels of EpoR gene expression in the liver at 66, 99 and 140 days were, median (range)-288 (120-343), 278 (63-541) and 7 (3-15), respectively, reflecting the disappearance of erythropoiesis after 130 days. Low levels of EpoR gene expression were seen in the kidney at 3 (2-5), 5 (2-7), and 7 (2-10) times that in the placenta at 66, 99, and 140 days, respectively. By hybridization histochemistry the EpoR mRNA was located in the proximal tubular cells of the mesonephros and metanephros at 42 days. Epo mRNA levels in the kidney were 215 (116-867), 528 (113-765) and 46 (15-204) times those in the placenta at 69, 99, and 140 days, respectively. In the liver at the same ages the concentrations of mRNA were lower than in the kidney, the liver/placenta ratios being 50 (11-90), 17 (3-39), 9 (5-14). At 130 days Epo/EpoR levels in the hippocampus were 6+/-3 and 8+/-3 times that in the term placenta, respectively. These studies demonstrate that the ovine placenta expresses the Epo gene from at least 66 days of gestation. However, gene expression levels are very low compared with those in the liver and kidney, and even the hippocampus.
Affiliation(s)
- R Bruce David
- Norwegian School of Veterinary Science, Oslo, Norway
|
31
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447222 DOI: 10.1002/cfg.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
|