1
Bajic VB, Charn TH, Xu JX, Panda SK, T Krishnan SP. Prediction Models for DNA Transcription Termination Based on SOM Networks. Conf Proc IEEE Eng Med Biol Soc 2005; 2005:4791-4. [PMID: 17281313 DOI: 10.1109/iembs.2005.1615543] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Abstract
This paper presents two efficient models for predicting transcription termination (TT) in human DNA. A self-organizing map (SOM) neural network was used to extract features from a dataset of human polyadenylation (polyA) sites. We derived prediction models for different polyA signals. A program for predicting TT regions, "Dragon PolyAtt", was designed for the two most frequent polyA signals, "AAUAAA" and "AUUAAA". In our tests on human chromosome 21, Dragon PolyAtt predicts TT regions with a sensitivity of 48.4% and a specificity of 74% when searching for the polyA signal "AAUAAA", and with a sensitivity of 13.6% and a specificity of 79.1% for "AUUAAA". These results are substantially better than those obtained with the well-known "polyadq" program.
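As a minimal sketch of the first step such a predictor needs, the Python below locates every occurrence of the two polyA signals named above in a DNA sequence (writing U as T) and cuts a fixed window around each hit for a downstream classifier. The `flank` size and all function names are hypothetical; this is not the Dragon PolyAtt implementation, whose SOM-derived features the abstract does not specify.

```python
# Hypothetical sketch: find candidate polyA-signal positions in genomic DNA.
# Not the Dragon PolyAtt code; the window size is an illustrative assumption.

POLYA_SIGNALS = ("AAUAAA", "AUUAAA")  # the two signals named in the abstract


def to_dna(motif: str) -> str:
    """Convert an RNA motif to its DNA form (U -> T)."""
    return motif.upper().replace("U", "T")


def find_signal(seq: str, signal: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, target = seq.upper(), to_dna(signal)
    return [i for i in range(len(seq) - len(target) + 1)
            if seq.startswith(target, i)]


def candidate_windows(seq: str, signal: str, flank: int = 50) -> list[tuple[int, str]]:
    """Cut up to `flank` bases on each side of every signal hit.

    A classifier (for example one built on SOM-derived features) would then
    decide which windows are true transcription termination regions.
    """
    windows = []
    for start in find_signal(seq, signal):
        lo = max(0, start - flank)
        hi = min(len(seq), start + len(signal) + flank)
        windows.append((start, seq[lo:hi]))
    return windows


if __name__ == "__main__":
    dna = "GCAATAAATAAAGCCATTAAACG"
    for sig in POLYA_SIGNALS:
        print(sig, find_signal(dna, sig))
```

For the example sequence this reports overlapping AATAAA hits at positions 2 and 6 and one ATTAAA hit at position 15; scanning is deliberately exhaustive, leaving it to the model to reject the many signal occurrences that are not real termination sites.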
2
Krishnan SPT, Liang SS, Veeravalli B. Towards high performance computing for molecular structure prediction using IBM Cell Broadband Engine--an implementation perspective. BMC Bioinformatics 2010; 11 Suppl 1:S36. [PMID: 20122209 PMCID: PMC3009508 DOI: 10.1186/1471-2105-11-s1-s36] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND The RNA structure prediction problem is a computationally complex task, especially when pseudoknots are allowed. The problem is well studied in the literature, and existing solutions predominantly use highly coupled Dynamic Programming (DP) algorithms. The scale and complexity of the problem grow prohibitively as sequence length increases, which makes the case for parallelization. Parallelization can be achieved on networked platforms (clusters, grids, etc.) as well as on modern multi-core chips. METHODS In this paper, we exploit the parallelism capabilities of the IBM Cell Broadband Engine to parallelize an existing DP algorithm for RNA secondary structure prediction. We design three implementation strategies that exploit the inherent code, data and hybrid parallelism, referred to as C-Par, D-Par and H-Par, respectively, and analyze their performance. Our approach introduces parallelism in the critical sections of the algorithm. We ran our experiments on a Sony PlayStation 3 (PS3), which is based on the IBM Cell chip. RESULTS Our results suggest that introducing parallelism into the DP algorithm allows it to handle longer sequences that would otherwise take a long time to process on single-core computers. The results further demonstrate the speed-up gained by exploiting the inherent parallelism in the problem and highlight the advantages of multi-core platforms for designing more sophisticated methods for handling long RNA sequences. CONCLUSION The reported speed-up is promising, especially for long sequences. To the best of our knowledge, this is the first work to use the IBM Cell Broadband Engine (a heterogeneous multi-core chip) to implement a DP algorithm. The results also encourage the use of multi-core platforms for designing more sophisticated methods for predicting the secondary structure of long RNA sequences.
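The abstract does not name the DP algorithm that was parallelized. As a stand-in, the sketch below uses the classic Nussinov base-pair-maximization recurrence (no pseudoknots), filled diagonal by diagonal: every cell with the same span j - i depends only on cells with smaller spans, so the loop over i within one diagonal is the kind of independent work that could be spread across cores. The function names and the minimum hairpin-loop length of 3 are assumptions, and the code is serial; it does not reproduce the C-Par, D-Par or H-Par strategies.

```python
# Nussinov-style DP for RNA secondary structure (maximum number of base pairs).
# Illustrative stand-in only; not the algorithm or code used in the paper.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Watson-Crick pairs plus the G-U wobble pair."""
    return (a, b) in PAIRS


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in `seq`.

    dp[i][j] holds the best score for seq[i..j]. Cells are filled in order of
    increasing span d = j - i; all cells sharing one d are mutually
    independent, which is what makes diagonal-wise parallelization possible.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for d in range(min_loop + 1, n):      # shorter spans cannot close a loop: stay 0
        for i in range(n - d):            # independent cells on diagonal d
            j = i + d
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

Running it prints 3: two G-C pairs and a G-U wobble pair closing an AAA hairpin loop. The O(n^3) cost of even this simple recurrence illustrates why long sequences motivate parallel hardware.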
Affiliation(s)
- S P T Krishnan
  - Institute for Infocomm Research, 1 Fusionopolis Way, #21-01, Connexis South Tower, Singapore 138632.
3
Brahmachary M, Schönbach C, Yang L, Huang E, Tan SL, Chowdhary R, Krishnan SPT, Lin CY, Hume DA, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bajic VB. Computational promoter analysis of mouse, rat and human antimicrobial peptide-coding genes. BMC Bioinformatics 2006; 7 Suppl 5:S8. [PMID: 17254313 PMCID: PMC1764486 DOI: 10.1186/1471-2105-7-s5-s8] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals from the pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs, the promoter elements and transcription factors that control their tissue- and cell-specific expression have yet to be fully identified and characterized. RESULTS Based on the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, we were able to determine true orthologous relationships with 30 human and 15 rat sequences for only 31 mouse genes belonging to 22 AMP families. We screened the promoter regions of AMPcgs in the three species for motifs using an ab initio motif-finding method and analyzed the derived promoter characteristics. Promoter models were developed for the alpha-defensin, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TF groups are the liver-specific, nervous system-specific and nuclear hormone receptor (NHR) groups. Of the 440 motifs analyzed, three represent potentially novel TF-binding motifs enriched in the promoters of AMPcgs, while another four appear to be species-specific. CONCLUSION Our large-scale computational analysis of the promoters of 22 AMPcg families across three mammalian species suggests that their key transcriptional regulators are TFs of the liver-specific, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF-binding motifs provide a rich resource for targeted experimental validation of TF binding and for signaling studies aimed at the regulation of mouse, rat or human AMPcgs.
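The abstract reports motifs "enriched" in AMPcg promoters without stating the test used. One common way to quantify such enrichment, shown here only as an assumed example, is a one-sided hypergeometric test on how many target promoters contain a motif compared with a background set. The helper names and any sequences passed to them are hypothetical.

```python
# One-sided hypergeometric test for motif enrichment in a target promoter set.
# An assumed, generic statistic; the paper's own scoring is not reproduced.
from math import comb


def hypergeom_sf(k: int, total: int, with_motif: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, with_motif, drawn).

    total:      all promoters (target + background)
    with_motif: promoters in `total` that contain the motif
    drawn:      size of the target set
    k:          target promoters that contain the motif
    """
    upper = min(with_motif, drawn)
    hits = sum(comb(with_motif, i) * comb(total - with_motif, drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(total, drawn)


def motif_enrichment(motif: str, target: list[str], background: list[str]) -> float:
    """Chance probability that `motif` occurs in at least this many target promoters."""
    motif = motif.upper()
    k = sum(motif in s.upper() for s in target)
    m = k + sum(motif in s.upper() for s in background)
    return hypergeom_sf(k, len(target) + len(background), m, len(target))
```

Because many motifs are screened at once (440 in this study), p-values from a test like this would normally be corrected for multiple testing before a motif is called enriched.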
Affiliation(s)
- Manisha Brahmachary
  - Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore
  - Department of Biochemistry, Faculty of Medicine, National University of Singapore, 8 Medical Drive, Singapore 117597, Singapore
- Christian Schönbach
  - Immunoinformatics Research Team, Advanced Genome Information Technology Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Division of Genomics and Genetics, School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Liang Yang
  - Department of Obstetrics and Gynecology, National University Hospital, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 119074, Singapore
- Enli Huang
  - Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore
- Sin Lam Tan
  - Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore
  - University of the Western Cape, South African National Bioinformatics Institute (SANBI), Private Bag X17, Bellville 7535, South Africa
- Rajesh Chowdhary
  - Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore
- SPT Krishnan
  - Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore
- Chin-Yo Lin
  - Brigham Young University, Department of Microbiology and Molecular Biology, 753 WIDB, Provo, UT 84602, USA
- David A Hume
  - ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
- Chikatoshi Kai
  - Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Jun Kawai
  - Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Piero Carninci
  - Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Yoshihide Hayashizaki
  - Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Vladimir B Bajic
  - University of the Western Cape, South African National Bioinformatics Institute (SANBI), Private Bag X17, Bellville 7535, South Africa
4
Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest ARR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, Chalk AM, Chiu KP, Choudhary V, Christoffels A, Clutterbuck DR, Crowe ML, Dalla E, Dalrymple BP, de Bono B, Della Gatta G, di Bernardo D, Down T, Engstrom P, Fagiolini M, Faulkner G, Fletcher CF, Fukushima T, Furuno M, Futaki S, Gariboldi M, Georgii-Hemming P, Gingeras TR, Gojobori T, Green RE, Gustincich S, Harbers M, Hayashi Y, Hensch TK, Hirokawa N, Hill D, Huminiecki L, Iacono M, Ikeo K, Iwama A, Ishikawa T, Jakt M, Kanapin A, Katoh M, Kawasawa Y, Kelso J, Kitamura H, Kitano H, Kollias G, Krishnan SPT, Kruger A, Kummerfeld SK, Kurochkin IV, Lareau LF, Lazarevic D, Lipovich L, Liu J, Liuni S, McWilliam S, Madan Babu M, Madera M, Marchionni L, Matsuda H, Matsuzawa S, Miki H, Mignone F, Miyake S, Morris K, Mottagui-Tabar S, Mulder N, Nakano N, Nakauchi H, Ng P, Nilsson R, Nishiguchi S, Nishikawa S, Nori F, Ohara O, Okazaki Y, Orlando V, Pang KC, Pavan WJ, Pavesi G, Pesole G, Petrovsky N, Piazza S, Reed J, Reid JF, Ring BZ, Ringwald M, Rost B, Ruan Y, Salzberg SL, Sandelin A, Schneider C, Schönbach C, Sekiguchi K, Semple CAM, Seno S, Sessa L, Sheng Y, Shibata Y, Shimada H, Shimada K, Silva D, Sinclair B, Sperling S, Stupka E, Sugiura K, Sultana R, Takenaka Y, Taki K, Tammoja K, Tan SL, Tang S, Taylor MS, Tegner J, Teichmann SA, Ueda HR, van Nimwegen E, Verardo R, Wei CL, Yagi K, Yamanishi H, Zabarovsky E, Zhu S, Zimmer A, Hide W, Bult C, Grimmond SM, Teasdale RD, Liu ET, Brusic V, Quackenbush J, Wahlestedt C, Mattick JS, Hume DA, Kai C, Sasaki D, Tomaru Y, Fukuda S, Kanamori-Katayama M, Suzuki M, Aoki J, Arakawa T, Iida J, Imamura K, Itoh M, Kato T, Kawaji H, Kawagashira N, Kawashima T, Kojima M, Kondo S, Konno H, Nakano K, Ninomiya N, Nishio T, Okada M, Plessy C, Shibata K, Shiraki T, Suzuki S, Tagami M, Waki K, Watahiki A, Okamura-Oho Y, Suzuki H, Kawai J, Hayashizaki Y. The transcriptional landscape of the mammalian genome. Science 2005; 309:1559-63. [PMID: 16141072 DOI: 10.1126/science.1112014] [Citation(s) in RCA: 2607] [Impact Index Per Article: 137.2] [Indexed: 12/29/2022]
Abstract
This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
5
Mohanty B, Krishnan SPT, Swarup S, Bajic VB. Detection and preliminary analysis of motifs in promoters of anaerobically induced genes of different plant species. Ann Bot 2005; 96:669-81. [PMID: 16027132 PMCID: PMC4247034 DOI: 10.1093/aob/mci219] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 10/29/2004] [Revised: 12/16/2004] [Accepted: 01/31/2005] [Indexed: 05/03/2023]
Abstract
BACKGROUND AND AIMS Plants can suffer from oxygen limitation during flooding or more complete submergence and may therefore switch from Krebs cycle respiration to fermentation, in association with the expression of anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this study was to clarify the mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared by their promoter regions. METHODS Statistically significant motifs were detected in silico in 13 promoters of anaerobic genes. The selected motifs were common to the majority of the analysed promoters. Their significance was evaluated by searching for them in transcription factor-binding site databases (TRANSFAC, PlantCARE and PLACE). Several negative control data sets were used to test whether the motifs found were specific to the anaerobic group. KEY RESULTS Anaerobic response elements have previously been identified in maize (Zea mays) and arabidopsis (Arabidopsis thaliana) genes. Known functional motifs, such as the GT and GC motifs, were detected, as were other motifs shared by most of the genes examined. Five of the detected motifs have not previously been found in plants but are present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs are 5'-AAACAAA-3', 5'-AGCAGC-3', 5'-TCATCAC-3', 5'-GTTT(A/C/T)GCAA-3' and 5'-TTCCCTGTT-3'. CONCLUSIONS The identified promoter motifs may be functional, conferring anaerobic sensitivity on the genes that possess them. This proposal requires experimental verification.
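The consensus sequences above write alternative bases in parentheses, as in 5'-GTTT(A/C/T)GCAA-3'. A minimal Python sketch, assuming only that notation (no other IUPAC ambiguity codes) and scanning the given strand only, turns each consensus into a regular expression and reports match positions in a promoter sequence. The function names are hypothetical, and this is not the in silico method used in the study.

```python
# Scan a promoter for consensus motifs written as, e.g., GTTT(A/C/T)GCAA.
# Assumes only the "(X/Y/Z)" alternative notation used in the abstract.
import re

NOVEL_MOTIFS = ["AAACAAA", "AGCAGC", "TCATCAC", "GTTT(A/C/T)GCAA", "TTCCCTGTT"]


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn '(A/C/T)' groups into character classes such as '[ACT]'."""
    pattern = re.sub(r"\(([ACGT](?:/[ACGT])*)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus.upper())
    # A zero-width lookahead lets overlapping occurrences be reported too.
    return re.compile(f"(?=({pattern}))")


def scan_promoter(seq: str, motifs: list[str]) -> dict[str, list[int]]:
    """Map each consensus motif to its 0-based start positions on this strand."""
    seq = seq.upper()
    return {c: [m.start() for m in consensus_to_regex(c).finditer(seq)]
            for c in motifs}
```

For example, `scan_promoter("GTTTCGCAAAAACAAA", NOVEL_MOTIFS)` finds the degenerate motif at position 0 and AAACAAA at position 9. A fuller scan would also search the reverse complement, since binding sites can lie on either strand.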
Affiliation(s)
- Bijayalaxmi Mohanty
  - Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613.
6
Pan H, Zuo L, Choudhary V, Zhang Z, Leow SH, Chong FT, Huang Y, Ong VWS, Mohanty B, Tan SL, Krishnan SPT, Bajic VB. Dragon TF Association Miner: a system for exploring transcription factor associations through text-mining. Nucleic Acids Res 2004; 32:W230-4. [PMID: 15215386 PMCID: PMC441622 DOI: 10.1093/nar/gkh484] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
We present Dragon TF Association Miner (DTFAM), a system for text-mining PubMed documents for potential functional associations of transcription factors (TFs) with Gene Ontology (GO) terms and with diseases. DTFAM was trained and tested on selecting relevant documents from a manually curated dataset of more than 3000 PubMed abstracts relevant to transcription control. On our test data, the system achieves a sensitivity of 80% and a specificity of 82%. DTFAM provides comprehensive tabular and graphical reports linking terms to the relevant sets of documents, which are color-coded for easier inspection. DTFAM complements existing biological resources by collecting, assessing, extracting and presenting associations that can reveal less obvious connections among the entities found; these connections could help explain the functions of TFs and decipher parts of gene transcriptional regulatory networks. By summarizing information from a large volume of documents, DTFAM saves time and simplifies analysis for individual users. DTFAM is freely available for academic and non-profit users at http://research.i2r.a-star.edu.sg/DRAGON/TFAM/.
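The abstract reports sensitivity and specificity without giving formulas. The sketch below uses the textbook definitions, Se = TP/(TP+FN) and Sp = TN/(TN+FP), and also computes TP/(TP+FP), because some gene- and promoter-prediction papers use the word "specificity" for that positive predictive value; which convention the DTFAM evaluation follows is left open here. The class name and the counts in the example are made up, not the paper's data.

```python
# Standard binary-classification metrics for document selection.
# The confusion-matrix counts used below are hypothetical toy numbers.
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # relevant documents selected
    fp: int  # irrelevant documents selected
    tn: int  # irrelevant documents rejected
    fn: int  # relevant documents rejected

    def sensitivity(self) -> float:
        """Fraction of relevant documents that were selected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Textbook specificity: fraction of irrelevant documents rejected."""
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value, TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)


if __name__ == "__main__":
    c = Confusion(tp=40, fp=10, tn=90, fn=10)  # toy counts
    print(c.sensitivity(), c.specificity(), c.ppv())
```

With these toy counts the two "specificity" conventions differ (0.9 versus 0.8), which is why a reported figure is only comparable across tools when the definition is stated.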
Affiliation(s)
- Hong Pan
  - Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
7
Abstract
Antimicrobial peptides (AMPs) are important components of the innate immune system of many species. These peptides are found in eukaryotes, including mammals, amphibians, insects and plants, as well as in prokaryotes. Besides their pathogen-lytic properties, these peptides can have other activities, such as antitumor or mitogenic activity, or they may act as signaling molecules. Their short length, fast and efficient action against microbes and low toxicity to mammals make them potential candidates for peptide drugs. In many cases they are effective against pathogens that are resistant to conventional antibiotics, and they can serve as natural templates for the design of novel antimicrobial drugs. Although vast amounts of data on natural AMPs exist, they are not available through one central resource. We have developed a comprehensive database of known and putative AMPs (ANTIMIC, http://research.i2r.a-star.edu.sg/Templar/DB/ANTIMIC/), which contains approximately 1700 of these peptides. The database is integrated with tools that facilitate efficient data extraction and molecular-level analysis, as well as the search for new AMPs. These tools include BLAST, a PDB structure viewer and the Antimic profile module.
Affiliation(s)
- M Brahmachary
  - Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
8
Bajic VB, Seah SH, Chong A, Krishnan SPT, Koh JLY, Brusic V. Computer model for recognition of functional transcription start sites in RNA polymerase II promoters of vertebrates. J Mol Graph Model 2003; 21:323-32. [PMID: 12543131 DOI: 10.1016/s1093-3263(02)00179-1] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
This paper introduces a new computer system for recognizing functional transcription start sites (TSSs) in RNA polymerase II promoter regions of vertebrates. The system can scan complete vertebrate genomes for promoters with a significantly reduced number of false-positive predictions. It can be used in gene finding, since it recognizes the 5' ends of genes. The recognition model uses a composite-hierarchical approach that combines artificial intelligence, statistics and signal processing techniques. It also exploits the separation of promoter sequences into C+G-rich and C+G-poor classes. Evaluated on a large and diverse set of human sequences, the system achieved several-fold higher accuracy than a number of publicly available TSS-finding programs. Results on human chromosome 22 showed even greater specificity than on the evaluation set. The system has been implemented in the Dragon Promoter Finder package, which can be accessed at http://sdmc.krdl.org.sg:8080/promoter/.
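The abstract says the model treats C+G-rich and C+G-poor promoters separately but does not give the split criterion. As an assumed example, the sketch below uses GC content and the observed/expected CpG ratio with the thresholds of the widely used Gardiner-Garden and Frommer CpG-island criteria (GC fraction > 0.5, obs/exp > 0.6, omitting their 200 bp minimum length); Dragon Promoter Finder's actual rule may differ. Function names and default thresholds are illustrative.

```python
# Classify a promoter window as C+G-rich or C+G-poor.
# Thresholds follow the Gardiner-Garden & Frommer CpG-island criteria as an
# assumption; Dragon Promoter Finder's own rule is not given in the abstract.


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the true dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def is_cg_rich(seq: str, gc_min: float = 0.5, oe_min: float = 0.6) -> bool:
    """True if the window passes both the GC-content and CpG obs/exp thresholds."""
    return gc_fraction(seq) > gc_min and cpg_obs_exp(seq) > oe_min
```

Note that "CCCCGGGG" is 100% G+C yet is classed as C+G-poor here, because it contains only one CpG dinucleotide (obs/exp = 0.5). In a two-branch design like the one described, each window would then be routed to the sub-model trained for its class.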
Affiliation(s)
- Vladimir B Bajic
  - Computational Immunology Group, BIC-LIT, Laboratories for Information Technology, 21 Heng Mui Keng Terrace, 119613 Singapore, Singapore