26
Tokuzawa Y, Yagi K, Yamashita Y, Nakachi Y, Nikaido I, Bono H, Ninomiya Y, Kanesaki-Yatsuka Y, Akita M, Motegi H, Wakana S, Noda T, Sablitzky F, Arai S, Kurokawa R, Fukuda T, Katagiri T, Schönbach C, Suda T, Mizuno Y, Okazaki Y. Id4, a new candidate gene for senile osteoporosis, acts as a molecular switch promoting osteoblast differentiation. PLoS Genet 2010; 6:e1001019. [PMID: 20628571 PMCID: PMC2900302 DOI: 10.1371/journal.pgen.1001019] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 01/27/2010] [Accepted: 06/04/2010] [Indexed: 01/03/2023] Open
Abstract
Excessive accumulation of bone marrow adipocytes observed in senile osteoporosis or age-related osteopenia is caused by the unbalanced differentiation of mesenchymal stem cells (MSCs) into bone marrow adipocytes or osteoblasts. Several transcription factors are known to regulate the balance between adipocyte and osteoblast differentiation. However, the molecular mechanisms that regulate this balance in the bone marrow have yet to be elucidated. To identify candidate genes associated with senile osteoporosis, we performed genome-wide expression analyses of differentiating osteoblasts and adipocytes. Among transcription factors that were enriched in the early phase of differentiation, Id4 was identified as a key molecule affecting the differentiation of both cell types. Experiments using the bone marrow-derived stromal cell line ST2 and Id4-deficient mice showed that lack of Id4 drastically reduces osteoblast differentiation and drives differentiation toward adipocytes. On the other hand, knockdown of Id4 in adipogenic-induced ST2 cells increased the expression of Pparγ2, a master regulator of adipocyte differentiation. Similar results were observed in bone marrow cells of the femur and tibia of Id4-deficient mice. However, the effect of Id4 on Pparγ2 and adipocyte differentiation is unlikely to be direct. The mechanism by which Id4 promotes osteoblast differentiation is associated with the Id4-mediated release of Hes1 from Hes1-Hey2 complexes. Hes1 increases the stability and transcriptional activity of Runx2, a key molecule of osteoblast differentiation, which results in enhanced osteoblast-specific gene expression. The new role of Id4 in promoting osteoblast differentiation renders it a target for preventing the onset of senile osteoporosis.

Increased bone marrow adiposity is observed in the bone marrow of senile osteoporosis patients. This is caused by unbalanced differentiation of MSCs into osteoblasts or adipocytes. Previous reports have indicated that several transcription factors play important roles in determining the direction of MSC differentiation into osteoblasts or adipocytes. So far, little is known about the overall dynamics and regulation of the transcription factor expression changes that lead to the imbalance of osteoblast and adipocyte differentiation inside the bone marrow. We have performed genome-wide gene expression analyses during the differentiation of MSCs into osteoblasts or adipocytes. We identified the basic helix-loop-helix transcription factor family member Id4 as a leading candidate controlling the differentiation toward adipocytes or osteoblasts. Suppression of Id4 expression in MSCs repressed osteoblast differentiation and increased adipocyte differentiation. In contrast, overexpression of Id4 in MSCs promoted osteoblast differentiation and attenuated adipocyte differentiation. Moreover, Id4-mutant mice showed abnormal accumulation of lipid droplets in bone marrow and impaired bone formation activity. In summary, we have demonstrated a molecular function of Id4 in osteoblast differentiation. The findings revealed that Id4 is a molecular switch enhancing osteoblast differentiation at the expense of adipocyte differentiation.
27
Mizuno Y, Kurochkin IV, Herberth M, Okazaki Y, Schönbach C. Predicted mouse peroxisome-targeted proteins and their actual subcellular locations. BMC Bioinformatics 2008; 9 Suppl 12:S16. [PMID: 19091015 PMCID: PMC2638156 DOI: 10.1186/1471-2105-9-s12-s16] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The import of most intraperoxisomal proteins is mediated by peroxisome targeting signals at their C-termini (PTS1) or N-terminal regions (PTS2). Both signals have been integrated into subcellular location prediction programs. However, their present performance, particularly for PTS2 targeting, did not seem suitable for large-scale screening of sequences. RESULTS We modified an earlier reported PTS1 screening method to identify PTS2-containing mouse candidates using a combination of computational and manual annotation. For rapid confirmation of five new PTS2-containing and two previously identified PTS1-containing candidates, we developed the new cell line CHO-perRed, which stably expresses the peroxisomal marker dsRed-PTS1. Using CHO-perRed, we confirmed the peroxisomal localization of the PTS1-targeted candidate Zadh2. Preliminary characterization of Zadh2 expression suggested non-PPARalpha mediated activation. Notably, none of the PTS2 candidates localized to peroxisomes. CONCLUSION In a few cases, the PTS may oscillate from "silent" to "functional" depending on its surface accessibility, indicating the potential for context-dependent conditional subcellular sorting. Overall, PTS2-targeting predictions are unlikely to improve without the generation and integration of new experimental data from location proteomics, protein structures and quantitative Pex7 PTS2 peptide binding assays.
28
Schönbach C. The RIKEN mouse transcriptome: lessons learned and implications for the regulation of immune reactions. Novartis Found Symp 2007; 281:25-34; discussion 34-7, 50-3, 208-9. [PMID: 17534063 DOI: 10.1002/9780470062128.ch3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2023]
Abstract
Notably, the technology and analysis methods of the RIKEN mouse full-length cDNA project have contributed substantially to capturing the transcriptional output of the mouse genome and describing its combinatorial nature. However, one corollary of this large-scale transcript resource is the dichotomy of vast and missing information. As such, the transcriptional and translational output of yet unknown size following non-canonical principles remains to be established and interpreted. The importance of identifying immune-related transcripts and establishing their molecular functions in the context of complex immune system diseases is clear: knowledge about the transcriptome can advance the understanding of immune system regulation. Deciphering the logic of transcriptomes is critical for understanding the ontogeny and effector functions of immune cells, but it is not sufficient. The next challenge will lie in the combined sampling and integrated analysis of genomic elements, transcripts, proteins and metabolites.
29
Kurochkin IV, Mizuno Y, Konagaya A, Sakaki Y, Schönbach C, Okazaki Y. Novel peroxisomal protease Tysnd1 processes PTS1- and PTS2-containing enzymes involved in beta-oxidation of fatty acids. EMBO J 2007; 26:835-45. [PMID: 17255948 PMCID: PMC1794383 DOI: 10.1038/sj.emboj.7601525] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.5] [Received: 06/23/2006] [Accepted: 12/05/2006] [Indexed: 12/21/2022] Open
Abstract
Peroxisomes play an important role in beta-oxidation of fatty acids. All peroxisomal matrix proteins are synthesized in the cytosol and post-translationally sorted to the organelle. Two distinct peroxisomal signal targeting sequences (PTSs), the C-terminal PTS1 and the N-terminal PTS2, have been defined. Import of precursor PTS2 proteins into the peroxisomes is accompanied by a proteolytic removal of the N-terminal targeting sequence. Although the PTS1 signal is preserved upon translocation, many PTS1 proteins undergo a highly selective and limited cleavage. Here, we demonstrate that Tysnd1, a previously uncharacterized protein, is responsible both for the removal of the leader peptide from PTS2 proteins and for the specific processing of PTS1 proteins. All of the identified Tysnd1 substrates catalyze peroxisomal beta-oxidation. Tysnd1 itself undergoes processing through the removal of the presumably inhibitory N-terminal fragment. Tysnd1 expression is induced by the proliferator-activated receptor alpha agonist bezafibrate, along with the increase in its substrates. A model is proposed where the Tysnd1-mediated processing of the peroxisomal enzymes promotes their assembly into a supramolecular complex to enhance the rate of beta-oxidation.
30
Brahmachary M, Schönbach C, Yang L, Huang E, Tan SL, Chowdhary R, Krishnan SPT, Lin CY, Hume DA, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bajic VB. Computational promoter analysis of mouse, rat and human antimicrobial peptide-coding genes. BMC Bioinformatics 2006; 7 Suppl 5:S8. [PMID: 17254313 PMCID: PMC1764486 DOI: 10.1186/1471-2105-7-s5-s8] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals coming from pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs the promoter elements and transcription factors that control their tissue cell-specific expression have yet to be fully identified and characterized. RESULTS Based upon the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, only for 31 mouse genes belonging to 22 AMP families we were able to determine true orthologous relationships with 30 human and 15 rat sequences. We screened the promoter regions of AMPcgs in the three species for motifs by an ab initio motif finding method and analyzed the derived promoter characteristics. Promoter models were developed for alpha-defensins, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TFs groups include liver-, nervous system-specific and nuclear hormone receptors (NHRs). Out of 440 motifs analyzed, we found that three represent potentially novel TF-binding motifs enriched in promoters of AMPcgs, while the other four motifs appear to be species-specific. CONCLUSION Our large-scale computational analysis of promoters of 22 families of AMPcgs across three mammalian species suggests that their key transcriptional regulators are likely to be TFs of the liver-, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF binding motifs provide a rich resource for targeted experimental validation of TF binding and signaling studies that aim at the regulation of mouse, rat or human AMPcgs.
31
Bajic VB, Tan SL, Christoffels A, Schönbach C, Lipovich L, Yang L, Hofmann O, Kruger A, Hide W, Kai C, Kawai J, Hume DA, Carninci P, Hayashizaki Y. Mice and men: their promoter properties. PLoS Genet 2006; 2:e54. [PMID: 16683032 PMCID: PMC1449896 DOI: 10.1371/journal.pgen.0020054] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Received: 08/15/2005] [Accepted: 02/27/2006] [Indexed: 12/28/2022] Open
Abstract
Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, we show a link of TSS characteristics to very specific genomic organization in an example of immune-response-related genes (GO:0006955). Our results shed light on the global properties of the two transcriptomes not revealed before and therefore provide the framework for better understanding of the transcriptional mechanisms in the two species, as well as a framework for development of new and more efficient promoter- and gene-finding tools.
32
Kurochkin IV, Nagashima T, Konagaya A, Schönbach C. Sequence-based discovery of the human and rodent peroxisomal proteome. Appl Bioinformatics 2005; 4:93-104. [PMID: 16128611 DOI: 10.2165/00822942-200504020-00003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
Abstract
BACKGROUND Peroxisomes are metabolic organelles present in virtually all eukaryotic cells. They contain enzymes involved in hydrogen peroxide-based respiration and lipid metabolism. At present, only a small number of peroxisomal enzymes that are associated with oxidative stress response and metabolic disorders have been characterised biochemically. Therefore, we devised a sequence-based, multistep knowledge discovery strategy to identify potential novel peroxisomal protein candidates in small rodent model organisms and human. METHODS Screening of 130,629 putative translations of GenBank rodent and primate mRNA sequences was limited to the classical type-1 peroxisomal targeting signal [SA]-K-L. This motif is over-represented among peroxisomal proteins and has a high targeting efficiency. Subsequent steps of identifying co-occurring motifs, secondary structure properties, orthologues and variants, in combination with literature searching and visual inspection by domain experts, aimed at reduction of both false positive and negative validation targets. RESULTS Our method yielded 117 known peroxisome-targeted proteins and 29 novel candidate proteins. Of special interest were the mouse C530046K17Rik and 1300019N10Rik protein sequences that contain domains associated with enzymatic functions. C530046K17Rik showed no similarity to any known sequence of the animal kingdom, but weak similarity to the possible Leishmania quinone oxidoreductase and a putative cyanobacterium nicotinamide adenine dinucleotide phosphate (NADP)-dependent oxidoreductase. 1300019N10Rik contains two protease-related domains, glutamyl endopeptidase I and trypsin-like serine and cysteine proteases, which may have unique specificities to achieve efficient breakdown of proteins in the peroxisomes. 
CONCLUSION One mouse C57BL/6J strain-specific isocitrate dehydrogenase 1 isoform might be suitable to investigate potential phenotypes associated with the deficit of the intraperoxisomal reduced form of NADP (NADPH) and 2-oxoglutarate. Our biological knowledge discovery strategy enabled not only the identification of peroxisomal enzymes already described in the literature, but also the prediction of several novel proteins with possible roles in peroxisomal biochemistry and metabolism that are currently under experimental validation.
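The first screening step described above, restricting putative translations to those ending in the classical type-1 signal [SA]-K-L, can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the paper; the later steps of the published strategy (co-occurring motifs, secondary structure, orthologues, expert inspection) are not modelled here.

```python
import re

# Classical PTS1: serine or alanine, then lysine, then leucine,
# located at the very C-terminus of the protein sequence.
PTS1_PATTERN = re.compile(r"[SA]KL$")


def has_classical_pts1(protein_sequence: str) -> bool:
    """Return True if the sequence ends with the [SA]-K-L tripeptide."""
    # Drop surrounding whitespace and a trailing stop-codon marker ("*"),
    # which some translation tools append to the protein string.
    cleaned = protein_sequence.strip().rstrip("*").upper()
    return PTS1_PATTERN.search(cleaned) is not None


def screen_candidates(translations: dict[str, str]) -> list[str]:
    """Return the identifiers whose translation carries a C-terminal PTS1."""
    return [
        seq_id
        for seq_id, sequence in translations.items()
        if has_classical_pts1(sequence)
    ]


if __name__ == "__main__":
    # Toy sequences for illustration only (not real proteins).
    toy = {
        "toy_1": "MKTLLVASKL",   # ends in SKL
        "toy_2": "MSKLGGTPRA",   # SKL present but not at the C-terminus
        "toy_3": "MPEPTIDEAKL*",  # ends in AKL, with a stop marker
    }
    print(screen_candidates(toy))
```

Anchoring the pattern with `$` matters: an internal S-K-L tripeptide is not a targeting signal, which is why `toy_2` is rejected.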
33
Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest ARR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, Chalk AM, Chiu KP, Choudhary V, Christoffels A, Clutterbuck DR, Crowe ML, Dalla E, Dalrymple BP, de Bono B, Della Gatta G, di Bernardo D, Down T, Engstrom P, Fagiolini M, Faulkner G, Fletcher CF, Fukushima T, Furuno M, Futaki S, Gariboldi M, Georgii-Hemming P, Gingeras TR, Gojobori T, Green RE, Gustincich S, Harbers M, Hayashi Y, Hensch TK, Hirokawa N, Hill D, Huminiecki L, Iacono M, Ikeo K, Iwama A, Ishikawa T, Jakt M, Kanapin A, Katoh M, Kawasawa Y, Kelso J, Kitamura H, Kitano H, Kollias G, Krishnan SPT, Kruger A, Kummerfeld SK, Kurochkin IV, Lareau LF, Lazarevic D, Lipovich L, Liu J, Liuni S, McWilliam S, Madan Babu M, Madera M, Marchionni L, Matsuda H, Matsuzawa S, Miki H, Mignone F, Miyake S, Morris K, Mottagui-Tabar S, Mulder N, Nakano N, Nakauchi H, Ng P, Nilsson R, Nishiguchi S, Nishikawa S, Nori F, Ohara O, Okazaki Y, Orlando V, Pang KC, Pavan WJ, Pavesi G, Pesole G, Petrovsky N, Piazza S, Reed J, Reid JF, Ring BZ, Ringwald M, Rost B, Ruan Y, Salzberg SL, Sandelin A, Schneider C, Schönbach C, Sekiguchi K, Semple CAM, Seno S, Sessa L, Sheng Y, Shibata Y, Shimada H, Shimada K, Silva D, Sinclair B, Sperling S, Stupka E, Sugiura K, Sultana R, Takenaka Y, Taki K, Tammoja K, Tan SL, Tang S, Taylor MS, Tegner J, Teichmann SA, Ueda HR, van Nimwegen E, Verardo R, Wei CL, Yagi K, Yamanishi H, Zabarovsky E, Zhu S, Zimmer A, Hide W, Bult C, Grimmond SM, Teasdale RD, Liu ET, Brusic V, Quackenbush J, Wahlestedt C, Mattick JS, Hume DA, Kai C, Sasaki D, Tomaru Y, Fukuda S, Kanamori-Katayama M, Suzuki M, Aoki J, Arakawa T, Iida J, Imamura K, Itoh M, Kato T, Kawaji H, Kawagashira N, Kawashima T, Kojima M, Kondo S, Konno H, Nakano K, Ninomiya N, 
Nishio T, Okada M, Plessy C, Shibata K, Shiraki T, Suzuki S, Tagami M, Waki K, Watahiki A, Okamura-Oho Y, Suzuki H, Kawai J, Hayashizaki Y. The transcriptional landscape of the mammalian genome. Science 2005; 309:1559-63. [PMID: 16141072 DOI: 10.1126/science.1112014] [Citation(s) in RCA: 2607] [Impact Index Per Article: 137.2] [Indexed: 12/29/2022]
Abstract
This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
34
Schönbach C, Nagashima T, Konagaya A. Textmining in support of knowledge discovery for vaccine development. Methods 2005; 34:488-95. [PMID: 15542375 DOI: 10.1016/j.ymeth.2004.06.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Accepted: 06/21/2004] [Indexed: 11/25/2022] Open
Abstract
Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focuses on molecular functions of HIV-infected T-cells and HIV epitope information, using textmining and the interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through textmining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.
35
Schönbach C, Koh JLY, Flower DR, Brusic V. An Update on the Functional Molecular Immunology (FIMM) Database. Appl Bioinformatics 2005; 4:25-31. [PMID: 16000010 DOI: 10.2165/00822942-200504010-00003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/02/2022]
Abstract
Data on the major histocompatibility complex, T-cell epitopes, B-cell epitopes, antigens and diseases are heterogeneous and scattered among different databases and the literature. Since it has become increasingly difficult to obtain an integrated view of functional immune response components, we have developed and updated over several years the Functional molecular IMMunology (FIMM) database (http://research.i2r.a-star.edu.sg/fimm/). FIMM contains integrated expert-curated data on protein antigens, and on human immunological receptors that recognise and bind them in healthy or disease states. Interfaces with multiple, intuitive query options and query reports provide immunologists with prioritised information that aids data interpretation, vaccine target discovery and immune disease research.
36
Silva DG, Schönbach C, Brusic V, Socha LA, Nagashima T, Petrovsky N. Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system. BMC Genomics 2004; 5:28. [PMID: 15115540 PMCID: PMC420239 DOI: 10.1186/1471-2164-5-28] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/10/2003] [Accepted: 04/29/2004] [Indexed: 11/24/2022] Open
Abstract
Background A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. Results Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%), disorders. Conclusions Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
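As a quick consistency check on the counts reported in the Results above, the following Python sketch tallies the 182 pathologs by identification method and derives how many each approach recovered overall. The variable names are illustrative, and the per-method coverage figures are derived here from the stated counts rather than quoted from the paper.

```python
# Counts as stated in the abstract.
by_method = {"computational_only": 36, "expert_only": 49, "both": 97}
disease_share_percent = {
    "neoplastic": 53,
    "hereditary": 24,
    "immunological": 5,
    "cardiovascular": 4,
    "other": 14,
}

total_pathologs = sum(by_method.values())

# Pathologs recovered by each approach, counting the overlap for both.
found_by_tools = by_method["computational_only"] + by_method["both"]
found_by_experts = by_method["expert_only"] + by_method["both"]


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("total pathologs:", total_pathologs)
    print("found by tools:", found_by_tools, percent(found_by_tools, total_pathologs))
    print("found by experts:", found_by_experts, percent(found_by_experts, total_pathologs))
    print("disease shares sum to:", sum(disease_share_percent.values()))
```

The three method categories add up to the reported 182, and the disease categories add up to 100%, so the figures in the abstract are internally consistent.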
37
Petrovsky N, Schönbach C, Brusic V. Bioinformatic strategies for better understanding of immune function. In Silico Biol 2004; 3:411-6. [PMID: 12954084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
Novartis Foundation sponsored a Symposium which brought together a group of experimental immunologists, theoretical immunologists, and bioinformaticians to discuss the new field of immunoinformatics. The discussion focused on immunological databases, antigen processing and presentation, immunogenomics, host-pathogen interactions, and mathematical modelling of the immune system. A main conclusion of the meeting is the critical role played by immunoinformatics in current immunology research. In particular, immunoinformatics provides a foundation for the emerging fields of systems immunology and immunogenomics.
38
Nagashima T, Matsuda H, Silva DG, Petrovsky N, Konagaya A, Schönbach C, Kasukawa T, Arakawa T, Carninci P, Kawai J, Hayashizaki Y. FREP: a database of functional repeats in mouse cDNAs. Nucleic Acids Res 2004; 32:D471-5. [PMID: 14681460 PMCID: PMC308857 DOI: 10.1093/nar/gkh123] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
The FREP database (http://facts.gsc.riken.go.jp/FREP/) contains 31,396 RepeatMasker-identified non-redundant variant repeat sequences derived from 16,527 mouse cDNAs with protein-coding potential. The repeats were computationally associated with potential effects on transcriptional variation, translation, protein function or involvement in disease to identify Functional REPeats (FREPs). FREPs are defined by the (i) occurrence of exon-exon boundaries in repeats, (ii) presence of polyadenylation sites in 3'UTR-located repeats, (iii) effect on translation, (iv) position in the protein-coding region or protein domains or (v) conditional association with disease MeSH terms. Currently the database contains 9,261 (29.5%) inferred FREPs derived from 6,861 (41.5%) mouse cDNAs. Integrated evidence of the functional assignments and dynamically generated sequence similarity search results support the exploration and annotation of functional, ancestral or taxon-specific repeats. Keyword and pre-selected feature searches (e.g. coding sequence-repeat or splice site-repeat relations) support intuitive database querying as well as the retrieval of repeat sequences. Integrated sequence search and alignment tools allow the analysis of known or identification of new functional repeat candidates. FREP is a unique resource for illuminating the role of transposons and repetitive sequences in shaping the coding part of the mouse transcriptome and for selecting the appropriate experimental model to study diseases with suspected repeat etiology contributions.
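The two percentages quoted above follow directly from the stated counts. A minimal Python sketch of the arithmetic, with variable names chosen here for illustration, is:

```python
# Counts as stated in the abstract.
total_repeats = 31396   # non-redundant variant repeat sequences
total_cdnas = 16527     # mouse cDNAs with protein-coding potential
inferred_freps = 9261   # repeats classified as functional repeats
frep_cdnas = 6861       # cDNAs from which those FREPs derive


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


frep_share = percent(inferred_freps, total_repeats)   # share of repeats that are FREPs
cdna_share = percent(frep_cdnas, total_cdnas)         # share of cDNAs carrying a FREP

if __name__ == "__main__":
    print(f"FREPs among repeats: {frep_share}%")
    print(f"cDNAs with FREPs:    {cdna_share}%")
```

Both values reproduce the figures given in the abstract: 29.5% of the repeats and 41.5% of the cDNAs.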
39
Schönbach C. From masking repeats to identifying functional repeats in the mouse transcriptome. Brief Bioinform 2004; 5:107-17. [PMID: 15260892 DOI: 10.1093/bib/5.2.107] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
The back-to-back release of the mouse genome and the functionally annotated RIKEN mouse full-length cDNA collection was an important milestone in mammalian genomics. Yet much of the data remain to be explored in terms of biological effects and mechanisms. For example, interspersed repeats account for 39 per cent of the mouse genome sequence and 11 per cent of representative transcripts. A considerable number of transposable repeat elements are still active and propagating in mouse compared with human. While existing repeat databases and tools assist the classification of repeats or identification of new repeats, there is little bioinformatic support towards exploring the extent and role of repeats in transcriptional variation, modulation of protein function, or gene regulatory events. Since the mouse is used as a model organism to study human genes and their disease associations, this review focuses on information extraction and collation that captures the functional context of repeats in mouse transcripts to facilitate the biological interpretation and extrapolation of findings to the human.
40
Nagashima T, Silva DG, Petrovsky N, Socha LA, Suzuki H, Saito R, Kasukawa T, Kurochkin IV, Konagaya A, Schönbach C. Inferring higher functional information for RIKEN mouse full-length cDNA clones with FACTS. Genome Res 2003; 13:1520-33. [PMID: 12819151 PMCID: PMC403704 DOI: 10.1101/gr.1019903] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 11/26/2002] [Accepted: 03/04/2003] [Indexed: 01/22/2023]
Abstract
FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text. Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching of gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A significant number (23.5%) of disease MeSH-associated clones were also found to have a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system disease represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, identical GO assignments were provided for that clone by either method, whereas for 21.8% of clones, the assignments differed. In contrast, for OMIM assignments, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies).
|
41
|
Brusic V, Pillai RS, Silva DG, Petrovsky N, Schönbach C. Cytokine-related genes identified from the RIKEN full-length mouse cDNA data set. Genome Res 2003; 13:1307-17. [PMID: 12819128 PMCID: PMC403723 DOI: 10.1101/gr.1016503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
Abstract
To identify novel cytokine-related genes, we searched the set of 60,770 annotated RIKEN mouse cDNA clones (FANTOM2 clones), using keywords such as cytokine itself or cytokine names (such as interferon, interleukin, epidermal growth factor, fibroblast growth factor, and transforming growth factor). This search produced 108 known cytokines and cytokine-related products such as cytokine receptors, cytokine-associated genes, or their products (enhancers, accessory proteins, cytokine-induced genes). We found 15 clusters of FANTOM2 clones that are candidates for novel cytokine-related genes. These encoded products with strong sequence similarity to guanylate-binding protein (GBP-5), interleukin-1 receptor-associated kinase 2 (IRAK-2), interleukin 20 receptor alpha isoform 3, a member of the interferon-inducible proteins of the Ifi 200 cluster, four members of the membrane-associated family 1-8 of interferon-inducible proteins, one p27-like protein, and a hypothetical protein containing a Toll/Interleukin receptor domain. All four clones representing novel candidates of gene products from the family contain a novel highly conserved cross-species domain. Clones similar to growth factor-related products included transforming growth factor beta-inducible early growth response protein 2 (TIEG-2), TGFbeta-induced factor 2, integrin beta-like 1, latent TGF-binding protein 4S, and FGF receptor 4B. We performed a detailed sequence analysis of the candidate novel genes to elucidate their likely functional properties.
|
42
|
Silva DG, Schönbach C, Brusic V, Socha LA, Nagashima T, Petrovsky N. Identification of Novel “Pathologs” (Human Disease-Related Gene Candidates) From the RIKEN Full-Length Mouse cDNA Data Set. Genome Res 2003. [DOI: 10.1101/gr.1461303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/24/2022]
|
43
|
Kanapin A, Batalov S, Davis MJ, Gough J, Grimmond S, Kawaji H, Magrane M, Matsuda H, Schönbach C, Teasdale RD, Yuan Z. Mouse proteome analysis. Genome Res 2003; 13:1335-44. [PMID: 12819131 PMCID: PMC403658 DOI: 10.1101/gr.978703] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 11/11/2002] [Accepted: 03/05/2003] [Indexed: 11/25/2022]
Abstract
A general overview of the protein sequence set for the mouse transcriptome produced during the FANTOM2 sequencing project is presented here. We applied different algorithms to characterize protein sequences derived from a nonredundant representative protein set (RPS) and a variant protein set (VPS) of the mouse transcriptome. The functional characterization and assignment of Gene Ontology terms was done by analysis of the proteome using InterPro. The Superfamily database analyses gave a detailed structural classification according to SCOP and provide additional evidence for the functional characterization of the proteome data. The MDS database analysis revealed new domains which are not presented in existing protein domain databases. Thus the transcriptome gives us a unique source of data for the detection of new functional groups. The data obtained for the RPS and VPS sets facilitated the comparison of different patterns of protein expression. A comparison of other existing mouse and human protein sequence sets (e.g., the International Protein Index) demonstrates the common patterns in mammalian proteomes. The analysis of the membrane organization within the transcriptome of multiple eukaryotes provides valuable statistics about the distribution of secretory and transmembrane proteins.
|
44
|
Suzuki H, Saito R, Kanamori M, Kai C, Schönbach C, Nagashima T, Hosaka J, Hayashizaki Y. The mammalian protein-protein interaction database and its viewing system that is linked to the main FANTOM2 viewer. Genome Res 2003; 13:1534-41. [PMID: 12819152 PMCID: PMC403706 DOI: 10.1101/gr.956303] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
Here, we describe the development of a mammalian protein-protein interaction (PPI) database and of a PPI Viewer application to display protein interaction networks (http://fantom21.gsc.riken.go.jp/PPI/). In the database, we stored the mammalian PPIs identified through our PPI assays (internal PPIs), as well as those we extracted and processed (external PPIs) from publicly available data sources, the DIP and BIND databases and MEDLINE abstracts by using FACTS, a new functional inference and curation system. We integrated the internal and external PPIs into the PPI database, which is linked to the main FANTOM2 viewer. In addition, we incorporated into the PPI Viewer information regarding the luciferase reporter activity of internal PPIs and the data confidence of external PPIs; these data enable visualization and evaluation of the reliability of each interaction. Using the described system, we successfully identified several interactions of biological significance. Therefore, the PPI Viewer is a useful tool for exploring FANTOM2 clone-related protein interactions and their potential effects on signaling and cellular communication.
|
45
|
Schönbach C. From immunogenetics to immunomics: functional prospecting of genes and transcripts. NOVARTIS FOUNDATION SYMPOSIUM 2003; 254:177-88; discussion 189-92, 216-22, 250-2. [PMID: 14712938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
Human and mouse genome and transcriptome projects have expanded the field of 'immunogenetics' beyond the traditional study of the genetics and evolution of MHC, TCR and Ig loci into the new interdisciplinary area of 'immunomics'. Immunomics is the study of the molecular functions associated with all immune-related coding and non-coding mRNA transcripts. To unravel the function, regulation and diversity of the immunome requires that we identify and correctly categorize all immune-related transcripts. The importance of intercalated genes, antisense transcripts and non-coding RNAs and their potential role in regulation of immune development and function are only just starting to be appreciated. To better understand immune function and regulation, transcriptome projects (e.g. Functional Annotation of the Mouse, FANTOM), that focus on sequencing full-length transcripts from multiple tissue sources, ideally should include specific immune cells (e.g. T cell, B cells, macrophages, dendritic cells) at various states of development, in activated and unactivated states and in different disease contexts. Progress in deciphering immune regulatory networks will require the cooperative efforts of immunologists, immunogeneticists, molecular biologists and bioinformaticians. Although primary sequence analysis remains useful for annotation of new transcripts it is less useful for identifying novel functions of known transcripts in a new context (protein interaction network or pathway). The most efficient approach to mine useful information from the vast a priori knowledge contained in biological databases and the scientific literature, is to use a combination of computational and expert-driven knowledge discovery strategies. This paper will illustrate the challenges posed in attempts to functionally infer transcriptional regulation and interaction of immune-related genes from text and sequence-based data sources.
|
46
|
Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, Yamanaka I, Kiyosawa H, Yagi K, Tomaru Y, Hasegawa Y, Nogami A, Schönbach C, Gojobori T, Baldarelli R, Hill DP, Bult C, Hume DA, Quackenbush J, Schriml LM, Kanapin A, Matsuda H, Batalov S, Beisel KW, Blake JA, Bradt D, Brusic V, Chothia C, Corbani LE, Cousins S, Dalla E, Dragani TA, Fletcher CF, Forrest A, Frazer KS, Gaasterland T, Gariboldi M, Gissi C, Godzik A, Gough J, Grimmond S, Gustincich S, Hirokawa N, Jackson IJ, Jarvis ED, Kanai A, Kawaji H, Kawasawa Y, Kedzierski RM, King BL, Konagaya A, Kurochkin IV, Lee Y, Lenhard B, Lyons PA, Maglott DR, Maltais L, Marchionni L, McKenzie L, Miki H, Nagashima T, Numata K, Okido T, Pavan WJ, Pertea G, Pesole G, Petrovsky N, Pillai R, Pontius JU, Qi D, Ramachandran S, Ravasi T, Reed JC, Reed DJ, Reid J, Ring BZ, Ringwald M, Sandelin A, Schneider C, Semple CAM, Setou M, Shimada K, Sultana R, Takenaka Y, Taylor MS, Teasdale RD, Tomita M, Verardo R, Wagner L, Wahlestedt C, Wang Y, Watanabe Y, Wells C, Wilming LG, Wynshaw-Boris A, Yanagisawa M, Yang I, Yang L, Yuan Z, Zavolan M, Zhu Y, Zimmer A, Carninci P, Hayatsu N, Hirozane-Kishikawa T, Konno H, Nakamura M, Sakazume N, Sato K, Shiraki T, Waki K, Kawai J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Imotani K, Ishii Y, Itoh M, Kagawa I, Miyazaki A, Sakai K, Sasaki D, Shibata K, Shinagawa A, Yasunishi A, Yoshino M, Waterston R, Lander ES, Rogers J, Birney E, Hayashizaki Y. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002; 420:563-73. [PMID: 12466851 DOI: 10.1038/nature01266] [Citation(s) in RCA: 1226] [Impact Index Per Article: 55.7] [Received: 09/19/2002] [Accepted: 10/28/2002] [Indexed: 01/10/2023]
Abstract
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
MESH Headings
- Alternative Splicing/genetics
- Amino Acid Motifs
- Animals
- Chromosomes, Mammalian/genetics
- Cloning, Molecular
- DNA, Complementary/genetics
- Databases, Genetic
- Expressed Sequence Tags
- Genes/genetics
- Genomics/methods
- Humans
- Membrane Proteins/genetics
- Mice/genetics
- Physical Chromosome Mapping
- Protein Structure, Tertiary
- Proteome/chemistry
- Proteome/genetics
- RNA, Antisense/genetics
- RNA, Messenger/analysis
- RNA, Messenger/genetics
- RNA, Untranslated/analysis
- RNA, Untranslated/genetics
- Transcription Initiation Site
- Transcription, Genetic/genetics
|
47
|
Abstract
Bioinformatics-driven T-cell epitope-identification methods can enhance vaccine target selection significantly. We evaluated three unrelated computational methods to screen Pol, Gag and Env sequences extracted from the Los Alamos HIV database for HLA-A*0201 and HLA-B*3501 T-cell epitope candidates. The hidden Markov model predicted 389 HLA-B*3501-restricted candidates from 374 HIV-1 and 97 HIV-2 sequences. The artificial neural network (ANN) model, and Bioinformatics and Molecular Analysis Section (BIMAS) quantitative matrix predictions for A*0201 yielded 1122 HIV-1 and 548 HIV-2 candidates. The overall sequence coverage of the predicted A*0201 T-cell epitopes was 2.7% (HIV-1) and 3.0% (HIV-2). HLA-B*3501-predicted epitopes covered 0.9% (HIV-1) and 1.4% (HIV-2) of the total sequence. Comparison of 890 ANN- and 397 BIMAS-derived HIV-1 A*0201-restricted epitope candidates showed that only 13-19% of the predicted and 26% of the experimentally confirmed T-cell epitopes were captured by both methods. Extrapolating these results, we estimated that at least 247 predicted HIV-1 epitopes are yet to be discovered as active A*0201-restricted T-cell epitopes. Adequate comparison and combined usage of various predictive bioinformatics methods, rather than uncritical use of any single prediction method, will enable cost-effective and efficient T-cell epitope screening.
|
48
|
Yu K, Petrovsky N, Schönbach C, Koh JLY, Brusic V. Methods for Prediction of Peptide Binding to MHC Molecules: A Comparative Study. Mol Med 2002. [DOI: 10.1007/bf03402006] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.3] [Indexed: 11/29/2022] Open
|
49
|
Kawaji H, Schönbach C, Matsuo Y, Kawai J, Okazaki Y, Hayashizaki Y, Matsuda H. Exploration of novel motifs derived from mouse cDNA sequences. Genome Res 2002; 12:367-78. [PMID: 11875024 PMCID: PMC155289 DOI: 10.1101/gr.193702] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
We performed a systematic maximum density subgraph (MDS) detection of conserved sequence regions to discover new, biologically relevant motifs from a set of 21,050 conceptually translated mouse cDNA (FANTOM1) sequences. A total of 3202 candidate sequences, which shared similar regions over >20 amino acid residues, were screened against known conserved regions listed in Pfam, ProDom, and InterPro. The filtering procedure resulted in 139 FANTOM1 sequences belonging to 49 new motif candidates. Using annotations and multiple sequence alignment information, we removed by visual inspection 42 candidates whose members were found to be false positives because of sequence redundancy, alternative splicing, low complexity, transcribed retroviral repeat elements contained in the region of the predicted open reading frame, and reports in the literature. The remaining seven motifs have been expanded by hidden Markov model (HMM) profile searches of SWISS-PROT/TrEMBL from 28 FANTOM1 sequences to 164 members and analyzed in detail on sequence and structure level to elucidate the possible functions of motifs and members. The novel and conserved motif MDS00105 is specific for the mammalian inhibitor of growth (ING) family. Three submotifs MDS00105.1-3 are specific for ING1/ING1L, ING1-homolog, and ING3 subfamilies. The motif MDS00105 together with a PHD finger domain constitutes a module for ING proteins. Structural motif MDS00113 represents a leucine zipper-like motif. Conserved motif MDS00145 is a novel 1-acyl-SN-glycerol-3-phosphate acyltransferase (AGPAT) submotif containing a transmembrane domain that distinguishes AGPAT3 and AGPAT4 from all other acyltransferase domain-containing proteins. Functional motif MDS00148 overlaps with the kazal-type serine protease inhibitor domain but has been detected only in an extracellular loop region of solute carrier 21 (SLC21) (organic anion transporters) family members, which may regulate the specificity of anion uptake. 
Our motif discovery not only aided in the functional characterization of new mouse orthologs for potential drug targets but also allowed us to predict that at least 16 other new motifs are waiting to be discovered from the current SWISS-PROT/TrEMBL database.
|
50
|
Brusic V, Bucci K, Schönbach C, Petrovsky N, Zeleznikow J, Kazura JW. Efficient discovery of immune response targets by cyclical refinement of QSAR models of peptide binding. J Mol Graph Model 2002; 19:405-11, 467. [PMID: 11552688 DOI: 10.1016/s1093-3263(00)00099-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
Abstract
Peptides that induce and recall T-cell responses are called T-cell epitopes. T-cell epitopes may be useful in a subunit vaccine against malaria. Computer models that simulate peptide binding to MHC are useful for selecting candidate T-cell epitopes since they minimize the number of experiments required for their identification. We applied a combination of computational and immunological strategies to select candidate T-cell epitopes. A total of 86 experimental binding assays were performed in three rounds of identification of HLA-A11 binding peptides from the six preerythrocytic malaria antigens. Thirty-six peptides were experimentally confirmed as binders. We show that the cyclical refinement of the ANN models results in a significant improvement of the efficiency of identifying potential T-cell epitopes.
|