1
Xia X. Beyond Trees: Regulons and Regulatory Motif Characterization. Genes (Basel) 2020; 11:genes11090995. [PMID: 32854400 PMCID: PMC7564462 DOI: 10.3390/genes11090995] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/24/2020] [Revised: 08/13/2020] [Accepted: 08/24/2020] [Indexed: 12/14/2022]
Abstract
Trees and their seeds regulate their germination, growth, and reproduction in response to environmental stimuli. These stimuli, through signal transduction, trigger transcription factors that alter the expression of various genes leading to the unfolding of the genetic program. A regulon is conceptually defined as a set of target genes regulated by a transcription factor by physically binding to regulatory motifs to accomplish a specific biological function, such as the CO-FT regulon for flowering timing and fall growth cessation in trees. Only with a clear characterization of regulatory motifs can candidate target genes be experimentally validated, but motif characterization represents the weakest feature of regulon research, especially in tree genetics. I review here relevant experimental and bioinformatics approaches in characterizing transcription factors and their binding sites, outline problems in tree regulon research, and demonstrate how transcription factor databases can be effectively used to aid the characterization of tree regulons.
Affiliation(s)
- Xuhua Xia
- Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada;
- Ottawa Institute of Systems Biology, Ottawa, ON K1H 8M5, Canada
2
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors and stop codons can be misread by tRNAs, which also contributes to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index for codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
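As a hedged illustration of what a codon-specific usage index looks like, the sketch below computes relative synonymous codon usage (RSCU), a widely used index that divides each codon's observed count by the count expected if all codons in its synonymous family were used equally. RSCU is chosen here only as a familiar example; the abstract does not say which indices the chapter covers beyond CAI and the Index of Translation Elongation. The function name `rscu` and the two-family table are assumptions made for brevity.

```python
from collections import Counter

# Illustrative subset of synonymous codon families from the standard genetic
# code. A real analysis would list every family; this table is an assumption
# made to keep the sketch short.
CODON_FAMILIES = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Lys": ["AAA", "AAG"],
}


def rscu(cds, families=CODON_FAMILIES):
    """Return {codon: RSCU} for the families present in a coding sequence.

    RSCU = observed codon count / (family total / family size).
    A value above 1 means the codon is used more often than expected
    under equal usage within its family.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for codon in family:
            result[codon] = counts[codon] / expected
    return result


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCTGCCAAAAAG").items()):
        print(codon, round(value, 2))
```

Note that RSCU, like CAI, does not correct for background mutation bias, which is the limitation the abstract says the Index of Translation Elongation is designed to address.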
3
MYC-dependent recruitment of RUNX1 and GATA2 on the SET oncogene promoter enhances PP2A inactivation in acute myeloid leukemia. Oncotarget 2016; 8:53989-54003. [PMID: 28903318 PMCID: PMC5589557 DOI: 10.18632/oncotarget.9840] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 01/11/2016] [Accepted: 05/22/2016] [Indexed: 01/15/2023]
Abstract
The SET (I2PP2A) oncoprotein is a potent inhibitor of protein phosphatase 2A (PP2A) that regulates many cell processes and important signaling pathways. Despite the importance of SET overexpression and its prognostic impact in both hematologic and solid tumors, little is known about the mechanisms involved in its transcriptional regulation. In this report, we define the minimal promoter region of the SET gene, and identify a novel multi-protein transcription complex, composed of MYC, SP1, RUNX1 and GATA2, which activates SET expression in AML. The role of MYC is crucial, since it increases the expression of the other three transcription factors of the complex, and supports their recruitment to the promoter of SET. These data shed light on a new regulatory mechanism in cancer, in addition to the already known PP2A-MYC and SET-PP2A. Besides, we show that there is a significant positive correlation between the expression of SET and MYC, RUNX1, and GATA2 in AML patients, which further endorses our results. Altogether, this study opens new directions for understanding the mechanisms that lead to SET overexpression, and demonstrates that MYC, SP1, RUNX1 and GATA2 are key transcriptional regulators of SET expression in AML.
4
Xia X. Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction. Scientifica 2012; 2012:917540. [PMID: 24278755 PMCID: PMC3820676 DOI: 10.6064/2012/917540] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 08/22/2012] [Accepted: 10/11/2012] [Indexed: 05/31/2023]
Abstract
Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.
Affiliation(s)
- Xuhua Xia
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, Canada K1N 6N5
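To make the PWM terminology in the abstract above concrete, here is a minimal sketch of building a log-odds position weight matrix from aligned sites with pseudocounts and scanning a sequence for the highest PWM score (PWMS). It assumes a uniform background and a pseudocount of 1 per base; these defaults and the names `build_pwm`, `pwm_score`, and `best_hit` are illustrative choices, not taken from the paper. The significance tests the abstract discusses (false discovery rate, resampling, extreme value distribution) are not implemented here.

```python
import math


def build_pwm(sites, background=None, pseudocount=1.0, alphabet="ACGT"):
    """Build a log2-odds PWM from equal-length aligned binding sites.

    Each entry is log2(p_site(base, position) / p_background(base)),
    where p_site is estimated from counts plus a pseudocount.
    """
    if background is None:
        # Assumption: uniform background frequencies.
        background = {b: 1.0 / len(alphabet) for b in alphabet}
    width = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in alphabet
        })
    return pwm


def pwm_score(pwm, segment):
    """Sum the position-specific log-odds for one segment (the PWMS)."""
    return sum(column[base] for column, base in zip(pwm, segment))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in a sequence."""
    width = len(pwm)
    return max(
        (pwm_score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    pwm = build_pwm(["TATA", "TATA", "TATT", "TACA"])
    print(best_hit(pwm, "GGGTATAGG"))
```

Pseudocounts keep unseen bases from producing a log of zero, which is also why, as the abstract notes, testing the PWM itself is better done by resampling than by a closed-form test.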
5
Claeys M, Storms V, Sun H, Michoel T, Marchal K. MotifSuite: workflow for probabilistic motif detection and assessment. Bioinformatics 2012; 28:1931-2. [DOI: 10.1093/bioinformatics/bts293] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
6
Sun H, Guns T, Fierro AC, Thorrez L, Nijssen S, Marchal K. Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection. Nucleic Acids Res 2012; 40:e90. [PMID: 22422841 PMCID: PMC3384348 DOI: 10.1093/nar/gks237] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023]
Abstract
Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that score as high as the biologically true ones. Using ChIP information not only reduces the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also restricts the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP information in a query-based way makes in silico CRM detection a much more feasible endeavor. To handle the large datasets, the query-based setting, and other specificities of CRM detection on ChIP-Seq based data, we developed a novel, powerful CRM detection method, 'CPModule'. By applying it to a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.
Affiliation(s)
- Hong Sun
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Leuven, Belgium
7
Aparicio O, Carnero E, Abad X, Razquin N, Guruceaga E, Segura V, Fortes P. Adenovirus VA RNA-derived miRNAs target cellular genes involved in cell growth, gene expression and DNA repair. Nucleic Acids Res 2009; 38:750-63. [PMID: 19933264 PMCID: PMC2817457 DOI: 10.1093/nar/gkp1028] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Abstract
Adenovirus virus-associated (VA) RNAs are processed to functional viral miRNAs or mivaRNAs. mivaRNAs are important for virus production, suggesting that they may target cellular or viral genes that affect the virus cell cycle. To look for cellular targets of mivaRNAs, we first identified genes downregulated in the presence of VA RNAs by microarray analysis. These genes were then screened for mivaRNA target sites using several bioinformatic tools. The combination of microarray analysis and bioinformatics allowed us to select the splicing and translation regulator TIA-1 as a putative mivaRNA target. We show that TIA-1 is downregulated at mRNA and protein levels in infected cells expressing functional mivaRNAs and in transfected cells that express mivaRNAI-138, one of the most abundant adenoviral miRNAs. Also, reporter assays show that TIA-1 is downregulated directly by mivaRNAI-138. To determine whether mivaRNAs could target other cellular genes, we analyzed 50 additional putative targets. Thirty of them were downregulated in infected or transfected cells expressing mivaRNAs. Some of these genes are important for cell growth, transcription, RNA metabolism and DNA repair. We believe that mivaRNA-mediated fine-tuning of the expression of some of these genes could be important in the adenovirus cell cycle.
Affiliation(s)
- Oscar Aparicio
- Digna Biotech and Bioinformatics Unit, CIMA, University of Navarra, Pio XII 55, 31008, Pamplona, Spain
8
SETBP1 overexpression is a novel leukemogenic mechanism that predicts adverse outcome in elderly patients with acute myeloid leukemia. Blood 2009; 115:615-25. [PMID: 19965692 DOI: 10.1182/blood-2009-06-227363] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.6] [Indexed: 01/15/2023]
Abstract
Acute myeloid leukemias (AMLs) result from multiple genetic alterations in hematopoietic stem cells. We describe a novel t(12;18)(p13;q12) involving ETV6 in a patient with AML. The translocation resulted in overexpression of SETBP1 (18q12), located close to the breakpoint. Overexpression of SETBP1 through retroviral insertion has been reported to confer growth advantage in hematopoietic progenitor cells. We show that SETBP1 overexpression protects SET from protease cleavage, increasing the amount of full-length SET protein and leading to the formation of a SETBP1-SET-PP2A complex that results in PP2A inhibition, promoting proliferation of the leukemic cells. The prevalence of SETBP1 overexpression in AML at diagnosis (n = 192) was 27.6% and was associated with unfavorable cytogenetic prognostic group, monosomy 7, and EVI1 overexpression (P < .01). Patients with SETBP1 overexpression had a significantly shorter overall survival, and the prognosis impact was remarkably poor in patients older than 60 years in both overall survival (P = .015) and event-free survival (P = .015). In summary, our data show a novel leukemogenic mechanism through SETBP1 overexpression; moreover, multivariate analysis confirms the negative prognostic impact of SETBP1 overexpression in AML, especially in elderly patients, where it could be used as a predictive factor in any future clinical trials with PP2A activators.
9
Colecchia F, Kottwitz D, Wagner M, Pfenninger CV, Thiel G, Tamm I, Peterson C, Nuber UA. Tissue-specific regulatory network extractor (TS-REX): a database and software resource for the tissue and cell type-specific investigation of transcription factor-gene networks. Nucleic Acids Res 2009; 37:e82. [PMID: 19443447 PMCID: PMC2699531 DOI: 10.1093/nar/gkp311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/06/2023]
Abstract
The prediction of transcription factor binding sites in genomic sequences is in principle very useful to identify upstream regulatory factors. However, when applying this concept to genomes of multicellular organisms such as mammals, one has to deal with a large number of false positive predictions since many transcription factor genes are only expressed in specific tissues or cell types. We developed TS-REX, a database/software system that supports the analysis of tissue and cell type-specific transcription factor-gene networks based on expressed sequence tag abundance of transcription factor-encoding genes in UniGene EST libraries. The use of expression levels of transcription factor-encoding genes according to hierarchical anatomical classifications covering different tissues and cell types makes it possible to filter out irrelevant binding site predictions and to identify candidates of potential functional importance for further experimental testing. TS-REX covers ESTs from H. sapiens and M. musculus, and allows the characterization of both presence and specificity of transcription factors in user-specified tissues or cell types. The software allows users to interactively visualize transcription factor-gene networks, as well as to export data for further processing. TS-REX was applied to predict regulators of Polycomb group genes in six human tumor tissues and in human embryonic stem cells.
Affiliation(s)
- Federico Colecchia
- Lund Strategic Research Center for Stem Cell Biology, Lund University, Sweden
10
A novel variant on chromosome 7q22.3 associated with mean platelet volume, counts, and function. Blood 2009; 113:3831-7. [PMID: 19221038 DOI: 10.1182/blood-2008-10-184234] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 12/15/2022]
Abstract
Mean platelet volume (MPV) and platelet count (PLT) are highly heritable and tightly regulated traits. We performed a genome-wide association study for MPV and identified one SNP, rs342293, as having highly significant and reproducible association with MPV (per-G allele effect 0.016 ± 0.001 log fL; P < 1.08 × 10^-24) and PLT (per-G effect -4.55 ± 0.80 × 10^9/L; P < 7.19 × 10^-8) in 8586 healthy subjects. Whole-genome expression analysis in the 1-MB region showed a significant association with platelet transcript levels for PIK3CG (n = 35; P = .047). The G allele at rs342293 was also associated with decreased binding of annexin V to platelets activated with collagen-related peptide (n = 84; P = .003). The region 7q22.3 identifies the first QTL influencing platelet volume, counts, and function in healthy subjects. Notably, the association signal maps to a chromosome region implicated in myeloid malignancies, indicating this site as an important regulatory site for hematopoiesis. The identification of loci regulating MPV by this and other studies will increase our insight in the processes of megakaryopoiesis and proplatelet formation, and it may aid the identification of genes that are somatically mutated in essential thrombocytosis.
11
Sun H, De Bie T, Storms V, Fu Q, Dhollander T, Lemmens K, Verstuyf A, De Moor B, Marchal K. ModuleDigger: an itemset mining framework for the detection of cis-regulatory modules. BMC Bioinformatics 2009; 10 Suppl 1:S30. [PMID: 19208131 PMCID: PMC2648767 DOI: 10.1186/1471-2105-10-s1-s30] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
Background: The detection of cis-regulatory modules (CRMs) that mediate transcriptional responses in eukaryotes remains a key challenge in the postgenomic era. A CRM is characterized by a set of co-occurring transcription factor binding sites (TFBS). In silico methods have been developed to search for CRMs by determining the combination of TFBS that are statistically overrepresented in a certain gene set. Most of these methods solve this combinatorial problem by relying on computationally intensive optimization methods. As a result, their usage is limited to finding CRMs in small datasets (containing a few genes only) and using binding sites for a restricted number of transcription factors (TFs) out of which the optimal module will be selected. Results: We present an itemset mining based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes. We tested our method by applying it to a large benchmark data set derived from a ChIP-Chip analysis, and compared its performance with other well-known cis-regulatory module detection tools. Conclusion: We show that by exploiting the computational efficiency of an itemset mining approach and combining it with a well-designed statistical scoring scheme, we were able to prioritize the biologically valid CRMs in a large set of coregulated genes using binding sites for a large number of potential TFs as input.
Affiliation(s)
- Hong Sun
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.
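The idea of treating CRM detection as itemset mining can be sketched as follows: each gene is a "transaction" whose items are the TFs with predicted binding sites in its promoter, and a candidate module is a TF set that co-occurs in at least a minimum number of genes. The sketch below is a generic Apriori-style enumeration, not ModuleDigger's actual algorithm or its statistical scoring scheme; the function name `frequent_tf_sets` and its parameters are hypothetical.

```python
from itertools import combinations


def frequent_tf_sets(gene_to_tfs, min_support, max_size=3):
    """Return {tf_tuple: support} for TF sets found in >= min_support genes.

    gene_to_tfs maps a gene name to the set of TFs with a predicted
    binding site in that gene's regulatory region.
    """
    all_tfs = sorted(set().union(*gene_to_tfs.values()))
    frequent = {}
    for size in range(1, max_size + 1):
        found_any = False
        for combo in combinations(all_tfs, size):
            # Apriori pruning: a set can only be frequent if every
            # subset one element smaller is already known to be frequent.
            if size > 1 and any(
                sub not in frequent for sub in combinations(combo, size - 1)
            ):
                continue
            support = sum(
                1 for tfs in gene_to_tfs.values() if set(combo) <= tfs
            )
            if support >= min_support:
                frequent[combo] = support
                found_any = True
        if not found_any:
            break  # no larger set can be frequent either
    return frequent


if __name__ == "__main__":
    genes = {
        "g1": {"A", "B", "C"},
        "g2": {"A", "B"},
        "g3": {"A", "C"},
        "g4": {"B"},
    }
    for tf_set, support in frequent_tf_sets(genes, min_support=2).items():
        print(tf_set, support)
```

The pruning step is what lets this family of methods scale to many genes and many candidate TFs; a practical tool would additionally rank the surviving sets with a statistical score, as the abstract describes.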
12
Turatsinze JV, Thomas-Chollier M, Defrance M, van Helden J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc 2008; 3:1578-88. [PMID: 18802439 DOI: 10.1038/nprot.2008.97] [Citation(s) in RCA: 201] [Impact Index Per Article: 12.6] [Indexed: 02/01/2023]
Abstract
This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P value, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a study case, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.
Affiliation(s)
- Jean-Valery Turatsinze
- Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles CP 263, Campus Plaine, Boulevard du Triomphe, Bruxelles, Belgium
13
Abstract
The AMIC@ Web Server offers a light-weight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al., (iii) intuitive visual inspection of the data via heatmaps, and (iv) measurements of the clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost) specifically designed for this purpose. In the case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard, which runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL: http://bioalgo.iit.cnr.it/amica.
Affiliation(s)
- Filippo Geraci
- Istituto di Informatica e Telematica del C.N.R., Via Moruzzi 1, Pisa, Italy
14
Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A, Hubbard TJP. Integrating sequence and structural biology with DAS. BMC Bioinformatics 2007; 8:333. [PMID: 17850653 PMCID: PMC2031907 DOI: 10.1186/1471-2105-8-333] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 05/30/2007] [Accepted: 09/12/2007] [Indexed: 11/16/2022]
Abstract
Background: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequences. Results: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility for registration and discovery of DAS servers, and provide a convention for how to provide different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources. Conclusion: Our DAS extensions are essential for the management of the growing number of services and exchange of diverse biological data sets. In addition, the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is available at
Affiliation(s)
- Andreas Prlić
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Thomas A Down
- Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge University, Cambridge, UK
- Eugene Kulesha
- European Bioinformatics Institute, Hinxton, Cambridge, UK
- Robert D Finn
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Andreas Kähäri
- European Bioinformatics Institute, Hinxton, Cambridge, UK
- Tim JP Hubbard
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
15
Jackson ES, Fitzgerald WJ. A sequential Monte Carlo EM approach to the transcription factor binding site identification problem. Bioinformatics 2007; 23:1313-20. [PMID: 17387112 DOI: 10.1093/bioinformatics/btm054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION: A significant and stubbornly intractable problem in genome sequence analysis has been the de novo identification of transcription factor binding sites in promoter regions. Although theoretically pleasing, probabilistic methods have faced difficulties due to model mismatch and the nature of the biological sequence. These problems result in inference in a high dimensional, highly multimodal space, and consequently often display only local convergence and hence unsatisfactory performance. ALGORITHM: In this article, we derive and demonstrate a novel method utilizing a sequential Monte Carlo-based expectation-maximization (EM) optimization to improve performance in this scenario. The Monte Carlo element should increase the robustness of the algorithm compared to classical EM. Furthermore, the parallel nature of the sequential Monte Carlo algorithm should be more robust than Gibbs sampling approaches to multimodality problems. RESULTS: We demonstrate the superior performance of this algorithm on both semi-synthetic and real data from Escherichia coli. AVAILABILITY: http://sigproc-eng.cam.ac.uk/~ej230/smc_em_tfbsid.tar.gz
Affiliation(s)
- Edmund S Jackson
- Signal Processing Laboratory, Department of Engineering, Cambridge University, UK.
16
Lebeer S, De Keersmaecker SCJ, Verhoeven TLA, Fadda AA, Marchal K, Vanderleyden J. Functional analysis of luxS in the probiotic strain Lactobacillus rhamnosus GG reveals a central metabolic role important for growth and biofilm formation. J Bacteriol 2006; 189:860-71. [PMID: 17098890 PMCID: PMC1797292 DOI: 10.1128/jb.01394-06] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Indexed: 02/02/2023]
Abstract
Quorum sensing is involved in the regulation of multicellular behavior through communication via small molecules. Given the high number and diversity of the gastrointestinal microbiota, it is postulated that members of this community communicate to coordinate a variety of adaptive processes. AI-2 is suggested to be a universal bacterial signaling molecule synthesized by the LuxS enzyme, which forms an integral part of the activated methyl cycle. We have previously reported that the well-documented probiotic strain Lactobacillus rhamnosus GG, a human isolate, produces AI-2-like molecules. In this study, we identified the luxS homologue of L. rhamnosus GG. luxS seems to be located in an operon with a yxjH gene encoding a putative cobalamin-independent methionine synthase. In silico analysis revealed a methionine-specific T box in the leader sequence of the putative yxjH-luxS operon. However, transcriptional analysis showed that luxS is expressed mainly as a monocistronic transcript. Construction of a luxS knockout mutant confirmed that the luxS gene is responsible for AI-2 production in L. rhamnosus GG. However, this mutation also resulted in pleiotropic effects on the growth of this fastidious strain. Cysteine, pantothenate, folic acid, and biotin could partially complement growth, suggesting a central metabolic role for luxS in L. rhamnosus GG. Interestingly, the luxS mutant also showed a defect in monospecies biofilm formation. Experiments with chemically synthesized (S)-4,5-dihydroxy-2,3-pentanedione, coculture with the wild type, and nutritional complementation suggested that the main cause of this defect has a metabolic nature. Moreover, our data indicate that suppressor mutations are likely to occur in luxS mutants of L. rhamnosus GG. Therefore, results of luxS-related studies should be carefully interpreted.
Affiliation(s)
- Sarah Lebeer
- Centre of Microbial and Plant Genetics, K U Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
17
XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics 2006; 7:490. [PMID: 17087823 PMCID: PMC2001303 DOI: 10.1186/1471-2105-7-490] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/21/2006] [Accepted: 11/06/2006] [Indexed: 11/30/2022]
Abstract
Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
18
Benedict C, Geisler M, Trygg J, Huner N, Hurry V. Consensus by democracy. Using meta-analyses of microarray and genomic data to model the cold acclimation signaling pathway in Arabidopsis. Plant Physiology 2006; 141:1219-32. [PMID: 16896234 PMCID: PMC1533918 DOI: 10.1104/pp.106.083527] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 05/11/2023]
Abstract
The whole-genome response of Arabidopsis (Arabidopsis thaliana) exposed to different types and durations of abiotic stress has now been described by a wealth of publicly available microarray data. When combined with studies of how gene expression is affected in mutant and transgenic Arabidopsis with altered ability to transduce the low temperature signal, these data can be used to test the interactions between various low temperature-associated transcription factors and their regulons. We quantized a collection of Affymetrix microarray data so that each gene in a particular regulon could vote on whether a cis-element found in its promoter conferred induction (+1), repression (-1), or no transcriptional change (0) during cold stress. By statistically comparing these election results with the voting behavior of all genes on the same gene chip, we verified the bioactivity of novel cis-elements and defined whether they were inductive or repressive. Using in silico mutagenesis we identified functional binding consensus variants for the transcription factors studied. Our results suggest that the previously identified ICEr1 (induction of CBF expression region 1) consensus does not correlate with cold gene induction, while the ICEr3/ICEr4 consensuses identified using our algorithms are present in regulons of genes that were induced coordinate with observed ICE1 transcript accumulation and temporally preceding genes containing the dehydration response element. Statistical analysis of overlap and cis-element enrichment in the ICE1, CBF2, ZAT12, HOS9, and PHYA regulons enabled us to construct a regulatory network supported by multiple lines of evidence that can be used for future hypothesis testing.
Affiliation(s)
- Catherine Benedict
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, S-901 87 Umeå, Sweden.
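The "voting" idea in the abstract above can be sketched in a few lines: each gene's expression change is quantized to +1 (induced), -1 (repressed), or 0 (unchanged), and the votes of genes carrying a cis-element are compared with the votes of all genes on the chip. The abstract does not give the quantization threshold or the statistical test used, so the log2 fold-change cutoff of 1.0 and the hypergeometric upper-tail test below are assumptions, and all function names are hypothetical.

```python
from math import comb


def vote(log2_fold_change, threshold=1.0):
    """Quantize an expression change into +1, -1, or 0 (assumed cutoff)."""
    if log2_fold_change >= threshold:
        return 1
    if log2_fold_change <= -threshold:
        return -1
    return 0


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which are induced."""
    upper = min(n, K)
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    ) / comb(N, n)


def element_enrichment(chip, genes_with_element, threshold=1.0):
    """Compare induction votes of element-bearing genes with the whole chip.

    chip maps gene name to log2 fold change under cold stress.
    Returns (induced votes in regulon, regulon size, p-value).
    """
    votes = {gene: vote(fc, threshold) for gene, fc in chip.items()}
    regulon = [g for g in genes_with_element if g in votes]
    N = len(votes)
    K = sum(1 for v in votes.values() if v == 1)
    n = len(regulon)
    k = sum(1 for g in regulon if votes[g] == 1)
    return k, n, hypergeom_upper_tail(k, n, K, N)


if __name__ == "__main__":
    chip = {"a": 2.0, "b": 1.5, "c": 0.1, "d": -2.0, "e": 0.0, "f": -0.3}
    print(element_enrichment(chip, ["a", "b"]))
```

The same function could be run on the -1 votes to ask whether an element is repressive rather than inductive, which mirrors the distinction the abstract draws.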
|
19
|
De Bodt S, Theissen G, Van de Peer Y. Promoter Analysis of MADS-Box Genes in Eudicots Through Phylogenetic Footprinting. Mol Biol Evol 2006; 23:1293-303. [PMID: 16581940 DOI: 10.1093/molbev/msk016] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022] Open
Abstract
The MIKC MADS-box gene family has been shaped by extensive gene duplications giving rise to subfamilies of genes with distinct functions and expression patterns. However, within these subfamilies the functional assignment is not that clear-cut, and considerable functional redundancy exists. One way to investigate the diversity in regulation present in these subfamilies is promoter sequence analysis. With the advent of genome sequencing projects, we are now able to perform a comparative analysis of Arabidopsis and poplar promoters of MADS-box genes belonging to the same subfamily. Based on the principle of phylogenetic footprinting, sequences conserved between the promoters of homologous genes are thought to be functional. Here, we have investigated the evolution of MADS-box genes at the promoter level and show that many genes have diverged in their regulatory sequences after duplication and/or speciation. Furthermore, using phylogenetic footprinting, a distinction can be made between redundancy, neo/nonfunctionalization, and subfunctionalization.
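The footprinting principle stated in this abstract (sequence conserved between promoters of homologous genes is a candidate functional element) can be illustrated with a deliberately simplified sketch. Real phylogenetic footprinting relies on alignment and tolerates mismatches; the exact k-mer matching below is an assumed stand-in, and both promoter sequences are invented toy data.

```python
def shared_kmers(promoter_a, promoter_b, k=10):
    """Return the k-mers that occur verbatim in both promoter sequences.
    Exact matching is a simplification of alignment-based footprinting."""
    kmers_a = {promoter_a[i:i + k] for i in range(len(promoter_a) - k + 1)}
    kmers_b = {promoter_b[i:i + k] for i in range(len(promoter_b) - k + 1)}
    return sorted(kmers_a & kmers_b)

# Hypothetical promoter fragments from two species; both embed the same
# CArG-box-like element (CC followed by six A/T bases, then GG).
arabidopsis_like = "TTTTCCATATATGGAAAA"
poplar_like = "GGGGCCATATATGGCCCC"
conserved = shared_kmers(arabidopsis_like, poplar_like)
print(conserved)
```

Elements present in only one promoter of a duplicated pair would be the kind of evidence the abstract associates with divergence after duplication or speciation.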
Affiliation(s)
- Stefanie De Bodt
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, Ghent, Belgium
|
20
|
Monsieurs P, Thijs G, Fadda AA, De Keersmaecker SCJ, Vanderleyden J, De Moor B, Marchal K. More robust detection of motifs in coexpressed genes by using phylogenetic information. BMC Bioinformatics 2006; 7:160. [PMID: 16549017 PMCID: PMC1525208 DOI: 10.1186/1471-2105-7-160] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 12/14/2005] [Accepted: 03/20/2006] [Indexed: 11/30/2022] Open
Abstract
Background: Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates.
Results: We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm.
Conclusion: We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information.
Affiliation(s)
- Pieter Monsieurs
- ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Gert Thijs
- ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Abeer A Fadda
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
- Sigrid CJ De Keersmaecker
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
- Jozef Vanderleyden
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
- Bart De Moor
- ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Kathleen Marchal
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
|
21
|
Van Hellemont R, Monsieurs P, Thijs G, De Moor B, Van de Peer Y, Marchal K. A novel approach to identifying regulatory motifs in distantly related genomes. Genome Biol 2005; 6:R113. [PMID: 16420672 PMCID: PMC1414112 DOI: 10.1186/gb-2005-6-13-r113] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 05/31/2005] [Revised: 08/22/2005] [Accepted: 12/01/2005] [Indexed: 11/25/2022] Open
Abstract
A two-step procedure for identifying regulatory motifs in distantly related organisms is described that combines the advantages of sequence alignment and motif detection approaches. Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size.
Affiliation(s)
- Ruth Van Hellemont
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Pieter Monsieurs
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Gert Thijs
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Bart De Moor
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Yves Van de Peer
- Plant Systems Biology, Bioinformatics and Evolutionary Genomics, VIB/Ghent University, Technologiepark 927, 9052 Gent, Belgium
- Kathleen Marchal
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
|
22
|
Hong JA, Kang Y, Abdullaev Z, Flanagan PT, Pack SD, Fischette MR, Adnani MT, Loukinov DI, Vatolin S, Risinger JI, Custer M, Chen GA, Zhao M, Nguyen DM, Barrett JC, Lobanenkov VV, Schrump DS. Reciprocal binding of CTCF and BORIS to the NY-ESO-1 promoter coincides with derepression of this cancer-testis gene in lung cancer cells. Cancer Res 2005; 65:7763-74. [PMID: 16140944 DOI: 10.1158/0008-5472.can-05-0823] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.4] [Indexed: 11/16/2022]
Abstract
Regulatory sequences recognized by the unique pair of paralogous factors, CTCF and BORIS, have been implicated in epigenetic regulation of imprinting and X chromosome inactivation. Lung cancers exhibit genome-wide demethylation associated with derepression of a specific class of genes encoding cancer-testis (CT) antigens such as NY-ESO-1. CT genes are normally expressed in BORIS-positive male germ cells deficient in CTCF and meCpG contents, but are strictly silenced in somatic cells. The present study was undertaken to ascertain if aberrant activation of BORIS contributes to derepression of NY-ESO-1 during pulmonary carcinogenesis. Preliminary experiments indicated that NY-ESO-1 expression coincided with derepression of BORIS in cultured lung cancer cells. Quantitative reverse transcription-PCR analysis revealed robust, coincident induction of BORIS and NY-ESO-1 expression in lung cancer cells, but not normal human bronchial epithelial cells following 5-aza-2'-deoxycytidine (5-azadC), Depsipeptide FK228 (DP), or sequential 5-azadC/DP exposure under clinically relevant conditions. Bisulfite sequencing, methylation-specific PCR, and chromatin immunoprecipitation (ChIP) experiments showed that induction of BORIS coincided with direct modulation of chromatin structure within a CpG island in the 5'-flanking noncoding region of this gene. Cotransfection experiments using promoter-reporter constructs confirmed that BORIS modulates NY-ESO-1 expression in lung cancer cells. Gel shift and ChIP experiments revealed a novel CTCF/BORIS-binding site in the NY-ESO-1 promoter, which unlike such sites in the H19-imprinting control region and X chromosome, is insensitive to CpG methylation in vitro. In vivo occupancy of this site by CTCF was associated with silencing of the NY-ESO-1 promoter, whereas switching from CTCF to BORIS occupancy coincided with derepression of NY-ESO-1. 
Collectively, these data indicate that reciprocal binding of CTCF and BORIS to the NY-ESO-1 promoter mediates epigenetic regulation of this CT gene in lung cancer cells, and suggest that induction of BORIS may be a novel strategy to augment immunogenicity of pulmonary carcinomas.
Affiliation(s)
- Julie A Hong
- Thoracic Oncology Section, Surgery Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892-1201, USA
|
23
|
Viemann D, Schulze-Osthoff K, Roth J. Potentials and pitfalls of DNA array analysis of the endothelial stress response. Biochimica et Biophysica Acta - Molecular Cell Research 2005; 1746:73-84. [PMID: 16300842 DOI: 10.1016/j.bbamcr.2005.09.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/13/2005] [Revised: 09/26/2005] [Accepted: 09/26/2005] [Indexed: 11/17/2022]
Abstract
Endothelial cells respond to inflammatory stimuli with complex genetic alterations that determine the immune response and the outcome of the inflammatory process. An additional layer of complexity is added by the different phenotypes and functional heterogeneity of endothelial cells in the various tissues. To understand these complex gene response patterns and the regulatory pathways involved, many investigators increasingly use DNA microarray analysis. There are, however, many potential pitfalls in the use of microarrays that can result in false data and erroneous conclusions. This review surveys the principles of DNA microarray technology and its applications in endothelial cell research. We also attempt to outline some of the caveats and standard criteria that have to be considered in order to realize the full potential of microarrays in inflammation research.
Affiliation(s)
- Dorothee Viemann
- Institute of Experimental Dermatology, Department of Pediatrics and Integrated Functional Genomics, University of Münster, Röntgenstr. 21, Germany
|
24
|
Shamir R, Maron-Katz A, Tanay A, Linhart C, Steinfeld I, Sharan R, Shiloh Y, Elkon R. EXPANDER--an integrative program suite for microarray data analysis. BMC Bioinformatics 2005; 6:232. [PMID: 16176576 PMCID: PMC1261157 DOI: 10.1186/1471-2105-6-232] [Citation(s) in RCA: 214] [Impact Index Per Article: 11.3] [Received: 07/05/2005] [Accepted: 09/21/2005] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Gene expression microarrays are a prominent experimental tool in functional genomics which has opened the opportunity for gaining global, systems-level understanding of transcriptional networks. Experiments that apply this technology typically generate overwhelming volumes of data, unprecedented in biological research. Therefore the task of mining meaningful biological knowledge out of the raw data is a major challenge in bioinformatics. Of special need are integrative packages that provide biologist users with an advanced yet easy-to-use set of algorithms that together cover the whole range of steps in microarray data analysis.
RESULTS: Here we present the EXPANDER 2.0 (EXPression ANalyzer and DisplayER) software package. EXPANDER 2.0 is an integrative package for the analysis of gene expression data, designed as a 'one-stop shop' tool that implements various data analysis algorithms ranging from the initial steps of normalization and filtering, through clustering and biclustering, to high-level functional enrichment analysis that points to biological processes that are active in the examined conditions, and to promoter cis-regulatory elements analysis that elucidates transcription factors that control the observed transcriptional response. EXPANDER is available with pre-compiled functional Gene Ontology (GO) and promoter sequence-derived data files for yeast, worm, fly, rat, mouse and human, supporting high-level analysis applied to data obtained from these six organisms.
CONCLUSION: EXPANDER's integrated capabilities and its built-in support of multiple organisms make it a very powerful tool for analysis of microarray data. The package is freely available for academic users at http://www.cs.tau.ac.il/~rshamir/expander.
Affiliation(s)
- Ron Shamir
- School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Adi Maron-Katz
- School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Amos Tanay
- School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Chaim Linhart
- School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Israel Steinfeld
- School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Roded Sharan
- School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Yosef Shiloh
- The David and Inez Myers Laboratory for Genetic Research, Department of Human Genetics, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ran Elkon
- The David and Inez Myers Laboratory for Genetic Research, Department of Human Genetics, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
|
25
|
De Keersmaecker SCJ, Marchal K, Verhoeven TLA, Engelen K, Vanderleyden J, Detweiler CS. Microarray analysis and motif detection reveal new targets of the Salmonella enterica serovar Typhimurium HilA regulatory protein, including hilA itself. J Bacteriol 2005; 187:4381-91. [PMID: 15968047 PMCID: PMC1151768 DOI: 10.1128/jb.187.13.4381-4391.2005] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
DNA regulatory motifs reflect the direct transcriptional interactions between regulators and their target genes and contain important information regarding transcriptional networks. In silico motif detection strategies search for DNA patterns that are present more frequently in a set of related sequences than in a set of unrelated sequences. Related sequences could be genes that are coexpressed and are therefore expected to share similar conserved regulatory motifs. We identified coexpressed genes by carrying out microarray-based transcript profiling of Salmonella enterica serovar Typhimurium in response to the spent culture supernatant of the probiotic strain Lactobacillus rhamnosus GG. Probiotics are live microorganisms which, when administered in adequate amounts, confer a health benefit on the host. They are known to antagonize intestinal pathogens in vivo, including salmonellae. S. enterica serovar Typhimurium causes human gastroenteritis. Infection is initiated by entry of salmonellae into intestinal epithelial cells. The expression of invasion genes is tightly regulated by environmental conditions, as well as by many bacterial factors including the key regulator HilA. One mechanism by which probiotics may antagonize intestinal pathogens is by influencing invasion gene expression. Our microarray experiment yielded a cluster of coexpressed Salmonella genes that are predicted to be down-regulated by spent culture supernatant. This cluster was enriched for genes known to be HilA dependent. In silico motif detection revealed a motif that overlaps the previously described HilA box in the promoter region of three of these genes, spi4_H, sicA, and hilA. Site-directed mutagenesis, beta-galactosidase reporter assays, and gel mobility shift experiments indicated that sicA expression requires HilA and that hilA is negatively autoregulated.
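The definition of in silico motif detection given in this abstract (a pattern present more often in related than in unrelated sequences) can be made concrete with a small enrichment check. This is a sketch under stated assumptions: the one-sided hypergeometric test is a common choice rather than the method reported in the paper, and the motif `GGCATT` and all sequences are invented for illustration (they are not the HilA box).

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K carry the motif."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def motif_enrichment(motif, related, unrelated):
    """Count sequences containing the motif in each set and test for enrichment
    in the related (for example coexpressed) set."""
    in_related = sum(motif in seq for seq in related)
    in_unrelated = sum(motif in seq for seq in unrelated)
    total = len(related) + len(unrelated)
    carriers = in_related + in_unrelated
    p_value = hypergeom_sf(in_related, total, carriers, len(related))
    return in_related, in_unrelated, p_value

# Hypothetical promoter fragments: three coexpressed genes and seven background genes.
related = ["AAGGCATTCC", "GGCATTAAAA", "TTTTGGCATT"]
unrelated = ["ACACACACAC", "TGTGTGTGTG", "CCCCGGCATT", "AAAAAAAAAA",
             "GAGAGAGAGA", "CTCTCTCTCT", "ATATATATAT"]
in_rel, in_unrel, p = motif_enrichment("GGCATT", related, unrelated)
print(in_rel, in_unrel, round(p, 4))
```

Exact substring matching keeps the example short; practical tools score degenerate motifs with position weight matrices instead.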
|
26
|
Abstract
The effective integration of data and knowledge from many disparate sources will be crucial to future drug discovery. Data integration is a key element of conducting scientific investigations with modern platform technologies, managing increasingly complex discovery portfolios and processes, and fully realizing economies of scale in large enterprises. However, viewing data integration as simply an 'IT problem' underestimates the novel and serious scientific and management challenges it embodies - challenges that could require significant methodological and even cultural changes in our approach to data.
Affiliation(s)
- David B Searls
- Bioinformatics Division, Genetics Research, GlaxoSmithKline Pharmaceuticals, 709 Swedeland Road, P.O. Box 1539, King of Prussia, Pennsylvania 19406, USA.
|
27
|
Hu Z, Fu Y, Halees AS, Kielbasa SM, Weng Z. SeqVISTA: a new module of integrated computational tools for studying transcriptional regulation. Nucleic Acids Res 2004; 32:W235-41. [PMID: 15215387 PMCID: PMC441621 DOI: 10.1093/nar/gkh483] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
Transcriptional regulation is one of the most basic regulatory mechanisms in the cell. The accumulation of multiple metazoan genome sequences and the advent of high-throughput experimental techniques have motivated the development of a large number of bioinformatics methods for the detection of regulatory motifs. The regulatory process is extremely complex and individual computational algorithms typically have very limited success in genome-scale studies. Here, we argue the importance of integrating multiple computational algorithms and present an infrastructure that integrates eight web services covering key areas of transcriptional regulation. We have adopted the client-side integration technology and built a consistent input and output environment with a versatile visualization tool named SeqVISTA. The infrastructure will allow for easy integration of gene regulation analysis software that is scattered over the Internet. It will also enable bench biologists to perform an arsenal of analysis using cutting-edge methods in a familiar environment and bioinformatics researchers to focus on developing new algorithms without the need to invest substantial effort on complex pre- or post-processors. SeqVISTA is freely available to academic users and can be launched online at http://zlab.bu.edu/SeqVISTA/web.jnlp, provided that Java Web Start has been installed. In addition, a stand-alone version of the program can be downloaded and run locally. It can be obtained at http://zlab.bu.edu/SeqVISTA.
Affiliation(s)
- Zhenjun Hu
- Bioinformatics Program, Boston University, 44 Cummington Street, Boston, MA 02215, USA
|
28
|
Hokamp K, Roche FM, Acab M, Rousseau ME, Kuo B, Goode D, Aeschliman D, Bryan J, Babiuk LA, Hancock REW, Brinkman FSL. ArrayPipe: a flexible processing pipeline for microarray data. Nucleic Acids Res 2004; 32:W457-9. [PMID: 15215429 PMCID: PMC441584 DOI: 10.1093/nar/gkh446] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022] Open
Abstract
A number of microarray analysis software packages exist already; however, none combines the user-friendly features of a web-based interface with potential ability to analyse multiple arrays at once using flexible analysis steps. The ArrayPipe web server (freely available at www.pathogenomics.ca/arraypipe) allows the automated application of complex analyses to microarray data which can range from single slides to large data sets including replicates and dye-swaps. It handles output from most commonly used quantification software packages for dual-labelled arrays. Application features range from quality assessment of slides through various data visualizations to multi-step analyses including normalization, detection of differentially expressed genes, and comparison and highlighting of gene lists. A highly customizable action set-up facilitates unrestricted arrangement of functions, which can be stored as action profiles. A unique combination of web-based and command-line functionality enables comfortable configuration of processes that can be repeatedly applied to large data sets in high throughput. The output consists of reports formatted as standard web pages and tab-delimited lists of calculated values that can be inserted into other analysis programs. Additional features, such as web-based spreadsheet functionality, auto-parallelization and password protection make this a powerful tool in microarray research for individuals and large groups alike.
Affiliation(s)
- Karsten Hokamp
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
|
29
|
Gao Y, Li J, Strickland E, Hua S, Zhao H, Chen Z, Qu L, Deng XW. An Arabidopsis promoter microarray and its initial usage in the identification of HY5 binding targets in vitro. Plant Molecular Biology 2004; 54:683-699. [PMID: 15356388 DOI: 10.1023/b:plan.0000040898.86788.59] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 05/24/2023]
Abstract
To analyze transcription factor-promoter interactions in Arabidopsis, a general strategy for generating a promoter microarray has been established. This includes an integrated platform for promoter sequence extraction and the design of primers for the PCR amplification of the promoter regions of annotated genes in the Arabidopsis genome. A web-interfaced primer-retrieval program was used to obtain up to 10 primer pairs with a suitability ranking given to each gene. We selected primer pairs for the promoters of about 3800 genes, and greater than 95% of the promoter fragments from the total genomic DNA were successfully amplified by PCR. These PCR products were purified and used to print an Arabidopsis promoter microarray. This initial promoter microarray was used to study the in vitro binding of the transcription factor HY5 to its promoter targets. A set of promoter fragments exhibited consistent and strong interaction with the HY5 protein in vitro, and computational analysis revealed that they were enriched with the HY5 consensus binding G-box motif. Thus, a promoter microarray can be a useful tool for identifying transcription factor binding sites at the genomic scale in higher plants.
Affiliation(s)
- Ying Gao
- Peking-Yale Joint Center of Plant Molecular Genetics and Agrobiotechnology, College of Life Sciences, Peking University, Beijing 100871, PR China
|
30
|
Glenisson P, Mathys J, De Moor B. Meta-clustering of gene expression data and literature-based information. 2003. [DOI: 10.1145/980972.980985] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 01/14/2023]
Abstract
The current tendency in the life sciences to spawn ever growing amounts of high-throughput assays has led to a situation where the interpretation of data and the formulation of hypotheses lag the pace at which information is produced. Although the first generation of statistical algorithms scrutinizing single, large-scale data sets found their way into the biological community, the great challenge to connect their results to existing knowledge still remains. Despite the fairly large number of biological databases that is currently available, a lot of relevant information is found in free-text format (such as textual annotations, scientific abstracts and full publications). In this paper we explore how an integrated analysis of expression data and literature-extracted information can reveal biologically meaningful clusters not identified when using microarray information alone. The joint analysis is validated in terms of transcriptional regulation.
|