501
Montgomery S. Current computational methods for prioritizing candidate regulatory polymorphisms. Methods Mol Biol 2009; 569:89-114. [PMID: 19623487 DOI: 10.1007/978-1-59745-524-4_5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
Abstract
Discovery of DNA sequence variants responsible for human phenotypic variation is key to advances in molecular diagnostics and medicines. Historically, variants that alter the protein-coding sequence of genes have been targeted when attempting to identify a trait's etiology; this is done because the rules governing these regions are generally well-understood and candidate variants can be easily selected. However, the effects of variants on gene regulation are increasingly regarded as being as important as protein-coding variation in uncovering the nature of phenotypic variation. I discuss resources and methodology that have recently been developed to computationally prioritize variants that may alter gene expression.
502
Stojanovic N. A Study of the Distribution of Phylogenetically Conserved Blocks within Clusters of Mammalian Homeobox Genes. Genet Mol Biol 2009; 32:666-673. [PMID: 20209015 PMCID: PMC2832180 DOI: 10.1590/s1415-47572009000300034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/04/2008] [Accepted: 05/25/2009] [Indexed: 11/22/2022] Open
Abstract
Genome sequencing efforts of the last decade have produced a large amount of data, which has enabled whole-genome comparative analyses in order to locate potentially functional elements and study the overall patterns of phylogenetic conservation. In this paper we present a statistically based method for the characterization of these patterns in mammalian DNA sequences. We have applied this approach to the study of exceptionally well conserved homeobox gene clusters (Hox), based on the alignment of six species, and we have constructed a map of Hox cataloguing the conserved fragments, along with their locations in relation to the genes and other landmarks, sometimes showing unexpected layouts.
Affiliation(s)
- Nikola Stojanovic
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
503
Sandve GK, Abul O, Drabløs F. Compo: composite motif discovery using discrete models. BMC Bioinformatics 2008; 9:527. [PMID: 19063744 PMCID: PMC2614996 DOI: 10.1186/1471-2105-9-527] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/21/2008] [Accepted: 12/08/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Computational discovery of motifs in biomolecular sequences is an established field, with applications both in the discovery of functional sites in proteins and regulatory sites in DNA. In recent years there has been increased attention towards the discovery of composite motifs, typically occurring in cis-regulatory regions of genes. RESULTS This paper describes Compo: a discrete approach to composite motif discovery that supports richer modeling of composite motifs and a more realistic background model compared to previous methods. Furthermore, multiple parameter and threshold settings are tested automatically, and the most interesting motifs across settings are selected. This avoids reliance on single hard thresholds, which has been a weakness of previous discrete methods. Comparison of motifs across parameter settings is made possible by the use of p-values as a general significance measure. Compo can either return an ordered list of motifs, ranked according to the general significance measure, or a Pareto front corresponding to a multi-objective evaluation on sensitivity, specificity and spatial clustering. CONCLUSION Compo performs very competitively compared to several existing methods on a collection of benchmark data sets. These benchmarks include a recently published, large benchmark suite where the use of support across sequences allows Compo to correctly identify binding sites even when the relevant PWMs are mixed with a large number of noise PWMs. Furthermore, the possibility of parameter-free running offers high usability, the support for multi-objective evaluation allows a rich view of potential regulators, and the discrete model allows flexibility in modeling and interpretation of motifs.
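The Pareto front mentioned in this abstract can be illustrated with a minimal sketch. It assumes each candidate motif has already been reduced to a tuple of scores where higher is better on every objective; the function names and the example scores are hypothetical and are not taken from Compo itself.

```python
from typing import List, Tuple

Score = Tuple[float, ...]  # e.g. (sensitivity, specificity, spatial clustering)


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Score]) -> List[Score]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical score tuples for four candidate motifs.
motifs = [(0.9, 0.5, 0.2), (0.8, 0.6, 0.3), (0.7, 0.4, 0.1), (0.9, 0.5, 0.1)]
front = pareto_front(motifs)
```

Here the first two candidates trade sensitivity against specificity, so neither dominates the other and both stay on the front.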
Affiliation(s)
- Geir Kjetil Sandve
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
504
Chang WC, Lee TY, Huang HD, Huang HY, Pan RL. PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups. BMC Genomics 2008; 9:561. [PMID: 19036138 PMCID: PMC2633311 DOI: 10.1186/1471-2164-9-561] [Citation(s) in RCA: 204] [Impact Index Per Article: 12.8] [Received: 08/14/2008] [Accepted: 11/26/2008] [Indexed: 11/23/2022] Open
Abstract
Background The elucidation of transcriptional regulation in plant genes is an important area of research for plant scientists, following the mapping of various plant genomes, such as A. thaliana, O. sativa and Z. mays. A variety of bioinformatic servers or databases of plant promoters have been established, although most have been focused only on annotating transcription factor binding sites in a single gene and have neglected some important regulatory elements (tandem repeats and CpG/CpNpG islands) in promoter regions. Additionally, the combinatorial interaction of transcription factors (TFs) is important in regulating the gene group that is associated with the same expression pattern. Therefore, a tool for detecting the co-regulation of transcription factors in a group of gene promoters is required. Results This study develops a database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes. The system collects the plant transcription factor binding profiles from the PLACE, TRANSFAC (public release 7.0), AGRIS, and JASPAR databases and allows users to input a group of gene IDs or promoter sequences, enabling the co-occurrence of combinatorial transcription factor binding sites (TFBSs) within a defined distance (20 bp to 200 bp) to be identified. Furthermore, the new resource enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed. The regulatory elements in the conserved regions of the promoters across homologous genes are detected and presented. Conclusion In addition to providing a user-friendly input/output interface, PlantPAN has numerous advantages in the analysis of a plant promoter. Several case studies have established the effectiveness of PlantPAN. This novel analytical resource is now freely available at .
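The distance-constrained co-occurrence described in this abstract can be sketched as follows. This is an illustration only: it assumes binding sites are given as start positions on one promoter and that distance is measured between start positions, which may differ from how PlantPAN defines it. The function name and the example coordinates are hypothetical.

```python
from typing import List, Tuple


def cooccurring_pairs(
    sites_a: List[int],
    sites_b: List[int],
    min_dist: int = 20,
    max_dist: int = 200,
) -> List[Tuple[int, int]]:
    """Return (a, b) site pairs whose start positions lie min_dist..max_dist bp apart."""
    pairs = []
    for a in sites_a:
        for b in sites_b:
            if min_dist <= abs(a - b) <= max_dist:
                pairs.append((a, b))
    return pairs


# Hypothetical start positions (bp) of predicted sites for two factors.
pairs = cooccurring_pairs([100, 400], [110, 150, 650])
```

The defaults mirror the 20 bp to 200 bp window quoted in the abstract; the nested loop is quadratic, which is adequate for promoter-sized inputs.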
Affiliation(s)
- Wen-Chi Chang
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsin-Chu 300, Taiwan.
505
Kuntz SG, Schwarz EM, DeModena JA, De Buysscher T, Trout D, Shizuya H, Sternberg PW, Wold BJ. Multigenome DNA sequence conservation identifies Hox cis-regulatory elements. Genome Res 2008; 18:1955-68. [PMID: 18981268 DOI: 10.1101/gr.085472.108] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/31/2023]
Abstract
To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced approximately 0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide.
Affiliation(s)
- Steven G Kuntz
- Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
506
Coassin S, Brandstätter A, Kronenberg F. An optimized procedure for the design and evaluation of Ecotilling assays. BMC Genomics 2008; 9:510. [PMID: 18973671 PMCID: PMC2586031 DOI: 10.1186/1471-2164-9-510] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/13/2008] [Accepted: 10/30/2008] [Indexed: 11/17/2022] Open
Abstract
Background Single nucleotide polymorphisms (SNPs) are the most common form of genetic variability in the human genome and play a prominent role in the heritability of phenotypes. Rare alleles with frequencies below 5%, in particular, may exert a strong influence on the development of complex diseases. The detection of rare alleles by standard DNA sequencing is time-consuming and cost-intensive. Here we discuss an alternative approach for high-throughput detection of rare mutations in large population samples using Ecotilling embedded in a collection of bioinformatic analysis tools. Ecotilling was originally introduced as TILLING for the screening of rare chemically induced mutations in plants and was later adopted for human samples, where it has proved well suited to the detection of rare alleles. A current problem in the use of Ecotilling for large mutation screening projects in humans without bioinformatic support is the lack of solutions to quickly yet comprehensively evaluate each newly found variation and place it into the correct genomic context. Results We present an optimized strategy for the design, evaluation and interpretation of Ecotilling results by integrating several mostly freely available bioinformatic tools. A major focus of our investigations was the evaluation and economical combination of these software tools for the inference of different possible regulatory functions for each newly detected mutation. Conclusion Our streamlined procedure significantly facilitates the experimental design and evaluation of Ecotilling assays and strongly improves the decision process on prioritizing the newly found SNPs for further downstream analysis.
Affiliation(s)
- Stefan Coassin
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria.
507
Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res 2008; 37:D77-82. [PMID: 18842628 PMCID: PMC2686578 DOI: 10.1093/nar/gkn660] [Citation(s) in RCA: 285] [Impact Index Per Article: 17.8] [Indexed: 11/14/2022] Open
Abstract
The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA-binding specificities of proteins. This initial release of the UniPROBE database provides a centralized resource for accessing comprehensive PBM data on the preferences of proteins for all possible sequence variants (‘words’) of length k (‘k-mers’), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database hosts DNA-binding data for over 175 nonredundant proteins from a diverse collection of organisms, including the prokaryote Vibrio harveyi, the eukaryotic malarial parasite Plasmodium falciparum, the parasitic Apicomplexan Cryptosporidium parvum, the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, mouse and human. Current web tools include a text-based search, a function for assessing motif similarity between user-entered data and database PWMs, and a function for locating putative binding sites along user-entered nucleotide sequences. The UniPROBE database is available at http://thebrain.bwh.harvard.edu/uniprobe/.
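To make the position weight matrix (PWM) representation and the binding-site location function mentioned above concrete, here is a minimal sketch. It assumes a PWM built as log-odds scores from per-position base counts against a uniform background with a pseudocount; the count matrix, the sequence, and the function names are hypothetical and do not reproduce UniPROBE's data or tools.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_pwm(
    counts: List[Dict[str, int]], pseudocount: float = 1.0, background: float = 0.25
) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2(((col.get(b, 0) + pseudocount) / total) / background) for b in BASES}
        )
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of len(pwm) bases; seq must contain only A, C, G, T."""
    k = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(k)))
        for i in range(len(seq) - k + 1)
    ]


# Hypothetical counts for a 4-bp TAAT-like motif.
counts = [{"T": 8}, {"A": 8}, {"A": 8}, {"T": 8}]
pwm = log_odds_pwm(counts)
hits = scan("GGTAATCC", pwm)
best_pos, best_score = max(hits, key=lambda h: h[1])
```

In practice a score threshold would be applied to `hits` rather than taking only the single best window.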
Affiliation(s)
- Daniel E Newburger
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School and Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
508
Jacox E, Elnitski L. Finding Occurrences of Relevant Functional Elements in Genomic Signatures. International Journal of Computational Science 2008; 2:599-606. [PMID: 20046539 PMCID: PMC2800375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
For genomic applications, signature-finding algorithms identify over-represented signatures (words) in collections of DNA sequences. The results can be presented as a specific sequence of bases, a consensus sequence showing possible combination of bases, or a matrix of weighted possibilities at each position. These results are often compared to a biological set of binding sites (i.e., known functional elements), which are usually represented as weighted matrices. The comparison is made by scoring the signatures against each weight matrix to identify the best option for a positive hit. However, this approach can misclassify results when applied to short sequences, which are a frequent result of signature finders. We describe a novel method using a window around the original sequences (those which the signature is based upon) to improve the comparison and identify a more significant measure of similarity. In doing so, our method transforms a list of DNA signatures into a resource of characterized binding sites with known functional roles and identifies novel elements in need of further elucidation.
509
Kumaki Y, Ukai-Tadenuma M, Uno KID, Nishio J, Masumoto KH, Nagano M, Komori T, Shigeyoshi Y, Hogenesch JB, Ueda HR. Analysis and synthesis of high-amplitude Cis-elements in the mammalian circadian clock. Proc Natl Acad Sci U S A 2008; 105:14946-51. [PMID: 18815372 PMCID: PMC2553039 DOI: 10.1073/pnas.0802636105] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Received: 03/16/2008] [Indexed: 01/06/2023] Open
Abstract
Mammalian circadian clocks consist of regulatory loops mediated by Clock/Bmal1-binding elements, DBP/E4BP4 binding elements, and RevErbA/ROR binding elements. As a step toward system-level understanding of the dynamic transcriptional regulation of the oscillator, we constructed and used a mammalian promoter/enhancer database (http://promoter.cdb.riken.jp/) with computational models of the Clock/Bmal1-binding elements, DBP/E4BP4 binding elements, and RevErbA/ROR binding elements to predict new targets of the clock and subsequently validated these targets at the level of the cell and organism. We further demonstrated the predictive nature of these models by generating and testing synthetic regulatory elements that do not occur in nature and showed that these elements produced high-amplitude circadian gene regulation. Biochemical experiments to characterize these synthetic elements revealed the importance of the affinity balance between transactivators and transrepressors in generating high-amplitude circadian transcriptional output. These results highlight the power of comparative genomics approaches for system-level identification and knowledge-based design of dynamic regulatory circuits.
Affiliation(s)
- Yuichi Kumaki
- *Laboratory for Systems Biology and
- INTEC Systems Institute, Inc., 1-3-3 Shinsuna, Koto-ku, Tokyo 136-0075, Japan
- Ken-ichiro D. Uno
- Functional Genomics Unit, Center for Developmental Biology, RIKEN, 2-2-3 Minatojima-Minamimachi, Chuo-ku, Kobe 650-0047, Japan
- Junko Nishio
- Functional Genomics Unit, Center for Developmental Biology, RIKEN, 2-2-3 Minatojima-Minamimachi, Chuo-ku, Kobe 650-0047, Japan
- Koh-hei Masumoto
- *Laboratory for Systems Biology and
- Department of Anatomy and Neurobiology, Kinki University School of Medicine, 377-2 Ohno-Higashi, Osaka-Sayama, Osaka 589-8511, Japan; and
- Mamoru Nagano
- Department of Anatomy and Neurobiology, Kinki University School of Medicine, 377-2 Ohno-Higashi, Osaka-Sayama, Osaka 589-8511, Japan; and
- Takashi Komori
- INTEC Systems Institute, Inc., 1-3-3 Shinsuna, Koto-ku, Tokyo 136-0075, Japan
- Yasufumi Shigeyoshi
- Department of Anatomy and Neurobiology, Kinki University School of Medicine, 377-2 Ohno-Higashi, Osaka-Sayama, Osaka 589-8511, Japan; and
- John B. Hogenesch
- Institute for Translational Medicine and Therapeutics and the Department of Pharmacology, University of Pennsylvania School of Medicine, 810 Biomedical Research Building II/III, 421 Curie Boulevard, Philadelphia, PA 19104-6160
- Hiroki R. Ueda
- *Laboratory for Systems Biology and
- Functional Genomics Unit, Center for Developmental Biology, RIKEN, 2-2-3 Minatojima-Minamimachi, Chuo-ku, Kobe 650-0047, Japan
510
Fulp CT, Cho G, Marsh ED, Nasrallah IM, Labosky PA, Golden JA. Identification of Arx transcriptional targets in the developing basal forebrain. Hum Mol Genet 2008; 17:3740-60. [PMID: 18799476 PMCID: PMC2581427 DOI: 10.1093/hmg/ddn271] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.4] [Indexed: 01/04/2023] Open
Abstract
Mutations in the aristaless-related homeobox (ARX) gene are associated with multiple neurologic disorders in humans. Studies in mice indicate Arx plays a role in neuronal progenitor proliferation and development of the cerebral cortex, thalamus, hippocampus, striatum, and olfactory bulbs. Specific defects associated with Arx loss of function include abnormal interneuron migration and subtype differentiation. How disruptions in ARX result in human disease and how loss of Arx in mice results in these phenotypes remain poorly understood. To gain insight into the biological functions of Arx, we performed a genome-wide expression screen to identify transcriptional changes within the subpallium in the absence of Arx. We have identified 84 genes whose expression was dysregulated in the absence of Arx. This population was enriched in genes involved in cell migration, axonal guidance, neurogenesis, and regulation of transcription and includes genes implicated in autism, epilepsy, and mental retardation, all features recognized in patients with ARX mutations. Additionally, we found Arx directly repressed three of the identified transcription factors: Lmo1, Ebf3 and Shox2. To further understand how the identified genes are involved in neural development, we used gene set enrichment algorithms to compare the Arx gene regulatory network (GRN) to the Dlx1/2 GRN and interneuron transcriptome. These analyses identified a subset of genes in the Arx GRN that are shared with that of the Dlx1/2 GRN and that are enriched in the interneuron transcriptome. These data indicate Arx plays multiple roles in forebrain development, both dependent and independent of Dlx1/2, and thus provide further insights into the mechanisms underlying the pathology of mental retardation and epilepsy phenotypes resulting from ARX mutations.
Affiliation(s)
- Carl T Fulp
- Neuroscience Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
511
Zhang ZD, Rozowsky J, Snyder M, Chang J, Gerstein M. Modeling ChIP sequencing in silico with applications. PLoS Comput Biol 2008; 4:e1000158. [PMID: 18725927 PMCID: PMC2507756 DOI: 10.1371/journal.pcbi.1000158] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Received: 01/07/2008] [Accepted: 07/15/2008] [Indexed: 11/22/2022] Open
Abstract
ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.
ChIP-seq is an apt combination of chromatin immunoprecipitation and next-generation sequencing to identify transcription factor binding sites in vivo on the whole-genome scale. Since its advent, this new method has generated much excitement in the field of functional genomics. Proper computational modeling of the ChIP-seq process is needed for both data scoring and determination of adequate sequencing depth, as it provides the computational foundation for analyzing ChIP-seq data. In our study, we show the characteristics of ChIP-seq data and present in silico ChIP sequencing, a computational method to simulate the experimental outcome. On the basis of our data characterization, we observed transcription factor binding sites with excessive enrichment of sequence tags. Our simulation results reveal that both the genomic background and the binding sites are not uniform. On the basis of our simulation results, we propose a statistical procedure using the more realistic genomic background model to identify binding sites in ChIP-seq data.
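The idea of placing tags onto the genome under a nonuniform, gamma-distributed background can be sketched roughly as below. This is a toy illustration under stated assumptions, not the authors' simulator: the genome is reduced to equal-sized bins, each bin's background propensity is drawn from a gamma distribution, and binding sites simply receive a fixed number of extra tags. The shape and scale values, bin counts, and function names are all hypothetical.

```python
import random
from typing import List


def simulate_background(
    n_bins: int, n_tags: int, shape: float = 2.0, scale: float = 1.0, seed: int = 0
) -> List[int]:
    """Distribute n_tags over bins whose propensities are gamma-distributed."""
    rng = random.Random(seed)
    # Assumed model: one gamma-distributed weight per bin (nonuniform background).
    weights = [rng.gammavariate(shape, scale) for _ in range(n_bins)]
    counts = [0] * n_bins
    for b in rng.choices(range(n_bins), weights=weights, k=n_tags):
        counts[b] += 1
    return counts


def add_binding_sites(counts: List[int], site_bins: List[int], tags_per_site: int) -> List[int]:
    """Return a copy of counts with extra tags added at the given binding-site bins."""
    out = list(counts)
    for b in site_bins:
        out[b] += tags_per_site
    return out


background = simulate_background(n_bins=100, n_tags=5000)
observed = add_binding_sites(background, site_bins=[10, 42], tags_per_site=60)
```

A uniform background corresponds to giving every bin the same weight; comparing the two count distributions shows how the gamma weights produce a longer right tail.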
Affiliation(s)
- Zhengdong D. Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Joel Rozowsky
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Michael Snyder
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut, United States of America
- Joseph Chang
- Department of Statistics, Yale University, New Haven, Connecticut, United States of America
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
512
Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Peña-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET, Khalid F, Zhang W, Newburger D, Jaeger SA, Morris QD, Bulyk ML, Hughes TR. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 2008; 133:1266-76. [PMID: 18585359 DOI: 10.1016/j.cell.2008.05.024] [Citation(s) in RCA: 488] [Impact Index Per Article: 30.5] [Received: 11/21/2007] [Revised: 03/10/2008] [Accepted: 05/12/2008] [Indexed: 12/29/2022]
Abstract
Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.
Affiliation(s)
- Michael F Berger
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
513
The commensal Streptococcus salivarius K12 downregulates the innate immune responses of human epithelial cells and promotes host-microbe homeostasis. Infect Immun 2008; 76:4163-75. [PMID: 18625732 DOI: 10.1128/iai.00188-08] [Citation(s) in RCA: 195] [Impact Index Per Article: 12.2] [Indexed: 12/15/2022] Open
Abstract
Streptococcus salivarius is an early colonizer of human oral and nasopharyngeal epithelia, and strain K12 has reported probiotic effects. An emerging paradigm indicates that commensal bacteria downregulate immune responses through the action on NF-kappaB signaling pathways, but additional mechanisms underlying probiotic actions are not well understood. Our objective here was to identify host genes specifically targeted by K12 by comparing their responses with responses elicited by pathogens and to determine if S. salivarius modulates epithelial cell immune responses. RNA was extracted from human bronchial epithelial cells (16HBE14O- cells) cocultured with K12 or bacterial pathogens. cDNA was hybridized to a human 21K oligonucleotide-based array. Data were analyzed using ArrayPipe, InnateDB, PANTHER, and oPOSSUM. Interleukin 8 (IL-8) and growth-regulated oncogene alpha (Groalpha) secretion were determined by enzyme-linked immunosorbent assay. It was demonstrated that S. salivarius K12 specifically altered the expression of 565 host genes, particularly those involved in multiple innate defense pathways, general epithelial cell function and homeostasis, cytoskeletal remodeling, cell development and migration, and signaling pathways. It inhibited baseline IL-8 secretion and IL-8 responses to LL-37, Pseudomonas aeruginosa, and flagellin in epithelial cells and attenuated Groalpha secretion in response to flagellin. Immunosuppression was coincident with the inhibition of activation of the NF-kappaB pathway. Thus, the commensal and probiotic behaviors of S. salivarius K12 are proposed to be due to the organism (i) eliciting no proinflammatory response, (ii) stimulating an anti-inflammatory response, and (iii) modulating genes associated with adhesion to the epithelial layer and homeostasis. S. salivarius K12 might thereby ensure that it is tolerated by the host and maintained on the epithelial surface while actively protecting the host from inflammation and apoptosis induced by pathogens.
514
Won KJ, Sandelin A, Marstrand TT, Krogh A. Modeling promoter grammars with evolving hidden Markov models. Bioinformatics 2008; 24:1669-75. [PMID: 18535083 DOI: 10.1093/bioinformatics/btn254] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION Describing and modeling biological features of eukaryotic promoters remains an important and challenging problem within computational biology. The promoters of higher eukaryotes in particular display a wide variation in regulatory features, which are difficult to model. Often several factors are involved in the regulation of a set of co-regulated genes. If so, promoters can be modeled with connected regulatory features, where the network of connections is characteristic for a particular mode of regulation. RESULTS With the goal of automatically deciphering such regulatory structures, we present a method that iteratively evolves an ensemble of regulatory grammars using a hidden Markov Model (HMM) architecture composed of interconnected blocks representing transcription factor binding sites (TFBSs) and background regions of promoter sequences. The ensemble approach reduces the risk of overfitting and generally improves performance. We apply this method to identify TFBSs and to classify promoters preferentially expressed in macrophages, where it outperforms other methods due to the increased predictive power given by the grammar. AVAILABILITY The software and the datasets are available from http://modem.ucsd.edu/won/eHMM.tar.gz
Affiliation(s)
- Kyoung-Jae Won
- The Bioinformatics Centre, Department of Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark
515
Hooghe B, Hulpiau P, van Roy F, De Bleser P. ConTra: a promoter alignment analysis tool for identification of transcription factor binding sites across species. Nucleic Acids Res 2008; 36:W128-32. [PMID: 18453628 PMCID: PMC2447729 DOI: 10.1093/nar/gkn195] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/26/2022] Open
Abstract
Transcription factors (TFs) are key components in signaling pathways, and the presence of their binding sites in the promoter regions of DNA is essential for their regulation of the expression of the corresponding genes. Orthologous promoter sequences are commonly used to increase the specificity with which potentially functional transcription factor binding sites (TFBSs) are recognized and to detect possibly important similarities or differences between the different species. The ConTra (conserved TFBSs) web server provides the biologist at the bench with a user-friendly tool to interactively visualize TFBSs predicted using either TransFac (1) or JASPAR (2) position weight matrix libraries, on a promoter alignment of choice. The visualization can be preceded by a simple scoring analysis to explore which TFs are the most likely to bind to the promoter of interest. The ConTra web server is available at http://bioit.dmbr.ugent.be/ConTra/index.php.
Affiliation(s)
- Bart Hooghe
- Bioinformatics Core Facility, VIB, Department of Molecular Biology, Ghent University and Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium
|
516
|
Marstrand TT, Frellsen J, Moltke I, Thiim M, Valen E, Retelska D, Krogh A. Asap: a framework for over-representation statistics for transcription factor binding sites. PLoS One 2008; 3:e1623. [PMID: 18286180 PMCID: PMC2229843 DOI: 10.1371/journal.pone.0001623] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 11/22/2007] [Accepted: 01/21/2008] [Indexed: 12/02/2022] Open
Abstract
Background: In studies of gene regulation, the efficient computational detection of over-represented transcription factor binding sites is increasingly important. Several published methods can be used to test whether a set of hypothesised co-regulated genes share a common regulatory regime based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available to guide the end user's choice of method. Furthermore, it would be necessary to obtain several different software programs from various sources to make a well-founded choice. Methodology: We introduce a software package, Asap, for fast searching with position weight matrices that includes several standard methods for assessing over-representation. We have compared the ability of these methods to detect over-represented transcription factor binding sites in artificial promoter sequences. By controlling all aspects of our input data, we are able to identify the optimal statistics across multiple threshold values and for sequence sets containing different distributions of transcription factor binding sites. Conclusions: We show that our implementation is significantly faster than more naïve scanning algorithms when searching with many weight matrices in large sequence sets. When comparing the various statistics, we show that those based on binomial over-representation and Fisher's exact test perform almost equally well and better than the others. An online server is available at http://servers.binf.ku.dk/asap/.
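The two statistics singled out in the conclusions, binomial over-representation and Fisher's exact test, can be sketched in a few lines. This is an illustrative sketch, not Asap's code: the function names and the example counts are hypothetical, and it assumes the counts are the numbers of sequences with at least one motif hit in a foreground (co-regulated) set and a background set.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: foreground sequences with a hit, n: foreground size,
    p: hit rate expected from the background (an assumed input).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def fisher_upper_tail(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher's exact test for over-representation.

    Probability, under the hypergeometric null, of seeing at least
    hits_fg hits among the n_fg foreground sequences.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    upper = min(n_fg, total_hits)
    favourable = sum(
        comb(total_hits, i) * comb(total - total_hits, n_fg - i)
        for i in range(hits_fg, upper + 1)
    )
    return favourable / comb(total, n_fg)

# Hypothetical counts: 12 of 40 foreground promoters and 50 of 1000
# background promoters contain at least one predicted site.
p_binomial = binomial_upper_tail(12, 40, 50 / 1000)
p_fisher = fisher_upper_tail(12, 40, 50, 1000)
```

Both functions return a one-sided p-value; with many weight matrices tested at once, a multiple-testing correction would normally follow, which is left out of this sketch.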
Affiliation(s)
- Troels T. Marstrand
- Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
- Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Ida Moltke
- Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Martin Thiim
- Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Eivind Valen
- Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Dorota Retelska
- Swiss Institute of Bioinformatics, Swiss Institute for Experimental Cancer Research (ISREC), Epalinges, Switzerland
- Anders Krogh
- Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
|
517
|
Abstract
In this article, we have applied the ChIP-on-chip approach to pursue a large-scale identification of ERalpha- and ERbeta-binding DNA regions in intact chromatin. We show that there is a high degree of overlap between the regions identified as bound by ERalpha and ERbeta, respectively, but there are also regions that are bound by ERalpha only in the presence of ERbeta, as well as regions that are selectively bound by either receptor. Analysis of bound regions shows that regions bound by ERalpha have distinct properties in terms of genome landscape, sequence features, and conservation compared with regions that are bound by ERbeta. ERbeta-bound regions are, as a group, located closer to transcription start sites. ERalpha- and ERbeta-bound regions differ in sequence properties, with ERalpha-bound regions having an overrepresentation of TA-rich motifs including forkhead binding sites and ERbeta-bound regions having a predominance of classical estrogen response elements (EREs) and GC-rich motifs. Differences in the properties of ER-bound regions might explain some of the differences in gene expression programs and physiological effects shown by the respective estrogen receptors.
|
518
|
Lizio M, Harshbarger J, Shimoji H, et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol 2015; 16:22. [PMID: 25723102 PMCID: PMC4310165 DOI: 10.1186/s13059-014-0560-6] [Citation(s) in RCA: 536] [Impact Index Per Article: 9.7] [Received: 05/13/2014] [Accepted: 12/03/2014] [Indexed: 12/19/2022] Open
Abstract
The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.
|