1
Farahmand S, Riley T, Zarringhalam K. ModEx: A text mining system for extracting mode of regulation of transcription factor-gene regulatory interaction. J Biomed Inform 2019; 102:103353. [PMID: 31857203 DOI: 10.1016/j.jbi.2019.103353] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/15/2019] [Revised: 11/22/2019] [Accepted: 12/10/2019] [Indexed: 10/25/2022]
Abstract
BACKGROUND Transcription factors (TFs) are proteins that are fundamental to transcription and regulation of gene expression. Each TF may regulate multiple genes and each gene may be regulated by multiple TFs. TFs can act as either activators or repressors of gene expression. This complex network of interactions between TFs and genes underlies many developmental and biological processes and is implicated in several human diseases such as cancer. Hence deciphering the network of TF-gene interactions with information on mode of regulation (activation vs. repression) is an important step toward understanding the regulatory pathways that underlie complex traits. There are many experimental, computational, and manually curated databases of TF-gene interactions. In particular, high-throughput ChIP-Seq datasets provide a large-scale map of transcriptional regulatory interactions. However, these interactions are not annotated with information on context and mode of regulation. Such information is crucial to gain a global picture of gene regulatory mechanisms and can aid in developing machine learning models for applications such as biomarker discovery, prediction of response to therapy, and precision medicine. METHODS In this work, we introduce a text-mining system to annotate ChIP-Seq-derived interactions with such metadata through mining PubMed articles. We evaluate the performance of our system using gold-standard, small-scale, manually curated databases. RESULTS Our results show that the method is able to accurately extract mode of regulation with F-score 0.77 on TRRUST-curated interactions and F-score 0.96 on the intersection of TRRUST and the ChIP-network. We provide an HTTP REST API for our code to facilitate usage. AVAILABILITY Source code and datasets are available for download on GitHub: https://github.com/samanfrm/modex.
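As a rough illustration of the evaluation described in this abstract, the sketch below computes an F-score for predicted modes of regulation against a curated gold standard. It is an assumption-laden toy: the abstract does not say how the F-score was computed (per class, averaged, or otherwise), and the function name `f_score`, the TF and gene labels, and the choice of "activation" as the positive class are all hypothetical.

```python
def f_score(gold, predicted, positive="activation"):
    """F1 for one class over the TF-gene pairs present in both mappings.

    gold and predicted map (tf, gene) pairs to "activation" or "repression".
    Treating one class as positive is an assumption made for this sketch.
    """
    shared = gold.keys() & predicted.keys()
    tp = sum(1 for k in shared if gold[k] == positive and predicted[k] == positive)
    fp = sum(1 for k in shared if gold[k] != positive and predicted[k] == positive)
    fn = sum(1 for k in shared if gold[k] == positive and predicted[k] != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical curated and text-mined annotations (not real data).
gold = {
    ("TF_A", "GENE_1"): "activation",
    ("TF_A", "GENE_2"): "repression",
    ("TF_B", "GENE_1"): "activation",
    ("TF_B", "GENE_3"): "activation",
}
predicted = {
    ("TF_A", "GENE_1"): "activation",
    ("TF_A", "GENE_2"): "activation",
    ("TF_B", "GENE_1"): "activation",
    ("TF_B", "GENE_3"): "repression",
}
score = f_score(gold, predicted)
```

Restricting the comparison to pairs present in both mappings mirrors the idea of scoring on an intersection of two resources, but that correspondence is an interpretation, not something the abstract states.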
Affiliation(s)
- Saman Farahmand
- Computational Sciences PhD program, University of Massachusetts Boston, Boston, USA; Department of Biology, University of Massachusetts Boston, Boston, USA
- Todd Riley
- Department of Biology, University of Massachusetts Boston, Boston, USA
2
Farahmand S, O’Connor C, Macoska JA, Zarringhalam K. Causal Inference Engine: a platform for directional gene set enrichment analysis and inference of active transcriptional regulators. Nucleic Acids Res 2019; 47:11563-11573. [PMID: 31701125 PMCID: PMC7145661 DOI: 10.1093/nar/gkz1046] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/10/2019] [Revised: 09/19/2019] [Accepted: 10/28/2019] [Indexed: 02/07/2023]
Abstract
Inference of active regulatory mechanisms underlying specific molecular and environmental perturbations is essential for understanding cellular response. The success of inference algorithms relies on the quality and coverage of the underlying network of regulator-gene interactions. Several commercial platforms provide large and manually curated regulatory networks and functionality to perform inference on these networks. Adaptation of such platforms for open-source academic applications has been hindered by the lack of availability of accurate, high-coverage networks of regulatory interactions and integration of efficient causal inference algorithms. In this work, we present CIE, an integrated platform for causal inference of active regulatory mechanisms from differential gene expression data. Using a regularized Gaussian Graphical Model (GGM), we construct a transcriptional regulatory network by integrating publicly available ChIP-seq experiments with gene-expression data from tissue-specific RNA-seq experiments. Our GGM approach identifies high-confidence transcription factor (TF)-gene interactions and annotates the interactions with information on mode of regulation (activation vs. repression). Benchmarks against manually curated databases of TF-gene interactions show that our method can accurately detect mode of regulation. We demonstrate the ability of our platform to identify active transcriptional regulators by using controlled in vitro overexpression and stem-cell differentiation studies and utilize our method to investigate transcriptional mechanisms of fibroblast phenotypic plasticity.
Affiliation(s)
- Saman Farahmand
- Computational Sciences PhD program, University of Massachusetts Boston, Boston, MA 02125, USA
- Corey O’Connor
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Jill A Macoska
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Kourosh Zarringhalam
- Computational Sciences PhD program, University of Massachusetts Boston, Boston, MA 02125, USA
- Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
3
Abstract
Designing expression cassettes with desired properties remains the most important consideration of gene engineering technology. One of the challenges for predictive gene expression is the modeling of synthetic gene switches to regulate one or more target genes which would directly respond to specific chemical, environmental, and physiological stimuli. Assessment of natural promoters, high-throughput sequencing, and the modern biotech inventory have aided in deciphering the structure of cis elements and molding the native cis elements into desired synthetic promoters. Synthetic promoters which are molded by rearrangement of cis motifs can greatly benefit plant biotechnology applications. This review gives a glimpse of the manual in vivo gene regulation through synthetic promoters. It summarizes the integrative design strategy of synthetic promoters and enumerates five approaches for constructing synthetic promoters. Insights into the pattern of cis regulatory elements in the pursuit of desirable "gene switches" to date have also been reevaluated. Joint strategies of bioinformatics modeling and randomized biochemical synthesis are addressed in an effort to construct synthetic promoters for intricate gene regulation.
4
Lichtblau Y, Zimmermann K, Haldemann B, Lenze D, Hummel M, Leser U. Comparative assessment of differential network analysis methods. Brief Bioinform 2017; 18:837-850. [PMID: 27473063 DOI: 10.1093/bib/bbw061] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 03/02/2016] [Indexed: 12/31/2022]
Abstract
Differential network analysis (DiNA) denotes a recent class of network-based Bioinformatics algorithms which focus on the differences in network topologies between two states of a cell, such as healthy and disease, to identify key players in the discriminating biological processes. In contrast to conventional differential analysis, DiNA identifies changes in the interplay between molecules, rather than changes in single molecules. This ability is especially important in cases where effectors are changed, e.g. mutated, but their expression is not. A number of different DiNA approaches have been proposed, yet a comparative assessment of their performance in different settings is still lacking. In this paper, we evaluate 10 different DiNA algorithms regarding their ability to recover genetic key players from transcriptome data. We construct high-quality regulatory networks and enrich them with co-expression data from four different types of cancer. Next, we assess the results of applying DiNA algorithms on these data sets using a gold standard list (GSL). We find that local DiNA algorithms are generally superior to global algorithms, and that all DiNA algorithms outperform conventional differential expression analysis. We also assess the ability of DiNA methods to exploit additional knowledge in the underlying cellular networks. To this end, we enrich the cancer-type specific networks with known regulatory miRNAs and compare the algorithms performance in networks with and without miRNA. We find that including miRNAs consistently and considerably improves the performance of almost all tested algorithms. Our results underline the advantages of comprehensive cell models for the analysis of -omics data.
5
Human Genes Encoding Transcription Factors and Chromatin-Modifying Proteins Have Low Levels of Promoter Polymorphism: A Study of 1000 Genomes Project Data. Int J Genomics 2015; 2015:260159. [PMID: 26417590 PMCID: PMC4568383 DOI: 10.1155/2015/260159] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/13/2015] [Accepted: 07/29/2015] [Indexed: 12/15/2022]
Abstract
The expression level of each gene is controlled by its regulatory regions, which determine the precise regulation in a tissue-specific manner, according to the developmental stage of the body and the necessity of a response to external stimuli. Nucleotide substitutions in regulatory gene regions may modify the affinity of transcription factors to their specific DNA binding sites, affecting the transcription rates of genes. In our previous research, we found that genes controlling the sensory perception of smell and genes involved in antigen processing and presentation were overrepresented significantly among genes with high SNP contents in their promoter regions. The goal of our study was to reveal functional features of human genes containing extremely small numbers of SNPs in promoter regions. Two functional groups were found to be overrepresented among genes whose promoters did not contain SNPs: (1) genes involved in gene-specific transcription and (2) genes controlling chromatin organization. We revealed that the 5′-regulatory regions of genes encoding transcription factors and chromatin-modifying proteins were characterized by reduced genetic variability. One important exception from this rule refers to genes encoding transcription factors with zinc-coordinating DNA-binding domains (DBDs), which underwent extensive expansion in vertebrates, particularly, in primate evolution. Hence, we obtained new evidence for evolutionary forces shaping variability in 5′-regulatory regions of genes.
6
Ignatieva EV, Levitsky VG, Yudin NS, Moshkin MP, Kolchanov NA. Genetic basis of olfactory cognition: extremely high level of DNA sequence polymorphism in promoter regions of the human olfactory receptor genes revealed using the 1000 Genomes Project dataset. Front Psychol 2014; 5:247. [PMID: 24715883 PMCID: PMC3970011 DOI: 10.3389/fpsyg.2014.00247] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 10/20/2013] [Accepted: 03/05/2014] [Indexed: 11/13/2022]
Abstract
The molecular mechanism of olfactory cognition is very complicated. Olfactory cognition is initiated by olfactory receptor proteins (odorant receptors), which are activated by olfactory stimuli (ligands). Olfactory receptors are the initial player in the signal transduction cascade producing a nerve impulse, which is transmitted to the brain. The sensitivity to a particular ligand depends on the expression level of multiple proteins involved in the process of olfactory cognition: olfactory receptor proteins, proteins that participate in signal transduction cascade, etc. The expression level of each gene is controlled by its regulatory regions, and especially, by the promoter [a region of DNA about 100–1000 base pairs long located upstream of the transcription start site (TSS)]. We analyzed single nucleotide polymorphisms using human whole-genome data from the 1000 Genomes Project and revealed an extremely high level of single nucleotide polymorphisms in promoter regions of olfactory receptor genes and HLA genes. We hypothesized that the high level of polymorphisms in olfactory receptor promoters was responsible for the diversity in regulatory mechanisms controlling the expression levels of olfactory receptor proteins. Such diversity of regulatory mechanisms may cause the great variability of olfactory cognition of numerous environmental olfactory stimuli perceived by human beings (air pollutants, human body odors, odors in culinary etc.). In turn, this variability may provide a wide range of emotional and behavioral reactions related to the vast variety of olfactory stimuli.
Affiliation(s)
- Elena V Ignatieva
- Laboratory of Evolutionary Bioinformatics and Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia; Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia
- Victor G Levitsky
- Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia; Laboratory of Molecular-Genetic Systems, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- Nikolay S Yudin
- Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia; Laboratory of Human Molecular Genetics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- Mikhail P Moshkin
- Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia; Laboratory of Mammalian Ecological Genetics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- Nikolay A Kolchanov
- Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia; Department of Systems Biology, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia; National Research Centre "Kurchatov Institute", Moscow, Russia
7
Kumar GR, Sakthivel K, Sundaram R, Neeraja C, Balachandran S, Rani NS, Viraktamath B, Madhav M. Allele mining in crops: Prospects and potentials. Biotechnol Adv 2010; 28:451-61. [DOI: 10.1016/j.biotechadv.2010.02.007] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.3] [Received: 06/02/2009] [Revised: 09/21/2009] [Accepted: 09/25/2009] [Indexed: 12/26/2022]
8
Ma'ayan A. Network integration and graph analysis in mammalian molecular systems biology. IET Syst Biol 2009; 2:206-21. [PMID: 19045817 DOI: 10.1049/iet-syb:20070075] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]
Abstract
Abstraction of intracellular biomolecular interactions into networks is useful for data integration and graph analysis. Network analysis tools facilitate predictions of novel functions for proteins, prediction of functional interactions and identification of intracellular modules. These efforts are linked with drug and phenotype data to accelerate drug-target and biomarker discovery. This review highlights the currently available varieties of mammalian biomolecular networks, and surveys methods and tools to construct, compare, integrate, visualise and analyse such networks.
Affiliation(s)
- A Ma'ayan
- Mount Sinai School of Medicine, Department of Pharmacology and Systems Therapeutics, New York, NY 10029-6574, USA.
9
Temple MD, Murray V. Footprinting the 'essential regulatory region' of the retinoblastoma gene promoter in intact human cells. Int J Biochem Cell Biol 2005; 37:665-78. [PMID: 15618023 DOI: 10.1016/j.biocel.2004.09.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 03/11/2004] [Revised: 08/27/2004] [Accepted: 09/02/2004] [Indexed: 02/04/2023]
Abstract
The retinoblastoma tumour suppressor protein is a key cell cycle regulator. Protein-DNA interactions at the retinoblastoma (RB1) promoter, including the 'essential regulatory region', were investigated using novel DNA-targeted nitrogen mustards in intact human cells. The footprinting experiments were carried out in two different environments: in intact HeLa and K562 cells where the access of DNA-targeted probes to chromatin is affected by cellular protein-DNA interactions associated with gene regulation; and in purified DNA where their access is unencumbered by protein-DNA interactions. Using the ligation-mediated PCR (LMPCR) technique, the sites of damage were determined at base pair resolution on DNA sequencing gels. Our results demonstrate that, in intact cells, footprints were observed at the E2F, ATF and RBF1/Sp1 DNA binding motifs in the RB1 promoter. In addition, a novel footprint was observed at a previously unidentified cycle homology region (CHR) and at four uncharacterised protein-DNA binding sites. In further experiments, nitrogen mustard-treated cells were FACS sorted into G1, S and G2/M phases of the cell cycle prior to LMPCR analysis. Expression of the RB1 gene is cell cycle-regulated and footprinting studies of the promoter in FACS-sorted cells indicated that transcription factor binding at the GC box, CHR binding motif and the 'essential regulatory region' are cell cycle dependent.
Affiliation(s)
- Mark D Temple
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
10
Rombauts S, Florquin K, Lescot M, Marchal K, Rouzé P, van de Peer Y. Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiol 2003; 132:1162-76. [PMID: 12857799 PMCID: PMC167057 DOI: 10.1104/pp.102.017715] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 11/14/2002] [Revised: 01/10/2003] [Accepted: 03/17/2003] [Indexed: 05/19/2023]
Abstract
The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. 
Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
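The "search by content" idea in this abstract can be sketched with a sliding-window CpG scan. The code below is only an illustration: the observed/expected formula and the default thresholds (200 bp window, GC content of at least 0.5, observed/expected CpG of at least 0.6) are commonly cited vertebrate-style criteria assumed here, not parameters taken from the paper, which argues that such values do not transfer to plants. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides; "CG" cannot overlap itself
    gc_content = (c + g) / n if n else 0.0
    # Observed CpG count divided by the count expected from base composition.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def scan_windows(seq, window=200, step=50, min_gc=0.5, min_obs_exp=0.6):
    """List (start, gc, obs_exp) for windows passing both assumed thresholds."""
    hits = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_obs_exp:
            hits.append((start, gc, oe))
    return hits


# Toy sequence: a CpG-rich block followed by an AT-rich block.
toy = "CG" * 100 + "AT" * 100
hits = scan_windows(toy, window=200, step=200)
```

For a plant genome the thresholds would have to be re-derived from the data, and CpNpG content would need its own counter; both are left out of this sketch.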
Affiliation(s)
- Stephane Rombauts
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, B-9000 Gent, Belgium
11
Zheng J, Wu J, Sun Z. An approach to identify over-represented cis-elements in related sequences. Nucleic Acids Res 2003; 31:1995-2005. [PMID: 12655017 PMCID: PMC152803 DOI: 10.1093/nar/gkg287] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Received: 10/03/2002] [Revised: 12/23/2002] [Accepted: 02/03/2003] [Indexed: 11/14/2022]
Abstract
Computational identification of transcription factor binding sites is an important research area of computational biology. The positional weight matrix (PWM) is a model to describe the sequence pattern of binding sites. Usually, transcription factor binding site prediction methods based on PWMs require user-defined thresholds. The arbitrary threshold and the relatively low specificity of the algorithm prevent the result of such an analysis from being properly interpreted. In this study, a method was developed to identify over-represented cis-elements with PWM-based similarity scores. Three sets of closely related promoters were analyzed, and only over-represented motifs with high PWM similarity scores were reported. The thresholds to evaluate the similarity scores to the PWMs of putative transcription factor binding sites can also be automatically determined during the analysis, and can be reused in further research with the same PWMs. The online program is available on the website: http://www.bioinfo.tsinghua.edu.cn/~zhengjsh/OTFBS/.
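To make the notion of a PWM similarity score concrete, here is a minimal sketch of log-odds PWM scoring. It is not the over-representation statistic or the automatic threshold selection from the paper; the pseudocount, the uniform background of 0.25, the example sites, and the function names are all assumptions chosen for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (list of per-position dicts) from aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm):
    """Score every window of the sequence; returns (position, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]


# Hypothetical aligned binding sites and a toy promoter fragment.
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
pwm = build_pwm(sites)
scores = scan("GGCTATAAAGGC", pwm)
best = max(scores, key=lambda pair: pair[1])
```

A fixed cutoff on these scores is exactly the user-defined threshold the abstract criticizes; the paper's contribution is to choose it from the data instead.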
Affiliation(s)
- Jiashun Zheng
- Institute of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing 100084, China
12
Kim JT, Martinetz T, Polani D. Bioinformatic principles underlying the information content of transcription factor binding sites. J Theor Biol 2003; 220:529-44. [PMID: 12623284 DOI: 10.1006/jtbi.2003.3153] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Empirically, it has been observed in several cases that the information content of transcription factor binding site sequences (R(sequence)) approximately equals the information content of binding site positions (R(frequency)). A general framework for formal models of transcription factors and binding sites is developed to address this issue. Measures for information content in transcription factor binding sites are revisited and theoretic analyses are compared on this basis. These analyses do not lead to consistent results. A comparative review reveals that these inconsistent approaches do not include a transcription factor state space. Therefore, a state space for mathematically representing transcription factors with respect to their binding site recognition properties is introduced into the modelling framework. Analysis of the resulting comprehensive model shows that the structure of genome state space favours equality of R(sequence) and R(frequency) indeed, but the relation between the two information quantities also depends on the structure of the transcription factor state space. This might lead to significant deviations between R(sequence) and R(frequency). However, further investigation and biological arguments show that the effects of the structure of the transcription factor state space on the relation of R(sequence) and R(frequency) are strongly limited for systems which are autonomous in the sense that all DNA-binding proteins operating on the genome are encoded in the genome itself. This provides a theoretical explanation for the empirically observed equality.
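The two quantities compared in this abstract can be computed in a few lines. The sketch below assumes the commonly used definitions: R(sequence) as the sum over positions of 2 bits minus the Shannon entropy of the observed base frequencies (uniform background, no small-sample correction), and R(frequency) as log2(G / gamma) for gamma sites in a genome of G positions. The abstract itself does not spell out these formulas, and the function names and example numbers are hypothetical.

```python
import math
from collections import Counter


def r_sequence(sites):
    """Information content (bits) of aligned binding sites, uniform background."""
    n = len(sites)
    total = 0.0
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy  # 2 bits is the maximum for a 4-letter alphabet
    return total


def r_frequency(genome_size, n_sites):
    """Bits needed to locate n_sites among genome_size possible positions."""
    return math.log2(genome_size / n_sites)


# Toy numbers: 4 fully conserved 4-bp sites in a 1024-position genome.
rs = r_sequence(["ACGT", "ACGT", "ACGT", "ACGT"])
rf = r_frequency(1024, 4)
```

In this toy case both values come out at 8 bits, which is the kind of equality the abstract refers to; real site collections only approximate it.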
Affiliation(s)
- Jan T Kim
- Institut für Neuro- und Bioinformatik, Seelandstrasse 1a, 23569 Lübeck, Germany.
13
Fielden MR, Matthews JB, Fertuck KC, Halgren RG, Zacharewski TR. In silico approaches to mechanistic and predictive toxicology: an introduction to bioinformatics for toxicologists. Crit Rev Toxicol 2002; 32:67-112. [PMID: 11951993 DOI: 10.1080/20024091064183] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]
Abstract
Bioinformatics, or in silico biology, is a rapidly growing field that encompasses the theory and application of computational approaches to model, predict, and explain biological function at the molecular level. This information rich field requires new skills and new understanding of genome-scale studies in order to take advantage of the rapidly increasing amount of sequence, expression, and structure information in public and private databases. Toxicologists are poised to take advantage of the large public databases in an effort to decipher the molecular basis of toxicity. With the advent of high-throughput sequencing and computational methodologies, expressed sequences can be rapidly detected and quantitated in target tissues by database searching. Novel genes can also be isolated in silico, while their function can be predicted and characterized by virtue of sequence homology to other known proteins. Genomic DNA sequence data can be exploited to predict target genes and their modes of regulation, as well as identify susceptible genotypes based on single nucleotide polymorphism data. In addition, highly parallel gene expression profiling technologies will allow toxicologists to mine large databases of gene expression data to discover molecular biomarkers and other diagnostic and prognostic genes or expression profiles. This review serves to introduce to toxicologists the concepts of in silico biology most relevant to mechanistic and predictive toxicology, while highlighting the applicability of in silico methods using select examples.
Affiliation(s)
- Mark R Fielden
- Department of Biochemistry and Molecular Biology, National Food Safety and Toxicology Center, Michigan State University, East Lansing 48824, USA
14
Ananko EA, Podkolodny NL, Stepanenko IL, Ignatieva EV, Podkolodnaya OA, Kolchanov NA. GeneNet: a database on structure and functional organisation of gene networks. Nucleic Acids Res 2002; 30:398-401. [PMID: 11752348 PMCID: PMC99151 DOI: 10.1093/nar/30.1.398] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Received: 08/20/2001] [Revised: 10/19/2001] [Accepted: 10/19/2001] [Indexed: 11/14/2022]
Abstract
The GeneNet database is designed for accumulation of information on gene networks. Original technology applied in GeneNet enables description of not only a gene network structure and functional relationships between components, but also metabolic and signal transduction pathways. Specialised software, GeneNet Viewer, automatically displays the graphical diagram of gene networks described in the database. Current release 3.0 of GeneNet database contains descriptions of 25 gene networks, 945 proteins, 567 genes, 151 other substances and 1364 relationships between components of gene networks. Information distributed between 14 interlinked tables was obtained by annotating 968 scientific publications. The SRS-version of GeneNet database is freely available (http://wwwmgs.bionet.nsc.ru/mgs/systems/genenet/).
Affiliation(s)
- E A Ananko
- Institute of Cytology and Genetics (Siberian Branch of the Russian Academy of Sciences), 10 Lavrentiev Avenue, Novosibirsk, 630090, Russia.
15
Liebich I, Bode J, Frisch M, Wingender E. S/MARt DB: a database on scaffold/matrix attached regions. Nucleic Acids Res 2002; 30:372-4. [PMID: 11752340 PMCID: PMC99064 DOI: 10.1093/nar/30.1.372] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
S/MARt DB, the S/MAR transaction database, is a relational database covering scaffold/matrix attached regions (S/MARs) and nuclear matrix proteins that are involved in the chromosomal attachment to the nuclear scaffold. The data are mainly extracted from original publications, but a World Wide Web interface for direct submissions is also available. S/MARt DB is closely linked to the TRANSFAC database on transcription factors and their binding sites. It is freely accessible through the World Wide Web (http://transfac.gbf.de/SMARtDB/) for non-profit research.
Affiliation(s)
- Ines Liebich
- Research Group Bioinformatics, GBF, Mascheroder Weg 1, D-38124 Braunschweig, Germany.
16
Praz V, Périer R, Bonnard C, Bucher P. The Eukaryotic Promoter Database, EPD: new entry types and links to gene expression data. Nucleic Acids Res 2002; 30:322-4. [PMID: 11752326 PMCID: PMC99099 DOI: 10.1093/nar/30.1.322] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
The Eukaryotic Promoter Database (EPD) is an annotated, non-redundant collection of eukaryotic Pol II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. World Wide Web-based interfaces have been developed which enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch.
Affiliation(s)
- Viviane Praz
- Swiss Institute of Bioinformatics and Swiss Institute for Experimental Cancer Research, Ch. des Boveresses 155, 1066-Epalinges s/Lausanne, Switzerland
17
Kolchanov NA, Ignatieva EV, Ananko EA, Podkolodnaya OA, Stepanenko IL, Merkulova TI, Pozdnyakov MA, Podkolodny NL, Naumochkin AN, Romashchenko AG. Transcription Regulatory Regions Database (TRRD): its status in 2002. Nucleic Acids Res 2002; 30:312-7. [PMID: 11752324 PMCID: PMC99088 DOI: 10.1093/nar/30.1.312] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022]
Abstract
Transcription Regulatory Regions Database (TRRD) is an informational resource containing an integrated description of the gene transcription regulation. An entry of the database corresponds to a gene and contains the data on localization and functions of the transcription regulatory regions as well as gene expression patterns. TRRD contains only experimental data that are inputted into the database through annotating scientific publication. TRRD release 6.0 comprises the information on 1167 genes, 5537 transcription factor binding sites, 1714 regulatory regions, 14 locus control regions and 5335 expression patterns obtained through annotating 3898 scientific papers. This information is arranged in seven databases: TRRDGENES (general gene description), TRRDLCR (locus control regions); TRRDUNITS (regulatory regions: promoters, enhancers, silencers, etc.), TRRDSITES (transcription factor binding sites), TRRDFACTORS (transcription factors), TRRDEXP (expression patterns) and TRRDBIB (experimental publications). Sequence Retrieval System (SRS) is used as a basic tool for navigating and searching TRRD and integrating it with external informational and software resources. The visualization tool, TRRD Viewer, provides the information representation in a form of maps of gene regulatory regions. The option allowing nucleotide sequences to be searched for according to their homology using BLAST is also included. TRRD is available at http://www.bionet.nsc.ru/trrd/.
Affiliation(s)
- N A Kolchanov
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Lavrentieva 10, Novosibirsk 630090, Russia.
|
18
|
Stark K, Kirk DL, Schmitt R. Two enhancers and one silencer located in the introns of regA control somatic cell differentiation in Volvox carteri. Genes Dev 2001; 15:1449-60. [PMID: 11390364 PMCID: PMC312706 DOI: 10.1101/gad.195101] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Indexed: 01/11/2023]
Abstract
The regA gene plays a central role in germ-soma differentiation of Volvox carteri by suppressing all reproductive functions in somatic cells. Here we show that the minimal promoter of regA consists of only 42 bp immediately upstream of the transcription start site, and that it contains no discernible regulatory elements. However, introns 3 and 5 are both required for regA expression in somatic cells, and intron 7 is essential for silencing regA in gonidia (asexual reproductive cells). A regA gene lacking intron 7 rescues the normal phenotype of mutant somatic cells, but also results in gonidia that reproduce only weakly and soon die out. The same phenotype is observed when a regA gene containing intron 7 is placed under control of a constitutive promoter, suggesting that the silencing activity of intron 7 is promoter specific. Intron 7 is unusual in that it contains a potential ORF that is in frame with exons 7 and 8, and some transcripts are produced in which intron 7 is retained. However, a regulatory role for the intron 7 translation product can be ruled out, because a construct in which intron 7 must be translated, and one in which it cannot be translated, both result in wild-type development of both cell types. Furthermore, intron 7 is unable to act in trans to silence regA, but is able to exert its normal effect when placed in a different location within the gene. Therefore, it appears that intron 7 functions in gonidia as a classical cell-type-specific and promoter-specific enhancer, of the inhibitory type that is often referred to as a silencer.
Affiliation(s)
- K Stark
- Lehrstuhl für Genetik, University of Regensburg, D-93040 Regensburg, Germany
|
19
|
Abstract
The analysis of regulatory sequences is greatly facilitated by database-assisted bioinformatic approaches. The TRANSFAC database contains information on transcription factors and their origins, functional properties and sequence-specific binding activities. Software tools enable us to screen the database with a given DNA sequence for interacting transcription factors. If a regulatory function is already attributed to this sequence then the database-assisted identification of binding sites for proteins or protein classes and subsequent experimental verification might establish functionally relevant sites within this sequence. The binding transcription factors and interacting factors might already be present in the database.
Affiliation(s)
- R Hehl
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, D-38106 Braunschweig, Germany.
|
20
|
Ponomarenko JV, Merkulova TI, Vasiliev GV, Levashova ZB, Orlova GV, Lavryushev SV, Fokin ON, Ponomarenko MP, Frolov AS, Sarai A. rSNP_Guide, a database system for analysis of transcription factor binding to target sequences: application to SNPs and site-directed mutations. Nucleic Acids Res 2001; 29:312-6. [PMID: 11125123 PMCID: PMC29847 DOI: 10.1093/nar/29.1.312] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
rSNP_Guide is a novel curated database system for analysis of transcription factor (TF) binding to target sequences in regulatory gene regions altered by mutations. It accumulates experimental data on naturally occurring site variants in regulatory gene regions and site-directed mutations. This database system also contains the web tools for SNP analysis, i.e., active applet applying weight matrices to predict the regulatory site candidates altered by a mutation. The current version of the rSNP_Guide is supplemented by six sub-databases: (i) rSNP_DB, on DNA-protein interaction caused by mutation; (ii) SYSTEM, on experimental systems; (iii) rSNP_BIB, on citations to original publications; (iv) SAMPLES, on experimentally identified sequences of known regulatory sites; (v) MATRIX, on weight matrices of known TF sites; (vi) rSNP_Report, on characteristic examples of successful rSNP_Tools implementation. These databases are useful for the analysis of natural SNPs and site-directed mutations. The databases are available through the Web, http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/.
Affiliation(s)
- J V Ponomarenko
- Institute of Cytology and Genetics, 10 Lavrentyev Avenue, Novosibirsk, 630090, Russia.
|
21
|
Nakata K, Takai-Igarashi T, Nakano T, Kaminuma T. Extension of The Receptor Database (RDB). CHEM-BIO INFORMATICS JOURNAL 2001. [DOI: 10.1273/cbij.1.115] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Affiliation(s)
- Kotoko Nakata
- Division of Chem-Bio Informatics, National Institute of Health Sciences
- Tatsuya Nakano
- Division of Chem-Bio Informatics, National Institute of Health Sciences
|
22
|
Ponomarenko JV, Furman DP, Frolov AS, Podkolodny NL, Orlova GV, Ponomarenko MP, Kolchanov NA, Sarai A. ACTIVITY: a database on DNA/RNA sites activity adapted to apply sequence-activity relationships from one system to another. Nucleic Acids Res 2001; 29:284-7. [PMID: 11125114 PMCID: PMC29829 DOI: 10.1093/nar/29.1.284] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web, http://wwwmgs.bionet.nsc.ru/systems/Activity/.
Affiliation(s)
- J V Ponomarenko
- Institute of Cytology and Genetics, 10 Lavrentyev Avenue, Novosibirsk, 630090, Russia.
|