1
|
Lavrekha VV, Levitsky VG, Tsukanov AV, Bogomolov AG, Grigorovich DA, Omelyanchuk N, Ubogoeva EV, Zemlyanskaya EV, Mironova V. CisCross: A gene list enrichment analysis to predict upstream regulators in Arabidopsis thaliana. Front Plant Sci 2022; 13:942710. [PMID: 36061801 PMCID: PMC9434332 DOI: 10.3389/fpls.2022.942710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 07/26/2022] [Indexed: 06/15/2023]
Abstract
Having DNA-binding profiles for a sufficient number of genome-encoded transcription factors (TFs) opens up prospects for systematic evaluation of the upstream regulators of gene lists. The Plant Cistrome database, a large collection of TF binding profiles detected using the DAP-seq method, made this possible for Arabidopsis. Here we re-processed the raw DAP-seq data with MACS2, the most popular peak caller and a leader among peak callers according to quality metrics. In the benchmarking study, we confirmed that the improved collection of TF binding profiles supported a more precise gene list enrichment procedure and resulted in a more relevant ranking of potential upstream regulators. Moreover, we consistently recovered the TF binding profiles that were missing in the previous collection of DAP-seq peak sets. We developed the CisCross web service (https://plamorph.sysbio.ru/ciscross/), which gives more flexibility in the analysis of potential upstream TF regulators for Arabidopsis thaliana genes.
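The gene list enrichment idea in this abstract can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test on the overlap between a gene list and each TF's target genes; the abstract does not state which statistic CisCross uses, so the test, the function names (`hypergeom_tail`, `rank_regulators`), and the toy gene and TF identifiers are all illustrative assumptions.

```python
from math import comb


def hypergeom_tail(universe, tf_targets, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from `universe` genes, `tf_targets` of which are bound by the TF."""
    upper = min(tf_targets, list_size)
    total = comb(universe, list_size)
    hits = sum(
        comb(tf_targets, k) * comb(universe - tf_targets, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


def rank_regulators(gene_list, tf_to_targets, universe):
    """Rank TFs by how strongly their target genes are enriched in gene_list.

    Returns (tf, overlap, p_value) tuples, most significant first.
    """
    genes = set(gene_list) & universe
    results = []
    for tf, targets in tf_to_targets.items():
        targets = set(targets) & universe
        overlap = len(genes & targets)
        p = hypergeom_tail(len(universe), len(targets), len(genes), overlap)
        results.append((tf, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical identifiers, not taken from the Plant Cistrome collection).
universe = {f"g{i}" for i in range(1, 11)}
tf_to_targets = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g7", "g8", "g9", "g10"}}
ranked = rank_regulators(["g1", "g2", "g3", "g4"], tf_to_targets, universe)
for tf, overlap, p in ranked:
    print(tf, overlap, f"{p:.4f}")
```

In practice the p-values would also need a multiple-testing correction, since one test is run per TF binding profile.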
Affiliation(s)
- Viktoriya V. Lavrekha
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Victor G. Levitsky
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Anton V. Tsukanov
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Anton G. Bogomolov
- Department of Cell Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Dmitry A. Grigorovich
- Service of Information Technologies, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Nadya Omelyanchuk
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Elena V. Ubogoeva
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Elena V. Zemlyanskaya
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Victoria Mironova
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Department of Plant Systems Physiology, RIBES, Radboud University, Nijmegen, Netherlands
|
2
|
Abstract
The PDBSite database provides comprehensive structural and functional information on various protein sites (post-translational modification, catalytic active, organic and inorganic ligand binding, protein-protein, protein-DNA and protein-RNA interactions) in the Protein Data Bank (PDB). The PDBSite is available online at http://wwwmgs.bionet.nsc.ru/mgs/gnw/pdbsite/. It consists of functional sites extracted from PDB using the SITE records and of an additional set containing the protein interaction sites inferred from the contact residues in heterocomplexes. The PDBSite was set up by automated processing of the PDB. The PDBSite database can be queried through the functional description and the structural characteristics of the site and its environment. The PDBSite is integrated with the PDBSiteScan tool allowing structural comparisons of a protein against the functional sites. The PDBSite enables the recognition of functional sites in protein tertiary structures, providing annotation of function through structure. The PDBSite is updated after each new PDB release.
Affiliation(s)
- Vladimir A Ivanisenko
- Institute of Cytology and Genetics SBRAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia.
|
3
|
Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA. PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Res 2004; 32:W549-54. [PMID: 15215447 PMCID: PMC441577 DOI: 10.1093/nar/gkh439] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.1] [Indexed: 11/15/2022]
Abstract
PDBSiteScan is a web-accessible program designed for searching three-dimensional (3D) protein fragments similar in structure to known active, binding and posttranslational modification sites. A collection of known sites we designated as PDBSite was set up by automated processing of the PDB database using the data on site localization in the SITE field. Additionally, protein-protein interaction sites were generated by analysis of atom coordinates in heterocomplexes. The total number of collected sites was more than 8100; they were assigned to more than 80 functional groups. PDBSiteScan provides automated search of the 3D protein fragments whose maximum distance mismatch (MDM) between N, Calpha and C atoms in a fragment and a functional site is not larger than the MDM threshold defined by the user. PDBSiteScan requires perfect matching of amino acids. PDBSiteScan enables recognition of functional sites in tertiary structures of proteins and allows proteins with functional information to be annotated. The program PDBSiteScan is available at http://wwwmgs.bionet.nsc.ru/mgs/systems/fastprot/pdbsitescan.html.
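The matching criterion described here (identical residues plus a bounded maximum distance mismatch over N, Calpha and C atoms) can be sketched as follows. This is only an illustration, not the PDBSiteScan implementation: it assumes the candidate fragment has already been superposed onto the site, and the data layout, function names, and coordinates are hypothetical.

```python
from math import dist

BACKBONE = ("N", "CA", "C")  # PDB atom names for N, Calpha and C


def max_distance_mismatch(site, fragment):
    """Largest distance between corresponding backbone atoms.

    Both arguments are lists of (residue_name, {atom_name: (x, y, z)}),
    assumed to be already superposed in a common coordinate frame.
    """
    return max(
        dist(s_atoms[a], f_atoms[a])
        for (_, s_atoms), (_, f_atoms) in zip(site, fragment)
        for a in BACKBONE
    )


def matches_site(site, fragment, mdm_threshold):
    """True if residues match exactly and the MDM does not exceed the threshold."""
    if len(site) != len(fragment):
        return False
    if any(s[0] != f[0] for s, f in zip(site, fragment)):
        return False  # the abstract states that amino acids must match perfectly
    return max_distance_mismatch(site, fragment) <= mdm_threshold


# Toy coordinates in angstroms (made up for illustration).
site = [
    ("HIS", {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0)}),
    ("SER", {"N": (3.3, 1.5, 0.0), "CA": (4.0, 2.8, 0.0), "C": (5.5, 2.6, 0.0)}),
]
fragment = [
    ("HIS", {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.5), "C": (2.0, 1.4, 0.0)}),
    ("SER", {"N": (3.3, 1.5, 0.0), "CA": (4.0, 2.8, 0.0), "C": (5.5, 2.6, 0.0)}),
]
print(max_distance_mismatch(site, fragment))
print(matches_site(site, fragment, mdm_threshold=1.0))
```

Because the score is a maximum rather than an average, a single badly placed atom is enough to reject a fragment, which is what makes the user-defined threshold a strict filter.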
Affiliation(s)
- Vladimir A Ivanisenko
- Institute of Cytology and Genetics SBRAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia
|
4
|
Oshchepkov DY, Vityaev EE, Grigorovich DA, Ignatieva EV, Khlebodarova TM. SITECON: a tool for detecting conservative conformational and physicochemical properties in transcription factor binding site alignments and for site recognition. Nucleic Acids Res 2004; 32:W208-12. [PMID: 15215382 PMCID: PMC441612 DOI: 10.1093/nar/gkh474] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
The local DNA conformation in the region of transcription factor binding sites, determined by context, is one of the factors underlying the specificity of DNA-protein interactions. Analysis of the local conformation of a set of functional DNA sequences may allow for determination of the conservative conformational and physicochemical parameters reflecting molecular mechanisms of interaction. The web resource SITECON is designed to detect conservative conformational and physicochemical properties in transcription factor binding sites, contains a knowledge base of conservative properties for >100 high-quality sample sites and allows for recognition of potential transcription factor binding sites based on conservative properties from both the knowledge base and the results of analysis of a sample proposed by a user. The resource SITECON is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/sitecon/.
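One way to picture "conservative properties" in an alignment of binding sites is as positions where a context-dependent property varies little across the aligned sequences. The sketch below is an assumption-laden illustration, not the SITECON algorithm: it uses the G/C count of each dinucleotide as a stand-in property, population variance as the conservation measure, and a hypothetical threshold and toy alignment.

```python
from statistics import mean, pvariance

# Stand-in dinucleotide property: number of G/C bases in the step.
# A real analysis would use conformational or physicochemical tables instead.
GC_COUNT = {a + b: (a in "GC") + (b in "GC") for a in "ACGT" for b in "ACGT"}


def property_profile(sites, prop):
    """Per-position (mean, variance) of a dinucleotide property
    across aligned sites of equal length."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    profile = []
    for i in range(length - 1):
        values = [prop[s[i:i + 2]] for s in sites]
        profile.append((mean(values), pvariance(values)))
    return profile


def conserved_positions(sites, prop, max_variance):
    """Indices of dinucleotide steps whose property variance is at most max_variance."""
    return [
        i for i, (_, var) in enumerate(property_profile(sites, prop))
        if var <= max_variance
    ]


# Toy alignment (made-up sequences).
sites = ["ATGCA", "TAGCT", "AAGCG"]
print(conserved_positions(sites, GC_COUNT, max_variance=0.1))
```

Note that the first step is flagged as conserved even though its letters differ between sites (AT, TA, AA): a property can be conserved where the sequence itself is not, which is the motivation for looking beyond consensus letters.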
Affiliation(s)
- D Y Oshchepkov
- Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russia
|
5
|
Nizolenko LP, Bachinsky AG, Naumochkin AN, Yarigin AA, Grigorovich DA. Database of patterns PROF_PAT for detecting local similarities. In Silico Biol 2004; 3:205-13. [PMID: 12762856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
We have developed PROF_PAT, a database of patterns constructed for groups of related proteins and designed to maximize representation of amino acid sequences from the SWISS-PROT database. The purpose of the current study was to demonstrate that PROF_PAT is not only as good as known analogs but surpasses them in some features. A total of 10938 new amino acid sequences from the SWISS-PROT bank were compared with patterns constructed for protein families in the PROF_PAT 1.10 bank. The aim of the comparisons was to estimate threshold values of the "Score" parameter that distinguish random similarities from significant ones. Of the 10938 new sequences, 638 showed no similarity to any PROF_PAT pattern. The sequences with similarities were divided into three sets: 'positive', 'putative' (or 'unknown'), and 'false positive', containing 7719, 2297 and 284 sequences respectively. Using 20 amino acid sequences from the TrEMBL bank that have no descriptions, PROF_PAT demonstrated specificity at a level that was as good as the best-known "secondary" banks. At the same time, its pattern content and variety of included proteins were significantly richer, and its search speed was 3-10 times higher than those of any other protein family bank used for comparison.
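The counts reported in this abstract are internally consistent, which a few lines of arithmetic confirm. The percentages below are derived here from those counts and are not figures quoted by the authors.

```python
new_sequences = 10938
no_similarity = 638
hits = {"positive": 7719, "putative": 2297, "false positive": 284}

# Sequences with at least one similarity to a PROF_PAT pattern.
with_similarity = new_sequences - no_similarity
assert sum(hits.values()) == with_similarity  # the three sets account for all of them

# Share of each set among the sequences with similarities.
shares = {label: n / with_similarity for label, n in hits.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

So roughly three quarters of the matched sequences fall in the 'positive' set and under 3% in the 'false positive' set.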
Affiliation(s)
- Lily P Nizolenko
- Theoretical Department, Research Institute of Molecular Biology, SRC VB "Vector", Koltsovo, Novosibirsk region, 630559, Russia
|
6
|
Nizolenko LF, Bachinskiĭ AG, Naumochkin AN, Iarygin AA, Grigorovich DA. [Bank of samples from the Prof_Pat protein family, assessment of efficacy]. Mol Biol (Mosk) 2004; 38:256-64. [PMID: 15125231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
The PROF_PAT protein pattern database has been created and maintained so as to represent the maximal number of SWISS-PROT + TrEMBL proteins as patterns. The present paper describes some characteristic features of PROF_PAT to assist the potential user. New amino acid sequences (10938) from the SWISS-PROT database have been analyzed to determine the boundary values of the "score" parameter that distinguish random from significant similarities. Analysis through the Internet of 20 amino acid sequences having no descriptions in the TrEMBL database demonstrated that PROF_PAT, while highly competitive with its counterparts in specificity, surpasses them in the breadth and variety of proteins covered and works several times faster. The real representation of protein families in the PROF_PAT database (release 1.11), which contains 50,149 patterns of 344,429 proteins, has been estimated at 31,450.
Affiliation(s)
- L F Nizolenko
- Institute of Molecular Biology, State Research Center of Virology and Biotechnology VECTOR, Russian Ministry of Health, Koltsovo, Novosibirsk Region, 630559 Russia.
|
7
|
Kolchanov NA, Podkolodnaia OA, Anan'ko EA, Ignat'eva EV, Podkolodnyĭ NL, Merkulov VM, Stepanenko IL, Pozdniakov MA, Belova OE, Grigorovich DA, Naumochkin AN. [Regulation of eukaryotic gene transcription: description in the TRRD database]. Mol Biol (Mosk) 2001; 35:934-42. [PMID: 11771140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
The structure of the Transcription Regulatory Regions Database (TRRD) and the principles by which TRRD describes the transcription regulation of eukaryotic genes are presented. The formal description of the structural and functional organization of the regulatory gene regions is illustrated with examples. To date, TRRD is based on 3500 original publications and contains data on the transcription regulation of more than 1100 genes, which are known to possess more than 5000 transcription factor-binding sites and about 1600 regulatory elements (promoters, enhancers, silencers). TRRD is available at http://www.bionet.nsc.ru/trrd/.
|
8
|
Valuev VP, Afoninkov DA, Petrenko OI, Grigorovich DA. [Properties of artificial evolution of proteins and peptides]. Mol Biol (Mosk) 2001; 35:1048-55. [PMID: 11771129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
In vitro evolution is used to study protein sequences, structures, and interactions and to obtain proteins with new properties. To analyze the specific features of this process in experiments with phage display, we studied the amino acid composition of selected sequences, constructed a matrix of amino acid substitutions, and identified pairs of coadaptive substitutions. Amino acid frequency proved to be tightly associated with the number of corresponding codons; numerous correlated substitutions were found.
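The reported association between amino acid frequency and codon number can be examined with a short sketch. It assumes the standard genetic code with all 64 codons available; a library built with a restricted codon scheme would need a different table. The helper names and the toy peptide strings are hypothetical and are not data from the study.

```python
from collections import Counter
from math import sqrt

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_degeneracy():
    """Number of sense codons encoding each amino acid (stop codons excluded)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return Counter(aa for codon, aa in zip(codons, AMINO) if aa != "*")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def frequency_vs_codons(peptides):
    """Correlate observed amino acid counts with the number of codons per amino acid."""
    degeneracy = codon_degeneracy()
    observed = Counter("".join(peptides))
    amino_acids = sorted(degeneracy)
    return pearson(
        [degeneracy[aa] for aa in amino_acids],
        [observed.get(aa, 0) for aa in amino_acids],
    )


# Toy selected sequences (made up for illustration only).
peptides = ["LSRLSA", "RLSGVT", "SLRPAM", "LRSVGW"]
print(codon_degeneracy()["L"], codon_degeneracy()["M"])
print(round(frequency_vs_codons(peptides), 2))
```

A strong positive coefficient on real selected sequences would indicate that composition largely reflects library codon bias rather than selection, which is the kind of effect the abstract reports.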
|
9
|
Kochetov AV, Grigorovich DA, Titov II, Vorob'ev DG, Syrnik OA, Vishnevskiĭ OV, Sarai A, Kolchanov NA. [mRNA-FAST (mRNA-Function, Activity, STructure) computer system]. Mol Biol (Mosk) 2001; 35:1039-47. [PMID: 11771128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
The computer system mRNA-FAST (mRNA-Function, Activity, STructure; http://wwwmgs.bionet.nsc.ru/mgs/dbases/trsig/) is described. The system has been developed to analyze nucleotide sequences of mRNA and to measure their essential properties. The system compiles a database of translation signals, including nucleotide sequences of the regulatory regions with structural and experimental information on their specific activities. It also contains programs to search for local homology between mRNA and translation signals, to search for potential signals based on analysis of oligonucleotide dictionaries, and to model RNA secondary structure. Possible applications of the mRNA-FAST system are discussed.
|
10
|
Frolov AS, Lavriushev SV, Grigorovich DA, Kel AE, Ptitsyn AA, Kolchanov NA, Podkolodnyĭ NL, Solov'ev VV, Milanesi L, Bourne P. [WWWMGS: an integrated server for molecular-genetic studies]. Biofizika 1999; 44:832-6. [PMID: 10624522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
We report an integrative technology for molecular biology studies of transcription regulation over the Internet. A set of databases, programs, and systems is included in the WWWMGS Web server. As an example, the use of TRRD database information for site prediction is described. Using this method, the computer system SeqAnn was developed. The system performs a "real-time" search to predict the position of the transcription initiation site from database information. WWWMGS is available at http://wwwmgs.bionet.nsc.ru/.
Affiliation(s)
- A S Frolov
- Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, Russia
|
11
|
Kolchanov NA, Ponomarenko MP, Frolov AS, Ananko EA, Kolpakov FA, Ignatieva EV, Podkolodnaya OA, Goryachkovskaya TN, Stepanenko IL, Merkulova TI, Babenko VV, Ponomarenko YV, Kochetov AV, Podkolodny NL, Vorobiev DV, Lavryushev SV, Grigorovich DA, Kondrakhin YV, Milanesi L, Wingender E, Solovyev V, Overton GC. Integrated databases and computer systems for studying eukaryotic gene expression. Bioinformatics 1999; 15:669-86. [PMID: 10487874 DOI: 10.1093/bioinformatics/15.7.669] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences.
RESULTS: The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins.
AVAILABILITY: GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.
Affiliation(s)
- N A Kolchanov
- Institute of Cytology & Genetics, Siberian Branch of the Russian Academy of Sciences, Prosp. Lavrentieva 10, Novosibirsk 630090, Russia.
|
12
|
Ptitsyn AA, Rogozin IB, Grigorovich DA, Strelets VB, Kel' AE, Milanezi L, Kolchanov NA. [Computer system "AutoGene" for automatic analysis of nucleotide sequences]. Mol Biol (Mosk) 1996; 30:432-41. [PMID: 8724776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
|
13
|
Ptitsyn AA, Grigorovich DA. Object-oriented data handler for sequence analysis software development. Comput Appl Biosci 1995; 11:583-9. [PMID: 8808573 DOI: 10.1093/bioinformatics/11.6.583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
We report an object-oriented data handler and supplementary tools for the development of molecular genetics application software for various sequence analyses. Our data handler has a flexible and expandable format that supports the most common data types for molecular genetic software. New data types can be constructed in an object-oriented manner from the basic units. The data handler includes an object library, a format-converting program and a viewer that can visualize simultaneously the data contained in several files to construct a general picture from separate data. This software has been implemented on an IBM PC-compatible personal computer.
Affiliation(s)
- A A Ptitsyn
- Institute of Cytology and Genetics, Novosibirsk, Russia.
|