1. Orlov YL, Chen M, Kolchanov NA, Hofestädt R. BGRS: bioinformatics of genome regulation and data integration. J Integr Bioinform 2023; 20:jib-2023-0032. [PMID: 37972410 PMCID: PMC10757072 DOI: 10.1515/jib-2023-0032]
Affiliation(s)
- Yuriy L. Orlov
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Life Sciences Department, Novosibirsk State University, 630090 Novosibirsk, Russia
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Nikolay A. Kolchanov
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Life Sciences Department, Novosibirsk State University, 630090 Novosibirsk, Russia
- Ralf Hofestädt
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
2. Orlov YL, Tatarinova TV, Oparina NY, Galieva ER, Baranova AV. Editorial: Bioinformatics of Genome Regulation, Volume I. Front Genet 2021; 12:803273. [PMID: 34938326 PMCID: PMC8687738 DOI: 10.3389/fgene.2021.803273] [Received: 10/27/2021] [Accepted: 11/08/2021]
Affiliation(s)
- Yuriy L Orlov
- Institute of Digital Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Agrarian and Technological Institute, Peoples' Friendship University of Russia (RUDN University), Moscow, Russia
- Life Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Nina Y Oparina
- Institute of Medicine, University of Gothenburg, Göteborg, Sweden
- Elvira R Galieva
- Life Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Ancha V Baranova
- School of Systems Biology, George Mason University, Fairfax, VA, United States
- Research Centre for Medical Genetics, Moscow, Russia
3. Dergilev AI, Orlova NG, Dobrovolskaya OB, Orlov YL. Statistical estimates of multiple transcription factors binding in the model plant genomes based on ChIP-seq data. J Integr Bioinform 2021; 19:jib-2020-0036. [PMID: 34953471 PMCID: PMC9069649 DOI: 10.1515/jib-2020-0036] [Received: 11/16/2020] [Accepted: 11/25/2021]
Abstract
The development of high-throughput genomic sequencing coupled with chromatin immunoprecipitation (ChIP-seq) makes it possible to study the binding sites of protein transcription factors (TFs) on a genome-wide scale. The growing volume of experimentally determined binding sites raises qualitatively new problems for the analysis of gene expression regulation, the prediction of TF target genes, and the reconstruction of gene regulatory networks. Genome regulation in plants remains insufficiently studied, although plants have complex molecular mechanisms regulating gene expression and the response to environmental stresses. New software tools are therefore needed to analyze the locations of TF binding sites and their clustering in plant genomes, to visualize them, and to estimate their statistics. This study applies the analysis of multiple TF binding profiles to three evolutionarily distant model plant organisms. Non-random ChIP-seq binding clusters of different TFs in mammalian embryonic stem cells were previously constructed and analyzed using similar bioinformatics approaches. Such clusters of TF binding sites may mark gene regulatory regions, enhancers, and transcriptional regulatory hubs; they can be used to analyze gene promoters and as a basis for reconstructing transcription networks. We discuss the statistical estimates of TF binding site clusters in the model plant genomes. The distribution of the number of different TFs per binding cluster follows the same power law in all the genomes studied. The binding clusters in the Arabidopsis thaliana genome are discussed in detail.
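The clustering of binding sites and the per-cluster statistic described in this abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each ChIP-seq peak is a `(chromosome, start, end, tf_name)` tuple and that peaks fall into one cluster when they overlap, touch, or lie within `max_gap` bp of each other; the cluster definition actually used in the study may differ. The helper names `cluster_peaks`, `tfs_per_cluster_distribution`, and `loglog_slope` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical peak record: (chromosome, start, end, tf_name)
Peak = Tuple[str, int, int, str]


def cluster_peaks(peaks: List[Peak], max_gap: int = 0) -> List[Set[str]]:
    """Merge peaks into clusters and return the distinct TF names in each cluster.

    Assumed rule: a peak joins the open cluster if it is on the same chromosome
    and starts no more than max_gap bp after the cluster's current end.
    """
    clusters: List[Set[str]] = []
    cur_chrom, cur_end, cur_tfs = None, 0, set()
    for chrom, start, end, tf in sorted(peaks):
        if cur_tfs and chrom == cur_chrom and start <= cur_end + max_gap:
            cur_end = max(cur_end, end)  # extend the open cluster
            cur_tfs.add(tf)
        else:
            if cur_tfs:
                clusters.append(cur_tfs)  # close the previous cluster
            cur_chrom, cur_end, cur_tfs = chrom, end, {tf}
    if cur_tfs:
        clusters.append(cur_tfs)
    return clusters


def tfs_per_cluster_distribution(clusters: List[Set[str]]) -> Dict[int, int]:
    """Map 'number of distinct TFs in a cluster' -> 'number of such clusters'."""
    return dict(Counter(len(tfs) for tfs in clusters))


def loglog_slope(hist: Dict[int, float]) -> float:
    """Least-squares slope of log(count) against log(cluster size).

    A roughly straight line with slope -a is consistent with a power law
    count ~ size**(-a). Quick diagnostic only; needs two or more distinct sizes.
    """
    sizes = sorted(hist)
    xs = [math.log(s) for s in sizes]
    ys = [math.log(hist[s]) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Toy peaks (hypothetical data)
    peaks = [
        ("chr1", 100, 200, "TF_A"),
        ("chr1", 150, 250, "TF_B"),
        ("chr1", 240, 300, "TF_A"),
        ("chr1", 1000, 1100, "TF_C"),
        ("chr2", 100, 200, "TF_A"),
        ("chr2", 180, 260, "TF_C"),
    ]
    clusters = cluster_peaks(peaks)
    print(tfs_per_cluster_distribution(clusters))  # {2: 2, 1: 1}
```

On real genome-wide data, a maximum-likelihood estimate of the power-law exponent is generally more reliable than the log-log regression used here for brevity.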
Affiliation(s)
- Arthur I. Dergilev
- Novosibirsk State University, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Nina G. Orlova
- Financial University under the Government of the Russian Federation, 125993 Moscow, Russia
- Moscow State Technical University of Civil Aviation, 125993 Moscow, Russia
- Oxana B. Dobrovolskaya
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Yuriy L. Orlov
- Novosibirsk State University, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
4. Gubanova NV, Orlova NG, Dergilev AI, Oparina NY, Orlov YL. Glioblastoma gene network reconstruction and ontology analysis by online bioinformatics tools. J Integr Bioinform 2021; 18:jib-2021-0031. [PMID: 34783229 PMCID: PMC8709738 DOI: 10.1515/jib-2021-0031] [Received: 08/31/2021] [Accepted: 10/18/2021]
Abstract
Glioblastoma is the most aggressive type of brain tumor and is resistant to a number of antitumor drugs. Therapy and the choice of drug treatment course are complicated by the extremely high heterogeneity of the benign cell populations, the random arrangement of tumor cells, and the polymorphism of their nuclei. To find new therapy targets, the pathogenesis of gliomas needs to be studied using modern cellular technologies, genome- and transcriptome-wide high-throughput sequencing, microarray gene expression analysis, and modern bioinformatics methods. Functional annotation of disease-related genes can be retrieved from genetic databases and cross-validated by integrating complementary experimental data. Gene network reconstruction for a set of genes (proteins) has proved to be an effective approach to studying the mechanisms underlying disease progression. We used online bioinformatics tools to annotate a list of glioma genes, reconstruct a gene network, and compare gene ontology categories. The available tools and databases for glioblastoma gene analysis are discussed together with recent progress in this field.
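Comparisons of gene ontology categories for a gene list are commonly based on an over-representation test. The Python sketch below assumes a one-sided hypergeometric test with Benjamini-Hochberg correction; this is a common choice but not necessarily what the online tools used in the study implement. The function names and the toy term-to-gene mapping are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, K of which carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(
    gene_list: Iterable[str],
    annotations: Dict[str, Set[str]],
    background: Set[str],
) -> Dict[str, Tuple[int, int, float, float]]:
    """Return term -> (overlap k, term size K, p-value, BH q-value)."""
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    terms = sorted(annotations)
    stats = []
    for term in terms:
        term_genes = annotations[term] & background
        K, k = len(term_genes), len(genes & term_genes)
        stats.append((k, K, hypergeom_sf(k, n, K, N)))
    qvals = benjamini_hochberg([p for _, _, p in stats])
    return {t: (k, K, p, q) for t, (k, K, p), q in zip(terms, stats, qvals)}


if __name__ == "__main__":
    # Toy data (hypothetical gene and term identifiers)
    background = {f"g{i}" for i in range(20)}
    annotations = {
        "term_x": {"g0", "g1", "g2", "g3"},
        "term_y": {"g10", "g11", "g12", "g13", "g14"},
    }
    for term, row in go_enrichment(["g0", "g1", "g2", "g5"], annotations, background).items():
        print(term, row)
```

Restricting both the gene list and the annotations to an explicit background set matters in practice: the p-values depend directly on N and K, so the background should match the genes that could have appeared in the list.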
Affiliation(s)
- Natalya V Gubanova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Nina G Orlova
- Financial University under the Government of the Russian Federation, 119991 Moscow, Russia
- Moscow State Technical University of Civil Aviation, 125993 Moscow, Russia
- Yuriy L Orlov
- Novosibirsk State University, 630090 Novosibirsk, Russia
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health, 119991 Moscow, Russia
5. Anashkina AA, Leberfarb EY, Orlov YL. Recent Trends in Cancer Genomics and Bioinformatics Tools Development. Int J Mol Sci 2021; 22:ijms222212146. [PMID: 34830028 PMCID: PMC8618360 DOI: 10.3390/ijms222212146] [Received: 11/04/2021] [Accepted: 11/08/2021]
Affiliation(s)
- Anastasia A. Anashkina
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Elena Y. Leberfarb
- Department of Medicinal Chemistry, Novosibirsk State Medical University, 630091 Novosibirsk, Russia
- Yuriy L. Orlov
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Life Sciences Department, Novosibirsk State University, 630090 Novosibirsk, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
6. Orlov YL, Baranova AV. Editorial: Bioinformatics of Genome Regulation and Systems Biology. Front Genet 2020; 11:625. [PMID: 32849761 PMCID: PMC7399369 DOI: 10.3389/fgene.2020.00625] [Received: 05/17/2020] [Accepted: 05/26/2020]
Affiliation(s)
- Yuriy L Orlov
- Institute of Digital Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Life Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Agrobiotechnology Department, Agrarian and Technological Institute, Peoples' Friendship University of Russia, Moscow, Russia
- Ancha V Baranova
- School of Systems Biology, George Mason University, Fairfax, VA, United States
7. Beyene SS, Ling T, Ristevski B, Chen M. A novel riboswitch classification based on imbalanced sequences achieved by machine learning. PLoS Comput Biol 2020; 16:e1007760. [PMID: 32687488 PMCID: PMC7392346 DOI: 10.1371/journal.pcbi.1007760] [Received: 02/26/2020] [Revised: 07/30/2020] [Accepted: 05/13/2020]
Abstract
A riboswitch, a regulatory segment of mRNA 50-250 nt long, has two main parts: the aptamer and the expression platform. One of the main challenges in riboswitch classification is imbalanced data, a situation in which one group has far fewer sequence records than the others. This leads a classifier to ignore the minority groups and favor the majority ones, resulting in skewed classification. To be consistent with recent riboswitch classification work, we considered sixteen riboswitch families that contain imbalanced sequences. The sequences were split into training and test sets using a newly developed pipeline. From the 5460 k-mers produced (k = 1 to 6), 156 features were selected using the CfsSubsetEval evaluator with the BestFirst search method in WEKA 3.8. Statistical tests showed a significant difference between balanced and imbalanced sequences (p < 0.05). In addition, each algorithm showed a significant difference in sensitivity, specificity, accuracy, and macro F-score between the two groups (p < 0.05). Several k-mers clustered in the heat map were found to correspond to biological functions and motifs at different positions, such as interior loops, terminal loops, and helices; these were validated as having biological functions, and some are riboswitch motifs. The analysis highlights the importance of addressing majority bias and overfitting. The results provide a generalized evaluation of both balanced and imbalanced models, indicating their ability to classify novel riboswitches. The Python source code is available at https://github.com/Seasonsling/riboswitch.
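The k-mer count in this abstract follows from enumerating every k-mer over a four-letter alphabet: 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 for k = 1 to 6. The Python sketch below reproduces that feature space and computes a macro F-score, which averages per-class F1 so that minority families weigh as much as majority ones. It assumes an RNA alphabet (T converted to U) and per-k frequency normalization; it does not reproduce the WEKA CfsSubsetEval/BestFirst selection step or the authors' code in the linked repository, and all function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA input is converted T -> U


def all_kmers(k_min: int = 1, k_max: int = 6) -> List[str]:
    """Every possible k-mer for k_min <= k <= k_max, shortest first."""
    return ["".join(p) for k in range(k_min, k_max + 1) for p in product(ALPHABET, repeat=k)]


def kmer_frequency_vector(seq: str, kmers: List[str]) -> List[float]:
    """Count each k-mer and divide by the number of length-k windows in seq."""
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(kmers, 0)
    for k in sorted({len(km) for km in kmers}):
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:  # windows with other symbols (e.g. N) are skipped
                counts[window] += 1
    vector = []
    for km in kmers:
        windows = len(seq) - len(km) + 1
        vector.append(counts[km] / windows if windows > 0 else 0.0)
    return vector


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(len(all_kmers()))  # 5460
    # Toy labels: a classifier that always predicts the majority class
    y_true = ["FMN", "FMN", "FMN", "SAM"]
    y_pred = ["FMN", "FMN", "FMN", "FMN"]
    print(macro_f1(y_true, y_pred))  # about 0.4286 (3/7), although accuracy is 0.75
```

The toy example shows why the macro F-score is reported alongside accuracy for imbalanced data: ignoring the minority class still yields 75% accuracy, but its per-class F1 of zero pulls the macro average down to 3/7.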
Affiliation(s)
- Solomon Shiferaw Beyene
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Tianyi Ling
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Blagoj Ristevski
- Faculty of Information and Communication Technologies, Bitola, St. Kliment Ohridski University Bitola, ul. Partizanska Bitola, Republic of North Macedonia
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China