1
Nakajima T, Harada K, Tomooka Y, Sato T. In silico screening system based on a transcription factors regulatory network only using transcriptomic data. PLoS One 2025; 20:e0319971. [PMID: 40193394 PMCID: PMC11975132 DOI: 10.1371/journal.pone.0319971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Accepted: 02/12/2025] [Indexed: 04/09/2025]
Abstract
In this study, we developed a method to identify core transcription factors (TFs) involved in differentiation using only comprehensive gene expression (transcriptomic) data. In silico screening using TF regulatory network analysis (ISNA) requires the following steps: (1) estimating promoter regions; (2) constructing TF regulatory network (TRN) relationships using the nucleotide sequences of the promoters and score matrices derived from TF consensus sequences; and (3) identifying candidate core TFs by determining dissociation constants (Kd values) within the TRN relationships. ISNA predicted the core TFs involved in the endothelial-to-mesenchymal transition of human umbilical vein endothelial cells (HUVECs) and in the differentiation of human embryonic stem cells into mesodermal cells. Using ISNA, we identified HMGA2 as a novel core TF in the uterine epithelium. Notably, HMGA2 expression was predominantly detected in the uterine epithelium, where it regulated cell proliferation in response to estrogen. These findings highlight ISNA's potential to identify core TFs from transcriptomic data.
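The abstract does not say exactly how the score matrices or Kd values are computed. As a rough illustration of step (2), the sketch below builds a log-odds position weight matrix from a few aligned binding sites and scans a promoter on both strands. The toy site list, the pseudocount, the uniform background, and the `relative_kd` mapping (which assumes, in the spirit of Berg and von Hippel, that the score deficit from the consensus is proportional to binding free energy) are all assumptions for illustration, not ISNA's published procedure.

```python
import math

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=0.5, background=0.25):
    """Position weight matrix (log2 odds vs. a uniform background) from aligned sites."""
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def reverse_complement(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def score_window(matrix, window):
    # Sum of per-position log-odds; window must be uppercase A/C/G/T
    return sum(col[base] for col, base in zip(matrix, window))


def best_site_score(matrix, promoter):
    """Highest PWM score over every window on both strands of the promoter."""
    width = len(matrix)
    best = -math.inf
    for strand in (promoter, reverse_complement(promoter)):
        for start in range(len(strand) - width + 1):
            best = max(best, score_window(matrix, strand[start:start + width]))
    return best


def relative_kd(matrix, score):
    """Hypothetical mapping: Kd(site) / Kd(consensus) = 2 ** (S_consensus - S_site)."""
    s_consensus = sum(max(col.values()) for col in matrix)
    return 2 ** (s_consensus - score)


# Hypothetical toy motif and promoter fragment, not data from the paper
pwm = log_odds_matrix(["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"])
print(relative_kd(pwm, best_site_score(pwm, "GGGTGACTCAGGG")))  # 1.0 (consensus present)
```

A promoter lacking any close match yields a relative Kd above 1, that is, weaker predicted binding; in a network-building step, such values could be thresholded to decide which TF-to-gene edges to keep.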
Affiliation(s)
- Tadaaki Nakajima
- Department of Science, Yokohama City University, Yokohama, Japan
- Department of Biological Science and Technology, Faculty of Industrial Science and Technology, Tokyo University of Science, Tokyo, Japan
- Kentaro Harada
- Department of Biological Science and Technology, Faculty of Industrial Science and Technology, Tokyo University of Science, Tokyo, Japan
- Yasuhiro Tomooka
- Department of Biological Science and Technology, Faculty of Industrial Science and Technology, Tokyo University of Science, Tokyo, Japan
- Tomomi Sato
- Department of Science, Yokohama City University, Yokohama, Japan
- Graduate School of Nanobioscience, Yokohama City University, Yokohama, Japan
2
Hase T, Ghosh S, Aisaki KI, Kitajima S, Kanno J, Kitano H, Yachie A. DTox: A deep neural network-based in visio lens for large scale toxicogenomics data. J Toxicol Sci 2024; 49:105-115. [PMID: 38432953 DOI: 10.2131/jts.49.105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
With the advancement of large-scale omics technologies, and particularly the availability of transcriptomic data sets in public drug- and treatment-response repositories, toxicogenomics has emerged as a key field in safety pharmacology and chemical risk assessment. Traditional statistics-based bioinformatics analysis is difficult to apply to multidimensional toxicogenomic data, which span administration time, dosage, and gene expression levels. Motivated by the goal of making field experts' visual-inspection workflow for screening significant genes and deriving meaningful insights more efficient, and by the ability of deep neural architectures to learn from image signals, we developed DTox, a deep neural network-based in visio approach. Using the Percellome toxicogenomics database, DTox did not use the numerical gene expression values of the transcripts (microarray gene probes) for dose-time combinations; instead, it learned image representations of 3D surface plots over distinct time and dosage data points, and its classifier was trained on the experts' labels of gene probe significance. DTox outperformed statistical threshold-based bioinformatics and machine learning approaches based on numerical expression values. This result shows that image-driven neural networks can overcome the limitations of classical numeric value-based approaches. Further, by augmenting the model with explainability modules, our study showed the potential to reveal, through the model weights, the visual analysis process of human experts in toxicogenomics. While the current work demonstrates the DTox model in toxicogenomic studies, it can be further generalized as an in visio approach for multidimensional numeric data, with applications in various fields of medical data science.
Affiliation(s)
- Takeshi Hase
- The Systems Biology Institute, Saisei Ikedayama Bldg
- SBX BioSciences, Inc, Canada
- Institute of Education, Tokyo Medical and Dental University
- Faculty of Pharmacy, Keio University
- Center for Mathematical Modelling and Data Science, Osaka University
- Samik Ghosh
- The Systems Biology Institute, Saisei Ikedayama Bldg
- Ken-Ichi Aisaki
- Division of Cellular and Molecular Toxicology, Center for Biological Safety and Research (CBSR), National Institute of Health Sciences (NIHS)
- Satoshi Kitajima
- Division of Cellular and Molecular Toxicology, Center for Biological Safety and Research (CBSR), National Institute of Health Sciences (NIHS)
- Jun Kanno
- The Systems Biology Institute, Saisei Ikedayama Bldg
- Division of Cellular and Molecular Toxicology, Center for Biological Safety and Research (CBSR), National Institute of Health Sciences (NIHS)
- Faculty of Medicine, University of Tsukuba
- Hiroaki Kitano
- The Systems Biology Institute, Saisei Ikedayama Bldg
- Integrated Open Systems Unit, Okinawa Institute of Science and Technology (OIST)
- Ayako Yachie
- The Systems Biology Institute, Saisei Ikedayama Bldg
- SBX BioSciences, Inc, Canada
3
Human lipoproteins comprise at least 12 different classes that are lognormally distributed. PLoS One 2022; 17:e0275066. [DOI: 10.1371/journal.pone.0275066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 09/09/2022] [Indexed: 11/12/2022]
Abstract
This study presents the results of HPLC, a gentler and faster separation method than conventional ultracentrifugation, for 55 human serum samples. The elution patterns were analysed parametrically, and the attributes of each class were confirmed biochemically. Human samples contained 12 classes of lipoproteins, each of which may consist primarily of proteins. There are three classes of VLDL. The level of each class was lognormally distributed, and the standard amount and the 95% range were estimated. Some lipoprotein classes with a narrow range could become ideal indicators of specific diseases. This lognormal character suggests that the levels are controlled by the synergy of multiple factors; multiple undesirable lifestyle habits may drastically increase the levels of specific lipoprotein classes. Lipoproteins in medical samples have been measured by enzymatic methods whose results coincide with those of conventional ultracentrifugation; however, the high gravity and long time required for ultracentrifugation can cause sample degradation. In fact, the enzymatic methods measured the combined levels of several classes. The targets of the enzymatic methods need to be revised.
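Because the levels are lognormal, one natural way to estimate a "standard amount" and a 95% range is to work on the log scale, where the data are normal, and transform back. The sketch below assumes the standard amount is the geometric mean and the range is the log-scale mean plus or minus 1.96 standard deviations; the estimator used in the paper may differ, and the input values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_reference(levels, coverage=0.95):
    """Geometric mean and central `coverage` range of lognormally distributed levels."""
    logs = [math.log(x) for x in levels]          # lognormal data are normal on the log scale
    mu, sigma = mean(logs), stdev(logs)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for a 95% range
    return math.exp(mu), math.exp(mu - z * sigma), math.exp(mu + z * sigma)


# Hypothetical levels in arbitrary units, not data from the study
standard, low, high = lognormal_reference([1.0, 10.0, 100.0])
print(round(standard, 6))  # 10.0
```

The log scale also makes the "synergy of multiple factors" reading concrete: independent multiplicative factors become additive after taking logs, and sums of many such terms tend toward a normal distribution. The resulting range is therefore symmetric in log space but skewed in the original units.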
4
Unno K, Muguruma Y, Inoue K, Konishi T, Taguchi K, Hasegawa-Ishii S, Shimada A, Nakamura Y. Theanine, Antistress Amino Acid in Tea Leaves, Causes Hippocampal Metabolic Changes and Antidepressant Effects in Stress-Loaded Mice. Int J Mol Sci 2020; 22:ijms22010193. [PMID: 33379343 PMCID: PMC7795947 DOI: 10.3390/ijms22010193] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/02/2020] [Revised: 12/23/2020] [Accepted: 12/23/2020] [Indexed: 02/08/2023]
Abstract
By comprehensively measuring changes in metabolites in the hippocampus of stress-loaded mice, we investigated the reasons for stress vulnerability and the effect of theanine, an abundant amino acid in tea leaves, on metabolism. Stress sensitivity was higher in senescence-accelerated mouse prone 10 (SAMP10) mice than in normal ddY mice when male mice were stress-loaded on the basis of territorial consciousness. Group housing served as the low-stress reference condition. Among the statistically altered metabolites, depression-related kynurenine and excitability-related histamine were significantly higher in SAMP10 mice than in ddY mice. In contrast, carnosine, which has antidepressant-like activity, and ornithine, which has antistress effects, were significantly lower in SAMP10 mice than in ddY mice. Ingestion of theanine, an excellent antistress amino acid, modulated the levels of kynurenine, histamine, and carnosine only in the stress-loaded SAMP10 mice, not in the group-housed mice. In mice that had ingested theanine, depression-like behavior was suppressed only under stress loading. Taken together, the results suggest that changes in metabolites such as kynurenine, histamine, carnosine, and ornithine are associated with the stress vulnerability and depression-like behavior of stressed SAMP10 mice. They also show that theanine affects metabolism in mice only under stress loading.
Affiliation(s)
- Keiko Unno
- Tea Science Center, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Correspondence: Tel. +81-54-264-5822
- Yoshio Muguruma
- Graduate School of Pharmaceutical Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
- Koichi Inoue
- Graduate School of Pharmaceutical Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
- Tomokazu Konishi
- Faculty of Bioresource Sciences, Akita Prefectural University, Shimoshinjo Nakano, Akita 010-0195, Japan
- Kyoko Taguchi
- Tea Science Center, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Sanae Hasegawa-Ishii
- Faculty of Health Sciences, Kyorin University, 5-4-1 Shimorenjaku, Mitaka, Tokyo 181-8612, Japan
- Atsuyoshi Shimada
- Faculty of Health Sciences, Kyorin University, 5-4-1 Shimorenjaku, Mitaka, Tokyo 181-8612, Japan
- Yoriyuki Nakamura
- Tea Science Center, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
5
Konishi T, Ohrui H. A distribution-dependent analysis of open-field test movies. CHEM-BIO INFORMATICS JOURNAL 2020. [DOI: 10.1273/cbij.20.44] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Affiliation(s)
- Tomokazu Konishi
- Graduate School of Bioresource Sciences, Akita Prefectural University
- Haruna Ohrui
- Graduate School of Bioresource Sciences, Akita Prefectural University
6
Konishi T. Concerns regarding the deterioration of objectivity in molecular biology. CHEM-BIO INFORMATICS JOURNAL 2018. [DOI: 10.1273/cbij.18.173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Affiliation(s)
- Tomokazu Konishi
- Graduate School of Bioresource Sciences, Akita Prefectural University
7
Cell types differ in global coordination of splicing and proportion of highly expressed genes. Sci Rep 2016; 6:32249. [PMID: 27577089 PMCID: PMC5006053 DOI: 10.1038/srep32249] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/10/2016] [Accepted: 08/01/2016] [Indexed: 01/01/2023]
Abstract
Balance in the transcriptome is regulated by the coordinated synthesis and degradation of RNA molecules. Here we investigated whether mammalian cell types intrinsically differ in the global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that cell types vary in the proportion of highly expressed genes and in the number of alternatively spliced transcripts expressed per gene, and that the cell types expressing more alternatively spliced variants per gene are those with a higher proportion of highly expressed genes. Cell types segregated into two clusters based on a high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the cluster with a low proportion of highly expressed genes, whereas functions involved in regulation of transcription and RNA splicing were enriched in the cluster with a high proportion. Our findings show that cell types differ in the proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene; these are distinct properties of the transcriptome and may reflect intrinsic differences in the global coordination of RNA synthesis, splicing, and degradation.
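The two transcriptome properties compared here can be computed per cell type from a transcript-level expression table. In the sketch below, the input format (gene, transcript, TPM triples) and both thresholds are hypothetical choices for illustration; the abstract does not give the study's cut-offs for "highly expressed" or "expressed".

```python
from collections import defaultdict


def transcriptome_summary(transcripts, high_threshold=100.0, expressed_threshold=1.0):
    """Return (proportion of highly expressed genes, mean expressed isoforms per gene).

    transcripts: iterable of (gene_id, transcript_id, tpm) tuples.
    Thresholds are hypothetical; only genes whose summed TPM passes
    `expressed_threshold` are counted.
    """
    gene_totals = defaultdict(float)
    expressed_isoforms = defaultdict(int)
    for gene, _transcript, tpm in transcripts:
        gene_totals[gene] += tpm
        if tpm >= expressed_threshold:
            expressed_isoforms[gene] += 1
    expressed_genes = [g for g, total in gene_totals.items() if total >= expressed_threshold]
    if not expressed_genes:
        return 0.0, 0.0
    n_high = sum(1 for g in expressed_genes if gene_totals[g] >= high_threshold)
    mean_isoforms = sum(expressed_isoforms[g] for g in expressed_genes) / len(expressed_genes)
    return n_high / len(expressed_genes), mean_isoforms


# Hypothetical toy table: g1 is high with two isoforms, g2 is low, g3 is not expressed
toy = [("g1", "t1", 150.0), ("g1", "t2", 20.0), ("g2", "t3", 5.0), ("g3", "t4", 0.1)]
print(transcriptome_summary(toy))  # (0.5, 1.5)
```

Computing both numbers per cell type and plotting one against the other is one way to look for the positive association the abstract reports.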
8
Konishi T. Parametric analysis of RNA-seq expression data. Genes Cells 2016; 21:639-47. [PMID: 27198878 DOI: 10.1111/gtc.12372] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/25/2015] [Accepted: 04/04/2016] [Indexed: 11/30/2022]
Abstract
Various methods have been introduced for the normalization and comparison of RNA-seq count data. However, they lack objectivity because they are based on ad hoc assumptions whose appropriateness has never been verified. Here, we introduce a method that assumes parsimonious models of the data distribution; the assumptions were verified by exploratory data analysis. As expected, count data were lognormally distributed. The level of noise in recent data appeared to be much higher than that in microarray data. Still, an appropriate distribution model would improve the certainty and accuracy of normalization by identifying the reliable range of the data. The primary cause of noise was not the principle of the methodology, that is, that each read is a trial determining which transcript is read. Rather, the cause appears to be the overlooking of transcripts, which occurred more often in the lower range of the data. To find genes likely to be overlooked, the number of replicates would be more important than read depth, since greater depth does not prevent overlooking. Both signal and noise in the reliable range of the data were normally distributed, showing that a generalized linear model is suitable for evaluating differences in expression levels. In this framework, normalized data can be compared and combined freely across studies.
Affiliation(s)
- Tomokazu Konishi
- Faculty of Bioresource Sciences, Akita Prefectural University, Akita, 010-0195, Japan
9
A novel method of transcriptome interpretation reveals a quantitative suppressive effect on tomato immune signaling by two domains in a single pathogen effector protein. BMC Genomics 2016; 17:229. [PMID: 26976140 PMCID: PMC4790048 DOI: 10.1186/s12864-016-2534-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/13/2015] [Accepted: 02/25/2016] [Indexed: 11/21/2022]
Abstract
Background: Plant pathogens translocate effector proteins into host cells to undermine pattern-triggered immunity (PTI), the plant response to microbe-associated molecular patterns that interferes with the infection process. Individual effectors are found in variable repertoires in which some constituents target the same pathways. The effector protein AvrPto from Pseudomonas syringae has a core domain (CD) and a C-terminal domain (CTD), each of which promotes bacterial growth and virulence in tomato. The individual contributions of each domain, and whether they act redundantly, are unknown. Results: We used RNA-Seq to elucidate the contributions of the CD and CTD to the suppression of PTI in tomato leaves 6 h after inoculation. Unexpectedly, each domain alters the transcript levels of essentially the same genes, but to different degrees. Quantifying this difference reveals that, although they target the same host genes, the two domains act synergistically. AvrPto has a relatively greater effect on genes whose expression is suppressed during PTI, and its effect on these genes appears to be diminished by saturation. Conclusions: RNA-Seq profiles can be used to observe the relative contributions of effector subdomains to PTI suppression. Our analysis shows that the CD and CTD multiplicatively affect the same gene transcript levels, with a greater relative impact on genes whose expression is suppressed during PTI. The higher degree of up-regulation versus down-regulation during PTI is plausibly an evolutionary adaptation against effectors that target immune signaling.
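On a log scale, multiplicative effects become additive, so one simple way to compare two domains that alter the same genes to different degrees is to regress one domain's log fold changes on the other's. The sketch below uses a least-squares slope through the origin as that ratio and sums log fold changes as the expectation under purely multiplicative action. The inputs are hypothetical, and this is not the quantification used in the paper.

```python
def effect_ratio(lfc_a, lfc_b):
    """Least-squares slope through the origin of lfc_b regressed on lfc_a.

    lfc_a, lfc_b: log fold changes of the same genes (same order) under two
    hypothetical treatments, e.g. one domain alone versus the other alone.
    """
    numerator = sum(a * b for a, b in zip(lfc_a, lfc_b))
    denominator = sum(a * a for a in lfc_a)
    return numerator / denominator


def multiplicative_prediction(lfc_a, lfc_b):
    """Expected log fold change if the two effects multiply (add on a log scale)."""
    return [a + b for a, b in zip(lfc_a, lfc_b)]


# Hypothetical log2 fold changes for three genes
cd_only = [1.0, -2.0, 0.5]
ctd_only = [0.5, -1.0, 0.25]
print(effect_ratio(cd_only, ctd_only))               # 0.5
print(multiplicative_prediction(cd_only, ctd_only))  # [1.5, -3.0, 0.75]
```

Comparing the observed full-effector profile with the summed prediction would then indicate whether the domains act multiplicatively (close agreement), more than multiplicatively, or less, for example because of saturation.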
10
Gokhale S, Nyayanit D, Gadgil C. A systems view of the protein expression process. SYSTEMS AND SYNTHETIC BIOLOGY 2011. [PMID: 23205157 DOI: 10.1007/s11693-011-9088-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
Many biological processes are regulated by changing the concentration and activity of proteins. The presence of a protein at a given subcellular location, at a given time, and in a certain conformation is the result of an apparently sequential process. The rate of protein formation is influenced by chromatin state and by the rates of transcription, translation, and degradation. An exquisite control system regulates each stage of the process, both through seemingly unregulated proteins and through feedback mediated by RNA and protein products. Here we review the biological facts and mathematical models for each stage of the protein production process. We conclude that advances in experimental techniques leading to a detailed description of the process have not been matched by mathematical models that represent the details of the process and facilitate analysis. Building such models is the first step towards developing a framework for a systems biology analysis of the protein production process.
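The simplest mathematical model of the stages listed above treats transcription and translation as constant-rate production and mRNA and protein degradation as first-order decay: dm/dt = k_m - d_m m and dp/dt = k_p m - d_p p. The sketch below, a textbook two-stage model chosen for illustration rather than a model taken from the review, integrates these equations with forward Euler and compares the result with the analytical steady state p* = k_m k_p / (d_m d_p). Parameter names and values are hypothetical.

```python
def simulate_expression(k_m, d_m, k_p, d_p, t_end, dt=0.01):
    """Forward-Euler integration of the two-stage model, starting from m = p = 0.

    k_m: transcription rate, d_m: mRNA decay rate,
    k_p: translation rate per mRNA, d_p: protein decay rate (all hypothetical).
    """
    m = p = 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_m - d_m * m      # mRNA: synthesis minus first-order decay
        dp = k_p * m - d_p * p  # protein: translation minus first-order decay
        m += dm * dt
        p += dp * dt
    return m, p


def steady_state(k_m, d_m, k_p, d_p):
    """Analytical fixed point: m* = k_m / d_m and p* = k_p * m* / d_p."""
    m_ss = k_m / d_m
    return m_ss, k_p * m_ss / d_p


print(steady_state(2.0, 0.5, 3.0, 0.2))
print(simulate_expression(2.0, 0.5, 3.0, 0.2, t_end=200.0))
```

The approach to steady state is governed by the slower of the two decay rates, which is why stable proteins respond sluggishly to changes in transcription; chromatin state or feedback could be added as time- or state-dependent rates.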
Affiliation(s)
- Sucheta Gokhale
- Chemical Engineering Division, CSIR-National Chemical Laboratory, Pune, 411008 India
11
Using temporal correlation in factor analysis for reconstructing transcription factor activities. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2010:172840. [PMID: 18604288 DOI: 10.1155/2008/172840] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/01/2007] [Accepted: 04/13/2008] [Indexed: 11/17/2022]
Abstract
Two-level gene regulatory networks consist of transcription factors (TFs) at the top level and their regulated genes at the second level. The expression profiles of the regulated genes are the observed high-throughput data obtained from experiments such as microarrays. The activity profiles of the TFs, as well as the connectivity matrix that indicates the regulatory relationships between TFs and their regulated genes, are treated as hidden variables. Factor analysis (FA), as well as other methods such as the network component algorithm, has been suggested for reconstructing gene regulatory networks and for predicting TF activities. These methods have been applied to E. coli and yeast data under the assumption that the samples are independent and identically distributed. Thus, the main drawback of these algorithms is that they ignore any time correlation within the TF profiles. In this paper, we extend previously studied FA algorithms to include time correlation within the TF profiles. At the same time, we consider sparse connectivity matrices in order to capture the sparsity present in gene regulatory networks. The TF activity profiles obtained by this approach are significantly smoother than those from previous FA algorithms, and the periodicities in profiles from yeast expression data become prominent in our reconstruction. Moreover, the strength of the correlation between time points is estimated and can be used to assess the suitability of the experimental time interval.
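The generative model behind two-level network reconstruction can be sketched directly: gene expression X equals a sparse connectivity matrix A times the TF activity profiles S, plus noise. Here the time correlation is modelled, as an assumption for illustration, by a first-order autoregressive (AR(1)) process on each TF profile. The sketch only simulates data and shows how the lag-1 correlation strength phi can be estimated from a profile; the paper's FA estimation procedure, in which A and S are both hidden, is not reproduced.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """TF activity profile with AR(1) time correlation: s[t] = phi * s[t-1] + noise."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, sigma / (1 - phi ** 2) ** 0.5)]  # start from the stationary distribution
    for _ in range(n - 1):
        s.append(phi * s[-1] + rng.gauss(0.0, sigma))
    return s


def estimate_phi(profile):
    """Least-squares lag-1 estimate of the AR(1) coefficient."""
    numerator = sum(a * b for a, b in zip(profile[:-1], profile[1:]))
    denominator = sum(a * a for a in profile[:-1])
    return numerator / denominator


def observed_expression(connectivity, tf_profiles, noise=0.1, seed=1):
    """Two-level model: X[g][t] = sum_k A[g][k] * S[k][t] + Gaussian noise."""
    rng = random.Random(seed)
    n_times = len(tf_profiles[0])
    return [[sum(a * s[t] for a, s in zip(row, tf_profiles)) + rng.gauss(0.0, noise)
             for t in range(n_times)]
            for row in connectivity]


# Hypothetical network: 3 genes, 2 TFs, sparse connectivity (zeros = no regulation)
A = [[1.0, 0.0], [0.0, -0.5], [0.7, 0.0]]
S = [simulate_ar1(0.8, 5000, seed=0), simulate_ar1(0.3, 5000, seed=2)]
X = observed_expression(A, S)
print(round(estimate_phi(S[0]), 2))  # close to 0.8
```

A large estimated phi relative to the sampling interval suggests that consecutive time points are strongly redundant, which is the sense in which the correlation strength can inform the choice of experimental time interval.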
12
Konishi T, Konishi F, Takasaki S, Inoue K, Nakayama K, Konagaya A. Coincidence between transcriptome analyses on different microarray platforms using a parametric framework. PLoS One 2008; 3:e3555. [PMID: 18958174 PMCID: PMC2570215 DOI: 10.1371/journal.pone.0003555] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 05/01/2008] [Accepted: 10/05/2008] [Indexed: 12/03/2022]
Abstract
A parametric framework for the analysis of transcriptome data is demonstrated to yield coincident results when applied to data acquired using two different microarray platforms. Microarrays are widely employed to acquire transcriptome information, and several platforms of chips are currently in use. However, discrepancies among studies are frequently reported, particularly among those performed using different platforms, casting doubt on the reliability of collected data. The inconsistency among observations can be largely attributed to differences among the analytical frameworks employed for data analysis. The existing frameworks are based on different philosophies and yield different results, but all involve normalization against a standard determined from the data to be analyzed. In the present study, a parametric framework based on a strict model for normalization is applied to data acquired using several slide-glass-type chips and GeneChip. The model is based on a common statistical characteristic of microarray data, and each set of chip data is normalized on the basis of a linear relationship with this model. In the proposed framework, the expressional changes observed and genes selected are coincident between platforms, achieving superior universality of data compared to other frameworks.
Affiliation(s)
- Tomokazu Konishi
- Department of Bioresource Sciences, Akita Prefectural University, Shimosinjyo Nakano, Akita, Japan.
13
Teif VB. General transfer matrix formalism to calculate DNA-protein-drug binding in gene regulation: application to OR operator of phage lambda. Nucleic Acids Res 2007; 35:e80. [PMID: 17526526 PMCID: PMC1920246 DOI: 10.1093/nar/gkm268] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 02/02/2007] [Revised: 04/09/2007] [Accepted: 04/09/2007] [Indexed: 11/24/2022]
Abstract
The transfer matrix methodology is proposed as a systematic tool for the statistical-mechanical description of DNA-protein-drug binding involved in gene regulation. We show that a genetic system of several cis-regulatory modules is calculable using this method, considering explicitly the site-overlapping, competitive, cooperative binding of regulatory proteins, their multilayer assembly and DNA looping. In the methodological section, the matrix models are solved for the basic types of short- and long-range interactions between DNA-bound proteins, drugs and nucleosomes. We apply the matrix method to gene regulation at the O(R) operator of phage lambda. The transfer matrix formalism allowed the description of the lambda-switch at a single-nucleotide resolution, taking into account the effects of a range of inter-protein distances. Our calculations confirm previously established roles of the contact CI-Cro-RNAP interactions. Concerning long-range interactions, we show that while the DNA loop between the O(R) and O(L) operators is important at the lysogenic CI concentrations, the interference between the adjacent promoters P(R) and P(RM) becomes more important at small CI concentrations. A large change in the expression pattern may arise in this regime due to anticooperative interactions between DNA-bound RNA polymerases. The applicability of the matrix method to more complex systems is discussed.
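The core idea of the transfer matrix method can be shown on its simplest case: a one-dimensional lattice in which each site is either empty or bound by a single-site ligand with statistical weight x = K c, and two adjacent bound ligands contribute an extra cooperativity factor w. Each site multiplies a two-state vector by a 2 x 2 matrix, and the partition function Z is the sum of the final vector. This minimal Ising-like example was chosen here for illustration; the paper's formalism additionally handles multi-site ligands, site overlap, multilayer assembly, and DNA looping.

```python
import math


def partition_function(n_sites, x, w):
    """Z for an n_sites lattice; x = K*c is the bound-site weight, w the cooperativity."""
    v = [1.0, x]                   # weights after the first site: [empty, bound]
    t = [[1.0, x], [1.0, x * w]]   # t[previous][current]: extra w only for bound-bound
    for _ in range(n_sites - 1):
        v = [v[0] * t[0][0] + v[1] * t[1][0],
             v[0] * t[0][1] + v[1] * t[1][1]]
    return v[0] + v[1]


def mean_occupancy(n_sites, x, w, h=1e-6):
    """Fraction of bound sites, theta = (x / N) * d ln Z / dx, by central difference."""
    d_ln_z = (math.log(partition_function(n_sites, x + h, w))
              - math.log(partition_function(n_sites, x - h, w))) / (2 * h)
    return x * d_ln_z / n_sites


# Without cooperativity (w = 1) the sites are independent: Z = (1 + x)**N, theta = x / (1 + x)
print(partition_function(10, 2.0, 1.0))         # 59049.0
print(round(mean_occupancy(10, 2.0, 1.0), 6))   # 0.666667
```

Setting w > 1 models positive cooperativity and raises occupancy at a given concentration, while w < 1 models the anticooperative interactions mentioned in the abstract; richer models enlarge the state vector and matrix instead of changing the algorithm.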
Affiliation(s)
- Vladimir B Teif
- Institute of Bioorganic Chemistry, Belarus National Academy of Sciences, Street Kuprevich 5/2, 220141, Minsk, Belarus.