1
Yang TH, Yu YH, Wu SH, Zhang FY. CFA: An explainable deep learning model for annotating the transcriptional roles of cis-regulatory modules based on epigenetic codes. Comput Biol Med 2023; 152:106375. [PMID: 36502693 DOI: 10.1016/j.compbiomed.2022.106375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/07/2022] [Accepted: 11/27/2022] [Indexed: 11/30/2022]
Abstract
Metazoan gene expression is controlled by modular DNA segments called cis-regulatory modules (CRMs). CRMs can act as promoters, enhancers, or insulators, adding further layers of transcriptional regulation. Experiments for determining CRM roles are low-throughput and costly, so large-scale investigation of CRM function still depends on computational methods. However, existing in silico tools recognize only enhancers or only promoters, so errors accumulate when promoter, enhancer, and insulator roles are considered together. Currently, no algorithm can consider these CRM roles concurrently. In this research, we developed the CRM Function Annotator (CFA) model. CFA provides complete CRM transcriptional role labeling based on the interpretation of epigenetic profiles. We demonstrated that CFA achieves high performance (test macro auROC/auPRC = 94.1%/90.3%) and outperforms existing tools in promoter/enhancer/insulator identification. Inspection of CFA also shows that it recognizes explainable epigenetic codes consistent with previous findings when labeling CRM roles. By considering higher-order combinations of the epigenetic codes, CFA significantly reduces false-positive rates in CRM transcriptional role annotation. CFA is available at https://github.com/cobisLab/CFA/.
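For readers unfamiliar with the macro-averaged auROC quoted in this abstract, the following is a minimal sketch of how such a score can be computed for three CRM roles in a one-vs-rest fashion. It is not taken from the CFA code base: the function names `binary_auroc` and `macro_auroc`, the toy labels, and the toy scores are all illustrative assumptions.

```python
from typing import Dict, List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """auROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (hypothetical helper)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(
    y_true: Sequence[str],
    y_score: Sequence[Dict[str, float]],
    classes: Sequence[str],
) -> float:
    """Unweighted mean of the one-vs-rest auROC of each class."""
    per_class: List[float] = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_score]
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / len(per_class)


# Toy data (invented for illustration, not results from the paper).
CLASSES = ["promoter", "enhancer", "insulator"]
Y_TRUE = ["promoter", "enhancer", "insulator", "promoter"]
Y_SCORE = [
    {"promoter": 0.80, "enhancer": 0.10, "insulator": 0.10},
    {"promoter": 0.30, "enhancer": 0.60, "insulator": 0.10},
    {"promoter": 0.20, "enhancer": 0.20, "insulator": 0.60},
    {"promoter": 0.25, "enhancer": 0.50, "insulator": 0.25},
]

if __name__ == "__main__":
    print(f"macro auROC = {macro_auroc(Y_TRUE, Y_SCORE, CLASSES):.4f}")
```

Macro averaging gives each role equal weight regardless of how many CRMs carry it, which is one reason it is a common choice when classes are imbalanced. The pairwise loop is quadratic and only meant for small illustrations.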
Affiliation(s)
- Tzu-Hsien Yang: Department of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, Tainan 701, Taiwan.
- Yu-Huai Yu: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
- Sheng-Hang Wu: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
- Fang-Yuan Zhang: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
2
Yang TH, Yang YC, Tu KC. regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs. Comput Struct Biotechnol J 2022; 20:296-308. [PMID: 35035784 PMCID: PMC8724954 DOI: 10.1016/j.csbj.2021.12.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/15/2021] [Revised: 12/10/2021] [Accepted: 12/10/2021] [Indexed: 11/20/2022]
Abstract
Transcription regulation in metazoa is controlled by the binding events of transcription factors (TFs) or regulatory proteins on specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the distributions of CRMs on a genomic scale is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. While traditional reporter-assay CRM identification approaches can provide an in-depth understanding of the functions of some CRMs, these methods are usually cost-inefficient and low-throughput. It is generally believed that reliable CRM predictions can be made by integrating diverse genomic data. Hence, researchers often resort to computational algorithms for genome-wide CRM screening before conducting specific experiments. However, existing in silico methods for finding potential CRMs are restricted by low sensitivity, poor prediction accuracy, or high computation time arising from the combinatorial complexity of TF binding site (TFBS) composition. To overcome these obstacles, we designed a novel CRM identification pipeline called regCNN that considers the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN shows an accuracy/auROC of 84.5%/92.5% in CRM identification. By further considering local patterns in epigenetic profiles and TF binding motifs, it achieves a 4.7 percentage-point (92.5% − 87.8%) improvement in auROC over the average value-based pure multi-layer perceptron model. We also demonstrated that regCNN outperforms all currently available tools by at least 11.3% in auROC values. Finally, regCNN is verified to be robust against its resizing window hyperparameter in dealing with the variable lengths of CRMs. The regCNN model can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
Affiliation(s)
- Tzu-Hsien Yang: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Ya-Chiao Yang: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Kai-Chi Tu: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
3
Yang TH. An Aggregation Method to Identify the RNA Meta-Stable Secondary Structure and its Functionally Interpretable Structure Ensemble. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:75-86. [PMID: 34014829 DOI: 10.1109/tcbb.2021.3082396] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
RNA can provide vital cellular functions through its secondary or tertiary structure. Due to the low-throughput nature of experimental approaches, studies on RNA structures mainly resort to computational methods. However, existing tools fail to consider RNA structure ensembles and do not provide ways to derive functional hypotheses for new predictions. In this research, a novel method was proposed to identify the functionally interpretable structure ensemble of a given RNA sequence and to provide the meta-stable structure, or the most frequently observed functional RNA cellular conformation, based on the ensemble. In the prediction of meta-stable structures, the proposed method outperformed existing tools on a yeast test set. The inferred functional aspects were then manually checked and showed a micro-averaged F1 value of 0.92. Further, a biological example, the yeast ASH1-E1 element, was discussed to illustrate that these functional aspects can also suggest testable hypotheses. The proposed method was then verified to be applicable to other species through a human test set. Finally, the proposed method was shown to resist sequence length-dependent performance deterioration.
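As a reminder of what a micro-averaged F1 value measures, the sketch below pools true positives, false positives, and false negatives over all categories before computing a single F1. The function name `micro_f1`, the category names, and the counts are invented for illustration and are not data from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) per category
Counts = Tuple[int, int, int]


def micro_f1(per_category: Dict[str, Counts]) -> float:
    """Pool the counts over all categories, then compute one F1 score."""
    tp = sum(c[0] for c in per_category.values())
    fp = sum(c[1] for c in per_category.values())
    fn = sum(c[2] for c in per_category.values())
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        raise ValueError("no positive predictions or labels to score")
    # Equivalent to the harmonic mean of pooled precision and pooled recall.
    return 2 * tp / denominator


# Hypothetical counts for two functional categories.
TOY_COUNTS: Dict[str, Counts] = {
    "localization": (8, 1, 1),
    "translation_control": (3, 1, 0),
}

if __name__ == "__main__":
    print(f"micro-averaged F1 = {micro_f1(TOY_COUNTS):.2f}")
```

Because the counts are pooled, categories with many instances dominate the micro-averaged score, in contrast to macro averaging, which weights every category equally.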
4
Yang TH, Wang CY, Tsai HC, Liu CT. Human IRES Atlas: an integrative platform for studying IRES-driven translational regulation in humans. Database: The Journal of Biological Databases and Curation 2021; 2021:6263636. [PMID: 33942874 PMCID: PMC8094437 DOI: 10.1093/database/baab025] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/18/2020] [Revised: 04/16/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022]
Abstract
It is now known that cap-independent translation initiation facilitated by internal ribosome entry sites (IRESs) is vital in selective cellular protein synthesis under stress and different physiological conditions. However, three problems make it hard to understand transcriptome-wide cellular IRES-mediated translation initiation mechanisms: (i) the complex interplay between IRESs and other translation initiation–related information, (ii) the limited reliability of in silico cellular IRES investigation and (iii) the labor-intensive nature of in vivo IRES identification. In this research, we constructed the Human IRES Atlas database for a comprehensive understanding of cellular IRESs in humans. First, currently available and suitable IRES prediction tools (IRESfinder, PatSearch and IRESpy) were used to obtain transcriptome-wide human IRESs. Then, we collected eight categories of translation initiation–related features to help study the potential molecular mechanisms of each putative IRES. Three functional tests (conservation, structural RNA–protein scores and conditional translation efficiency) were devised to evaluate the functionality of the identified putative IRESs. Moreover, an easy-to-use interface and an IRES–translation initiation interaction map for each gene transcript were implemented to help understand the interactions between IRESs and translation initiation–related features. Researchers can easily search or browse an IRES of interest using the web interface and deduce testable mechanism hypotheses of human IRES-driven translation initiation based on the integrated results. In summary, Human IRES Atlas integrates putative IRES elements and translation initiation–related experiments to make better use of these data and to support the deduction of mechanism hypotheses. Database URL: http://cobishss0.im.nuk.edu.tw/Human_IRES_Atlas/
Affiliation(s)
- Tzu-Hsien Yang: Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Chung-Yu Wang: Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Hsiu-Chun Tsai: Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Cheng-Tse Liu: Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
5
López Y, Vandenbon A, Nose A, Nakai K. Modeling the cis-regulatory modules of genes expressed in developmental stages of Drosophila melanogaster. PeerJ 2017; 5:e3389. [PMID: 28584716 PMCID: PMC5452948 DOI: 10.7717/peerj.3389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/20/2016] [Accepted: 05/08/2017] [Indexed: 12/30/2022]
Abstract
Because transcription is the first step in the regulation of gene expression, understanding how transcription factors bind to their DNA binding motifs has become absolutely necessary. It has been shown that the promoters of genes with similar expression profiles share common structural patterns. This paper presents an extensive study of the regulatory regions of genes expressed in 24 developmental stages of Drosophila melanogaster. It proposes the use of a combination of structural features, such as positioning of individual motifs relative to the transcription start site, orientation, pairwise distance between motifs, and presence of motifs anywhere in the promoter for predicting gene expression from structural features of promoter sequences. RNA-sequencing data was utilized to create and validate the 24 models. When genes with high-scoring promoters were compared to those identified by RNA-seq samples, 19 (79.2%) statistically significant models, a number that exceeds previous studies, were obtained. Each model yielded a set of highly informative features, which were used to search for genes with similar biological functions.
Affiliation(s)
- Yosvany López: Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan; Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Alexis Vandenbon: Immunology Frontier Research Center, Osaka University, Osaka, Japan
- Akinao Nose: Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Kenta Nakai: Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
6
Wu WS, Wang CC, Jhou MJ, Wang YC. YAGM: a web tool for mining associated genes in yeast based on diverse biological associations. BMC Systems Biology 2015; 9 Suppl 6:S1. [PMID: 26678566 PMCID: PMC4674844 DOI: 10.1186/1752-0509-9-s6-s1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]
Abstract
Background: Investigating associations between genes helps in understanding how genes relate to one another in biological processes. STRING and GeneMANIA are two well-known web tools that provide a list of genes associated with a query gene based on diverse biological associations such as co-expression, co-localization and co-citation. However, the transcriptional regulation association and the mutant phenotype association have not been used in these two web tools. Since comprehensive transcription factor (TF)-gene binding data, TF-gene regulation data and mutant phenotype data are available in yeast, we developed a web tool called YAGM (Yeast Associated Genes Miner), which constructs the transcriptional regulation association, the mutant phenotype association and five commonly used biological associations to mine a list of genes associated with a query yeast gene.
Description: In YAGM, we collected seven kinds of datasets: TF-gene binding (TFB) data, TF-gene regulation (TFR) data, mutant phenotype (MP) data, functional annotation (FA) data, physical interaction (PI) data, genetic interaction (GI) data, and literature evidence (LE) data. Then, by using the hypergeometric test to calculate the association scores of all gene pairs in yeast, we constructed seven biological associations: two transcriptional regulation associations (TFB association and TFR association), MP association, FA association, PI association, GI association, and LE association. Moreover, the expression profile association from the SPELL database was also included in YAGM. When using YAGM, users can input a query gene and choose any subset of the eight biological associations; a list of genes associated with the query gene is then returned based on the chosen biological associations.
Conclusions: In this study, we presented YAGM, which provides eight biological associations for mining genes associated with a query gene in yeast. Among the eight biological associations constructed in YAGM, three (TFB association, TFR association, and MP association) are novel. By comparing against the query results of two well-known web tools (STRING and GeneMANIA), we found that YAGM can identify distinct associated genes for a query gene. That is, YAGM can provide alternative candidate associated genes for biologists to investigate experimentally. We believe that YAGM will be a useful web tool for yeast biologists. YAGM is available online at http://cosbi3.ee.ncku.edu.tw/yagm/.
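The abstract states that association scores were derived with the hypergeometric test but does not give the exact formulation. The sketch below shows one common way such a test is set up for a gene pair: given a universe of items (for example, TFs), the number bound to gene A, the number bound to gene B, and their overlap, it returns the probability of seeing at least that much overlap by chance. The function name `hypergeom_pvalue`, the parameter names, and the toy numbers are assumptions, and how YAGM converts a p-value into a score is not specified here.

```python
from math import comb


def hypergeom_pvalue(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: total number of items (e.g. all TFs considered)
    size_a:   items associated with gene A
    size_b:   items associated with gene B
    overlap:  items associated with both genes
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    if not (0 <= overlap <= min(size_a, size_b)):
        raise ValueError("overlap must lie between 0 and the smaller set size")
    total = comb(universe, size_b)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(size_a, i) * comb(universe - size_a, size_b - i)
        for i in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy example: 10 TFs in total, 4 bind gene A, 3 bind gene B, 2 bind both.
    print(f"p-value = {hypergeom_pvalue(10, 4, 3, 2):.4f}")
```

A small p-value indicates that two genes share more regulators (or phenotypes, annotations, and so on) than expected by chance, which is the intuition behind treating the pair as associated.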
7
Lai FJ, Chang HT, Wu WS. PCTFPeval: a web tool for benchmarking newly developed algorithms for predicting cooperative transcription factor pairs in yeast. BMC Bioinformatics 2015; 16 Suppl 18:S2. [PMID: 26677932 PMCID: PMC4682397 DOI: 10.1186/1471-2105-16-s18-s2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
Background: Computational identification of cooperative transcription factor (TF) pairs helps understand the combinatorial regulation of gene expression in eukaryotic cells. Many advanced algorithms have been proposed to predict cooperative TF pairs in yeast. However, it is still difficult to conduct a comprehensive and objective performance comparison of different algorithms because sufficient performance indices and adequate overall performance scores are lacking. To solve this problem, in our previous study (published in BMC Systems Biology 2014), we adopted/proposed eight performance indices and designed two overall performance scores to compare the performance of 14 existing algorithms for predicting cooperative TF pairs in yeast. Most importantly, our performance comparison framework can be applied to comprehensively and objectively evaluate the performance of a newly developed algorithm. However, to use our framework, researchers first have to put considerable effort into constructing it. To save researchers time and effort, here we develop a web tool that implements our performance comparison framework, featuring fast data processing, a comprehensive performance comparison and an easy-to-use web interface.
Results: The developed tool is called PCTFPeval (Predicted Cooperative TF Pair evaluator) and is written in PHP and Python. The web interface allows users to input a list of predicted cooperative TF pairs from their algorithm and select (i) the compared algorithms among the 15 existing algorithms, (ii) the performance indices among the eight existing indices, and (iii) the overall performance scores from two possible choices. The comprehensive performance comparison results are then generated in tens of seconds and shown as both bar charts and tables. The original comparison results of each compared algorithm and each selected performance index can be downloaded as text files for further analyses.
Conclusions: By allowing users to select from eight existing performance indices and 15 existing algorithms for comparison, our web tool benefits researchers who want to comprehensively and objectively evaluate the performance of their newly developed algorithm. Thus, our tool greatly expedites progress in research on the computational identification of cooperative TF pairs.
8
Ranganathan S, Tan TW, Schönbach C. InCoB2014: Systems Biology update from the Asia-Pacific. Introduction. BMC Systems Biology 2014; 8 Suppl 4:I1. [PMID: 25521591 PMCID: PMC4290681 DOI: 10.1186/1752-0509-8-s4-i1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Selected papers from the 13th International Conference on Bioinformatics (InCoB2014), held from 31 July to 2 August 2014 in Sydney, Australia, have been compiled in this supplement. These range from network analysis and gene regulatory networks to systems-level biological analysis, providing the 2014 update to InCoB's computational systems biology research.
Affiliation(s)
- Shoba Ranganathan: Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney NSW 2109, Australia
- Tin Wee Tan: Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117599
- Christian Schönbach: Department of Biology, School of Science and Technology, Nazarbayev University, Astana 010000, Republic of Kazakhstan; Center for AIDS Research, Kumamoto University, Kumamoto 860-0811, Japan