1. Castellana S, Biagini T, Parca L, Petrizzelli F, Bianco SD, Vescovi AL, Carella M, Mazza T. A comparative benchmark of classic DNA motif discovery tools on synthetic data. Brief Bioinform 2021; 22:6341664. [PMID: 34351399 DOI: 10.1093/bib/bbab303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Revised: 07/08/2021] [Accepted: 07/15/2021] [Indexed: 01/01/2023]
Abstract
Hundreds of human proteins have been found to establish transient interactions with rather degenerate consensus DNA sequences, or motifs. Identifying these motifs and the genomic sites where interactions occur represents one of the most challenging research goals in modern molecular biology and bioinformatics. The last twenty years witnessed an explosion of computational tools designed to perform this task, whose performance was last compared fifteen years ago. Here, we survey sixteen of them, benchmark their ability to identify known motifs nested in twenty-nine simulated sequence datasets, and finally report their strengths, weaknesses, and complementarity.
Affiliation(s)
- Stefano Castellana
  - Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
- Tommaso Biagini
  - Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
- Luca Parca
  - Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
- Francesco Petrizzelli
  - Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
  - Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
- Angelo Luigi Vescovi
  - ISBReMIT Institute for Stem Cell Biology, Regenerative Medicine and Innovative Therapies, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
- Massimo Carella
  - Medical Genetics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
- Tommaso Mazza
  - Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
2. Ma A, Wang C, Chang Y, Brennan FH, McDermaid A, Liu B, Zhang C, Popovich PG, Ma Q. IRIS3: integrated cell-type-specific regulon inference server from single-cell RNA-Seq. Nucleic Acids Res 2020; 48:W275-W286. [PMID: 32421805 PMCID: PMC7319566 DOI: 10.1093/nar/gkaa394] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/04/2020] [Revised: 04/25/2020] [Accepted: 05/04/2020] [Indexed: 12/21/2022]
Abstract
A group of genes controlled as a unit, usually by the same repressor or activator gene, is known as a regulon. The ability to identify active regulons within a specific cell type, i.e., cell-type-specific regulons (CTSR), provides an extraordinary opportunity to pinpoint crucial regulators and target genes responsible for complex diseases. However, the identification of CTSRs from single-cell RNA-Seq (scRNA-Seq) data is computationally challenging. We introduce IRIS3, the first-of-its-kind web server for CTSR inference from scRNA-Seq data for human and mouse. IRIS3 is an easy-to-use server empowered by over 20 functionalities to support comprehensive interpretations and graphical visualizations of identified CTSRs. CTSR data can be used to reliably characterize and distinguish the corresponding cell type from others and can be combined with other computational or experimental analyses for biomedical studies. CTSRs can, therefore, aid in the discovery of major regulatory mechanisms and allow reliable constructions of global transcriptional regulation networks encoded in a specific cell type. The broader impact of IRIS3 includes, but is not limited to, investigation of complex diseases hierarchies and heterogeneity, causal gene regulatory network construction, and drug development. IRIS3 is freely accessible from https://bmbl.bmi.osumc.edu/iris3/ with no login requirement.
Affiliation(s)
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Cankun Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yuzhou Chang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Faith H Brennan
  - Department of Neuroscience, Center for Brain and Spinal Cord Repair, Belford Center for Spinal Cord Injury, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
- Adam McDermaid
  - Imagenetics, Sanford Health, Sioux Falls, SD 57104, USA
  - Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Vermillion, SD 57069, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan 250100, China
- Chi Zhang
  - Department of Medical & Molecular Genetics, Indiana University, School of Medicine, Indianapolis, IN 46202, USA
- Phillip G Popovich
  - Department of Neuroscience, Center for Brain and Spinal Cord Repair, Belford Center for Spinal Cord Injury, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
3. Xu ZC, Feng PM, Yang H, Qiu WR, Chen W, Lin H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics 2020; 35:4922-4929. [PMID: 31077296 DOI: 10.1093/bioinformatics/btz358] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 11/28/2018] [Revised: 03/01/2019] [Accepted: 04/27/2019] [Indexed: 12/19/2022]
Abstract
MOTIVATION: Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Detecting D in RNA is therefore necessary for understanding its functional roles. Since wet-lab techniques for this purpose are time-consuming and laborious, computational models are needed to identify D modification sites in RNA.
RESULTS: We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequence. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in the jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model.
AVAILABILITY AND IMPLEMENTATION: A user-friendly web server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, providing convenience and guidance to users for further studying D modification.
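The encoding named in this abstract (nucleotide chemical property plus nucleotide density) can be illustrated with a short sketch. It is not the authors' code. The abstract does not spell out the scheme, so the three-bit property table (ring structure, functional group, hydrogen bonding) and the prefix-density definition below are assumptions based on the convention commonly used for this feature type, and `encode_ncp_nd` is a hypothetical name.

```python
# Assumed chemical-property bits per base: (ring, functional group, H-bond).
# purine=1 / pyrimidine=0, amino=1 / keto=0, weak H-bond=1 / strong=0.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp_nd(seq):
    """Return one 4-tuple per position: three property bits plus density.

    Density at position i (1-based) is the number of times the base at i
    has occurred in seq[:i], divided by i.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = {}
    features = []
    for i, base in enumerate(seq, start=1):
        counts[base] = counts.get(base, 0) + 1
        density = counts[base] / i
        features.append(NCP[base] + (density,))
    return features


if __name__ == "__main__":
    for row in encode_ncp_nd("AUA"):
        print(row)
```

In a pipeline of the kind the abstract describes, the per-position tuples would typically be flattened into one fixed-length vector per sequence window before being passed to a classifier such as an SVM.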
Affiliation(s)
- Zhao-Chun Xu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Peng-Mian Feng
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wang-Ren Qiu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
4. Cao H, Ma Q, Chen X, Xu Y. DOOR: a prokaryotic operon database for genome analyses and functional inference. Brief Bioinform 2020; 20:1568-1577. [PMID: 28968679 DOI: 10.1093/bib/bbx088] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/02/2017] [Revised: 06/13/2017] [Indexed: 11/14/2022]
Abstract
The rapid accumulation of fully sequenced prokaryotic genomes provides unprecedented information for biological studies of bacterial and archaeal organisms in a systematic manner. Operons are the basic functional units for conducting such studies. Here, we review an operon database DOOR (the Database of prOkaryotic OpeRons) that we have previously developed and continue to update. Currently, the database contains 6 975 454 computationally predicted operons in 2072 complete genomes. In addition, the database also contains the following information: (i) transcriptional units for 24 genomes derived using publicly available transcriptomic data; (ii) orthologous gene mapping across genomes; (iii) 6408 cis-regulatory motifs for transcriptional factors of some operons for 203 genomes; (iv) 3 456 718 Rho-independent terminators for 2072 genomes; as well as (v) a suite of tools in support of applications of the predicted operons. In this review, we will explain how such data are computationally derived and demonstrate how they can be used to derive a wide range of higher-level information needed for systems biology studies to tackle complex and fundamental biology questions.
5. Li HF, Wang XF, Tang H. Predicting Bacteriophage Enzymes and Hydrolases by Using Combined Features. Front Bioeng Biotechnol 2020; 8:183. [PMID: 32266225 PMCID: PMC7105632 DOI: 10.3389/fbioe.2020.00183] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/02/2020] [Accepted: 02/24/2020] [Indexed: 12/19/2022]
Abstract
Bacteriophages are viruses that infect host bacteria. They have been applied in the treatment of pathogenic bacterial infections. Phage enzymes and hydrolases play the most important role in the destruction of bacterial cells. Correctly identifying the hydrolases encoded by phages is not only beneficial to the study of their function, but also conducive to antibacterial drug discovery. Thus, this work aims to recognize the enzymes and hydrolases in phages. A combination of different features was used to represent samples of phage and hydrolase. A feature selection technique called analysis of variance was used to optimize the features. The classification was performed using a support vector machine (SVM). The prediction process includes two steps. The first step is to identify phage enzymes. The second step is to determine whether a phage enzyme is a hydrolase or not. The jackknife cross-validated results showed that our method could produce overall accuracies of 85.1% and 94.3%, respectively, for the two predictions, demonstrating that the proposed method is promising.
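As an illustration of the analysis-of-variance feature selection mentioned in this abstract, the sketch below scores each feature with the one-way ANOVA F statistic (between-class variance over within-class variance) and ranks features by that score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `anova_f_score` and `rank_features` are hypothetical, and the abstract does not say how many top-ranked features were kept.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of one feature across two or more classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Sum of squares between class means and within each class.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfectly separated (or constant) feature: avoid dividing by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(rows[0])
    scores = [anova_f_score([row[j] for row in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    rows = [[1, 5.0], [2, 5.1], [3, 4.9], [4, 5.0], [5, 5.1], [6, 4.9]]
    labels = [0, 0, 0, 1, 1, 1]
    print(rank_features(rows, labels))
```

A ranking like this is usually followed by training the classifier on the top-ranked subset and keeping the subset size that gives the best cross-validated accuracy.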
Affiliation(s)
- Hong-Fei Li
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
  - School of Computer and Information Engineering, Henan Normal University, Henan, China
- Xian-Fang Wang
  - School of Computer and Information Engineering, Henan Normal University, Henan, China
- Hua Tang
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
6. Comparison of High-Throughput Sequencing for Phage Display Peptide Screening on Two Commercially Available Platforms. Int J Pept Res Ther 2020. [DOI: 10.1007/s10989-019-09858-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
7. Wang F, Guan ZX, Dao FY, Ding H. A Brief Review of the Computational Identification of Antifreeze Protein. Curr Org Chem 2019. [DOI: 10.2174/1385272823666190718145613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
Many cold-adapted organisms produce antifreeze proteins (AFPs) to counter the freezing of cell fluids by controlling the growth of ice crystals. AFPs have been found in various species, including vertebrates, invertebrates, plants, bacteria, and fungi. The AFPs from fish, insects and plants display a high diversity. Thus, the identification of AFPs is a challenging task in computational proteomics. With the accumulation of AFPs and the development of machine learning methods, it is possible to construct a high-throughput tool to identify AFPs in a timely manner. In this review, we briefly survey the application of machine learning methods to antifreeze protein identification from several aspects, including published benchmark datasets, sequence descriptors, classification algorithms and published methods. We hope that this review will produce new ideas and directions for research on identifying antifreeze proteins.
Affiliation(s)
- Fang Wang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
8. Chen W, Song X, Lv H, Lin H. iRNA-m2G: Identifying N2-methylguanosine Sites Based on Sequence-Derived Information. Mol Ther Nucleic Acids 2019; 18:253-258. [PMID: 31581049 PMCID: PMC6796771 DOI: 10.1016/j.omtn.2019.08.023] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/03/2019] [Revised: 08/06/2019] [Accepted: 08/19/2019] [Indexed: 12/11/2022]
Abstract
RNA N2-methylguanosine (m2G) is one kind of posttranscriptional modification and plays crucial roles in the control and stabilization of tRNA. However, our knowledge about the biological functions of m2G is still limited. The key step of revealing its new function is to recognize the m2G sites in the transcriptome. Since there is no effective method for detecting m2G sites, it is desirable to develop new methods to identify m2G sites. In this study, a computational predictor called iRNA-m2G was proposed to identify m2G sites in eukaryotic transcriptomes. In iRNA-m2G, the RNA sequences were encoded by using nucleotide chemical property and accumulated nucleotide frequency. iRNA-m2G was not only validated by the rigorous jackknife test on the benchmark dataset but also examined by performing cross-species validations. In addition, iRNA-m2G was also tested on an independent dataset. It was found that the accuracies obtained by iRNA-m2G were all quite promising in these tests, indicating that the proposed method could become a powerful tool for identifying m2G sites. Finally, a user-friendly web server for iRNA-m2G is freely accessible at http://lin-group.cn/server/iRNA-m2G.php.
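The jackknife test referred to in this abstract is a leave-one-out procedure: each sample is held out once, the model is trained on the remainder, and the held-out sample is predicted. The sketch below shows only that loop. A 1-nearest-neighbour rule is swapped in as a stand-in classifier to keep the example dependency-free; the abstract does not describe the classifier, and the names `jackknife_accuracy` and `nearest_neighbour` are hypothetical.

```python
def jackknife_accuracy(samples, labels, predict):
    """Leave-one-out accuracy.

    `predict(train_x, train_y, query)` must return a label for `query`
    after fitting on the training samples only.
    """
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i and train on everything else.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier: label of the closest training sample."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


if __name__ == "__main__":
    samples = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
    labels = [0, 0, 0, 1, 1, 1]
    print(jackknife_accuracy(samples, labels, nearest_neighbour))
```

Because every sample is tested exactly once and the split is deterministic, the jackknife gives a single reproducible accuracy figure, at the cost of one training run per sample.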
Affiliation(s)
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Xiaoming Song
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Hao Lv
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
9. Liu B, Han L, Liu X, Wu J, Ma Q. Computational Prediction of Sigma-54 Promoters in Bacterial Genomes by Integrating Motif Finding and Machine Learning Strategies. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1211-1218. [PMID: 29993815 DOI: 10.1109/tcbb.2018.2816032] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 06/08/2023]
Abstract
The sigma factor, as a unit of the RNA polymerase holoenzyme, is a critical factor in the process of gene transcriptional regulation. It recognizes specific DNA sites and brings the core enzyme of RNA polymerase to the upstream regions of target genes. Therefore, the prediction of the promoters for a particular sigma factor is essential for interpreting functional genomic data and observations. This paper develops a new method to predict sigma-54 promoters in bacterial genomes. The new method integrates motif finding and machine learning strategies to capture the intrinsic features of sigma-54 promoters. Experiments on an E. coli benchmark test set show that our method can distinguish sigma-54 promoters from surrounding or randomly selected DNA sequences. Applications to three other bacterial genomes indicate the potential robustness and applicability of our method to a large number of bacterial genomes. The source code of our method can be freely downloaded at https://github.com/maqin2001/PromotePredictor.
10. Lai HY, Zhang ZY, Su ZD, Su W, Ding H, Chen W, Lin H. iProEP: A Computational Predictor for Predicting Promoter. Mol Ther Nucleic Acids 2019; 17:337-346. [PMID: 31299595 PMCID: PMC6616480 DOI: 10.1016/j.omtn.2019.05.028] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 02/23/2019] [Revised: 05/18/2019] [Accepted: 05/19/2019] [Indexed: 11/29/2022]
Abstract
Promoter is a fundamental DNA element located around the transcription start site (TSS) and could regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters. However, the performances of these methods still need to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). Minimum Redundancy Maximum Relevance (mRMR) algorithm and increment feature selection strategy were then adopted to find out optimal feature subsets. Support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with the areas under receiver operating curves (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (http://lin-group.cn/server/iProEP/).
Affiliation(s)
- Hong-Yan Lai
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhen-Dong Su
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Hao Lin
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
11. Tran NTL, Huang CH. Performance evaluation for MOTIFSIM. Biol Proced Online 2018; 20:23. [PMID: 30574025 PMCID: PMC6299673 DOI: 10.1186/s12575-018-0088-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2018] [Accepted: 12/07/2018] [Indexed: 11/10/2022]
Abstract
Background: Previous studies show that different motif finders produce different results for an identical dataset. This is largely because these tools use different strategies and possess unique features for discovering motifs. Hence, using multiple tools and methods has been suggested, because the motifs commonly reported by them are more likely to be biologically significant.
Results: The common significant motifs from multiple tools can be obtained by using the MOTIFSIM tool. In this work, we evaluated the performance of MOTIFSIM in three aspects. First, we compared the pair-wise comparison technique of MOTIFSIM with the un-gapped Smith-Waterman algorithm and four common distance metrics: average Kullback-Leibler, average log-likelihood ratio, Chi-Square distance, and Pearson Correlation Coefficient. Second, we compared the performance of MOTIFSIM with the RSAT Matrix-clustering tool for motif clustering. Lastly, we evaluated the performance of nineteen motif finders and the reliability of MOTIFSIM for identifying the common significant motifs from multiple tools.
Conclusions: The pair-wise comparison results reveal that MOTIFSIM attains better performance than the un-gapped Smith-Waterman algorithm and the four distance metrics. The clustering results also demonstrate that MOTIFSIM achieves similar or even better performance than RSAT Matrix-clustering. Furthermore, the findings indicate that, when motif detection does not require a special tool for a specific type of motif, using multiple motif finders and combining them with MOTIFSIM to obtain the common significant motifs improves the results of DNA motif detection.
Electronic supplementary material: The online version of this article (10.1186/s12575-018-0088-3) contains supplementary material, which is available to authorized users.
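Two of the baseline distance metrics named in this abstract, the Pearson correlation coefficient and the average Kullback-Leibler divergence, are commonly computed column by column between two position frequency matrices. The sketch below illustrates that idea only. It is not MOTIFSIM's own comparison technique, and several details are assumptions: columns are ordered A, C, G, T, the two motifs have equal length and are compared without shifting, "average" is read as the symmetric mean of the two KL directions, and a small pseudocount avoids taking the logarithm of zero.

```python
import math


def pearson(p, q):
    """Pearson correlation between two motif columns (4 base frequencies)."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0  # a uniform column carries no signal to correlate
    return cov / math.sqrt(var_p * var_q)


def average_kl(p, q, pseudo=1e-6):
    """Symmetric mean of KL(p||q) and KL(q||p), with a pseudocount."""
    p = [a + pseudo for a in p]
    q = [b + pseudo for b in q]
    p = [a / sum(p) for a in p]
    q = [b / sum(q) for b in q]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def motif_score(motif_a, motif_b, column_metric):
    """Mean column score over two equal-length, already aligned motifs."""
    scores = [column_metric(p, q) for p, q in zip(motif_a, motif_b)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    m1 = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
    m2 = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
    print(motif_score(m1, m2, pearson), motif_score(m1, m2, average_kl))
```

Note that the two metrics point in opposite directions: a higher Pearson value means more similar columns, while a higher KL value means more divergent ones.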
Affiliation(s)
- Ngoc Tam L Tran
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269 USA
- Chun-Hsi Huang
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269 USA
12. Lee NK, Li X, Wang D. A comprehensive survey on genetic algorithms for DNA motif prediction. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
13. Chen X, Ma A, McDermaid A, Zhang H, Liu C, Cao H, Ma Q. RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis. Genes (Basel) 2018; 9:genes9060278. [PMID: 29849014 PMCID: PMC6027394 DOI: 10.3390/genes9060278] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/30/2018] [Revised: 05/19/2018] [Accepted: 05/25/2018] [Indexed: 11/16/2022]
Abstract
Regulons, which serve as co-regulated gene groups contributing to the transcriptional regulation of microbial genomes, have the potential to aid in understanding of underlying regulatory mechanisms. In this study, we designed a novel computational pipeline, regulon identification based on comparative genomics and transcriptomics analysis (RECTA), for regulon prediction related to the gene regulatory network under certain conditions. To demonstrate the effectiveness of this tool, we implemented RECTA on Lactococcus lactis MG1363 data to elucidate acid-response regulons. A total of 51 regulons were identified, 14 of which have computational-verified significance. Among these 14 regulons, five of them were computationally predicted to be connected with acid stress response. Validated by literature, 33 genes in Lactococcus lactis MG1363 were found to have orthologous genes which were associated with six regulons. An acid response related regulatory network was constructed, involving two trans-membrane proteins, eight regulons (llrA, llrC, hllA, ccpA, NHP6A, rcfB, regulons #8 and #39), nine functional modules, and 33 genes with orthologous genes known to be associated with acid stress. The predicted response pathways could serve as promising candidates for better acid tolerance engineering in Lactococcus lactis. Our RECTA pipeline provides an effective way to construct a reliable gene regulatory network through regulon elucidation, and has strong application power and can be effectively applied to other bacterial genomes where the elucidation of the transcriptional regulation network is needed.
Affiliation(s)
- Xin Chen
  - Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Anjun Ma
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57006, USA
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA
- Adam McDermaid
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57006, USA
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA
- Hanyuan Zhang
  - College of Computer Science and Engineering, University of Nebraska Lincoln, Lincoln, NE 68588, USA
- Chao Liu
  - Shandong Provincial Hospital affiliated to Shandong University, Jinan 250021, China
- Huansheng Cao
  - Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
- Qin Ma
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57006, USA
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA
14. Differential RNA Sequencing Implicates Sulfide as the Master Regulator of S0 Metabolism in Chlorobaculum tepidum and Other Green Sulfur Bacteria. Appl Environ Microbiol 2018; 84:AEM.01966-17. [PMID: 29150516 DOI: 10.1128/aem.01966-17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/06/2017] [Accepted: 11/14/2017] [Indexed: 12/23/2022]
Abstract
The green sulfur bacteria (Chlorobiaceae) are anaerobes that use electrons from reduced sulfur compounds (sulfide, S0, and thiosulfate) as electron donors for photoautotrophic growth. Chlorobaculum tepidum, the model system for the Chlorobiaceae, both produces and consumes extracellular S0 globules depending on the availability of sulfide in the environment. These physiological changes imply significant changes in gene regulation, which have been observed when sulfide is added to Cba. tepidum growing on thiosulfate. However, the underlying mechanisms driving these gene expression changes, i.e., the specific regulators and promoter elements involved, have not yet been defined. Here, differential RNA sequencing (dRNA-seq) was used to globally identify transcript start sites (TSS) that were present during growth on sulfide, biogenic S0, and thiosulfate as sole electron donors. TSS positions were used in combination with RNA-seq data from cultures growing on these same electron donors to identify both basal promoter elements and motifs associated with electron donor-dependent transcriptional regulation. These motifs were conserved across homologous Chlorobiaceae promoters. Two lines of evidence suggest that sulfide-mediated repression is the dominant regulatory mode in Cba. tepidum. First, motifs associated with genes regulated by sulfide overlap key basal promoter elements. Second, deletion of the Cba. tepidum 1277 (CT1277) gene, encoding a putative regulatory protein, leads to constitutive overexpression of the sulfide:quinone oxidoreductase CT1087 in the absence of sulfide. The results suggest that sulfide is the master regulator of sulfur metabolism in Cba. tepidum and the Chlorobiaceae. Finally, the identification of basal promoter elements with differing strengths will further the development of synthetic biology in Cba. tepidum and perhaps other Chlorobiaceae.
IMPORTANCE: Elemental sulfur is a key intermediate in biogeochemical sulfur cycling. The photoautotrophic green sulfur bacterium Chlorobaculum tepidum either produces or consumes elemental sulfur depending on the availability of sulfide in the environment. Our results reveal transcriptional dynamics of Chlorobaculum tepidum on elemental sulfur and increase our understanding of the mechanisms of transcriptional regulation governing growth on different reduced sulfur compounds. This report identifies genes and sequence motifs that likely play significant roles in the production and consumption of elemental sulfur. Beyond this focused impact, this report paves the way for the development of synthetic biology in Chlorobaculum tepidum and other Chlorobiaceae by providing a comprehensive identification of promoter elements for control of gene expression, a key element of strain engineering.
|
15
|
Fang JS, Coon BG, Gillis N, Chen Z, Qiu J, Chittenden TW, Burt JM, Schwartz MA, Hirschi KK. Shear-induced Notch-Cx37-p27 axis arrests endothelial cell cycle to enable arterial specification. Nat Commun 2017; 8:2149. [PMID: 29247167 PMCID: PMC5732288 DOI: 10.1038/s41467-017-01742-7] [Citation(s) in RCA: 159] [Impact Index Per Article: 22.7] [Received: 11/29/2016] [Accepted: 10/13/2017] [Indexed: 01/26/2023] Open
Abstract
Establishment of a functional vascular network is rate-limiting in embryonic development, tissue repair and engineering. During blood vessel formation, newly generated endothelial cells rapidly expand into primitive plexi that undergo vascular remodeling into circulatory networks, requiring coordinated growth inhibition and arterial-venous specification. Whether the mechanisms controlling endothelial cell cycle arrest and acquisition of specialized phenotypes are interdependent is unknown. Here we demonstrate that fluid shear stress, at arterial flow magnitudes, maximally activates NOTCH signaling, which upregulates GJA4 (commonly, Cx37) and downstream cell cycle inhibitor CDKN1B (p27). Blockade of any of these steps causes hyperproliferation and loss of arterial specification. Re-expression of GJA4 or CDKN1B, or chemical cell cycle inhibition, restores endothelial growth control and arterial gene expression. Thus, we elucidate a mechanochemical pathway in which arterial shear activates a NOTCH-GJA4-CDKN1B axis that promotes endothelial cell cycle arrest to enable arterial gene expression. These insights will guide vascular regeneration and engineering.
Affiliation(s)
- Jennifer S Fang
- Department of Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Cardiovascular Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Vascular Biology and Therapeutics Program, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Stem Cell Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
| | - Brian G Coon
- Department of Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Cardiovascular Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Vascular Biology and Therapeutics Program, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
| | - Noelle Gillis
- Department of Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Cardiovascular Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Vascular Biology and Therapeutics Program, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Stem Cell Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
| | - Zehua Chen
- Computational Statistics and Bioinformatics Group, Advanced Artificial Intelligence Research Laboratory, WuXi NextCODE 55 Cambridge Parkway, 8th Floor, Cambridge, MA, 02142, USA
| | - Jingyao Qiu
- Department of Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Cardiovascular Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Vascular Biology and Therapeutics Program, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Stem Cell Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
| | - Thomas W Chittenden
- Computational Statistics and Bioinformatics Group, Advanced Artificial Intelligence Research Laboratory, WuXi NextCODE 55 Cambridge Parkway, 8th Floor, Cambridge, MA, 02142, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, A-111, 25 Shattuck Street, Boston, MA, 02115, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, 21 Ames Street #56-651, Cambridge, MA, 02142, USA
| | - Janis M Burt
- Department of Physiology, College of Medicine, The University of Arizona, 1501 N. Campbell Road, Tucson, AZ, 85724, USA
| | - Martin A Schwartz
- Department of Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Yale Cardiovascular Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Vascular Biology and Therapeutics Program, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Department of Cell Biology, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
- Department of Biomedical Engineering, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
| | - Karen K Hirschi
- Department of Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA.
- Yale Cardiovascular Research Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA.
- Vascular Biology and Therapeutics Program, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA.
- Yale Stem Cell Center, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA.
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA.
- Department of Biomedical Engineering, Yale University School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA.
| |
|
16
|
Castro-Mondragon JA, Jaeger S, Thieffry D, Thomas-Chollier M, van Helden J. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections. Nucleic Acids Res 2017; 45:e119. [PMID: 28591841 PMCID: PMC5737723 DOI: 10.1093/nar/gkx314] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 07/19/2016] [Accepted: 06/04/2017] [Indexed: 01/08/2023] Open
Abstract
Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines.
Affiliation(s)
| | | | - Denis Thieffry
- IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
| | - Morgane Thomas-Chollier
- IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
| | - Jacques van Helden
- Aix Marseille Univ, INSERM, TAGC, Theory and Approaches of Genomic Complexity, UMR_S 1090, Marseille, France
| |
|
17
|
Antipov SS, Tutukina MN, Preobrazhenskaya EV, Kondrashov FA, Patrushev MV, Toshchakov SV, Dominova I, Shvyreva US, Vrublevskaya VV, Morenkov OS, Sukharicheva NA, Panyukov VV, Ozoline ON. The nucleoid protein Dps binds genomic DNA of Escherichia coli in a non-random manner. PLoS One 2017; 12:e0182800. [PMID: 28800583 PMCID: PMC5553809 DOI: 10.1371/journal.pone.0182800] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/17/2016] [Accepted: 07/25/2017] [Indexed: 11/18/2022] Open
Abstract
Dps is a multifunctional homododecameric protein that oxidizes Fe2+ ions, accumulating them in the form of Fe2O3 within its protein cavity; interacts with DNA, tightly condensing the bacterial nucleoid upon starvation; and performs some other functions. In the two decades since the discovery of this protein, its ferroxidase activity has become rather well studied, but the mechanism of Dps interaction with DNA still remains enigmatic. The crucial role of lysine residues in the unstructured N-terminal tails led to the conventional point of view that Dps binds DNA without sequence or structural specificity. However, deletion of dps changed the profile of proteins in starved cells, a SELEX screen revealed genomic regions preferentially bound in vitro, and a certain affinity of Dps for artificial branched molecules was detected by atomic force microscopy. Here we report a non-random distribution of Dps binding sites across the bacterial chromosome in exponentially growing cells and show their enrichment with inverted repeats prone to form secondary structures. We found that the Dps-bound regions overlap with sites occupied by other nucleoid proteins and contain overrepresented motifs typical for their consensus sequences. Of the two types of genomic domains with extensive protein occupancy, which can be highly expressed or transcriptionally silent, only those that are enriched with RNA polymerase molecules were preferentially occupied by Dps. In the dps-null mutant we, therefore, observed a differentially altered expression of several targeted genes and found suppressed transcription from the dps promoter. In most cases this can be explained by the relieved interference with Dps for nucleoid proteins exploiting sequence-specific modes of DNA binding. 
Thus, protecting bacterial cells from different stresses during exponential growth, Dps can modulate transcriptional integrity of the bacterial chromosome hampering RNA biosynthesis from some genes via competition with RNA polymerase or, vice versa, competing with inhibitors to activate transcription.
Affiliation(s)
- S. S. Antipov
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Department of Cell Biology, Pushchino State Institute of Natural Sciences, Pushchino, Moscow Region, Russian Federation
- Department of Biophysics and Biotechnology, Voronezh State University, Voronezh, Russian Federation
- Department of Genomics of Microorganisms, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - M. N. Tutukina
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) Barcelona, Spain
- Department of Evolutionary Genomics, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Department of Structural and Functional Genomics, Pushchino Research Center of the Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - E. V. Preobrazhenskaya
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - F. A. Kondrashov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) Barcelona, Spain
- Department of Evolutionary Genomics, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Pg. Lluís Companys, Barcelona, Spain
| | - M. V. Patrushev
- Department of Genomics of Microorganisms, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - S. V. Toshchakov
- Department of Genomics of Microorganisms, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - I. Dominova
- Department of Genomics of Microorganisms, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - U. S. Shvyreva
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - V. V. Vrublevskaya
- Department of Cell Culture and Cell Engineering, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - O. S. Morenkov
- Department of Cell Culture and Cell Engineering, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - N. A. Sukharicheva
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - V. V. Panyukov
- Department of Structural and Functional Genomics, Pushchino Research Center of the Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Department of Bioinformatics, Institute of Mathematical Problems of Biology—the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| | - O. N. Ozoline
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Department of Cell Biology, Pushchino State Institute of Natural Sciences, Pushchino, Moscow Region, Russian Federation
- Department of Structural and Functional Genomics, Pushchino Research Center of the Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
| |
|
18
|
Detecting N6-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines. Sci Rep 2017; 7:40242. [PMID: 28079126 PMCID: PMC5227715 DOI: 10.1038/srep40242] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 08/05/2016] [Accepted: 12/05/2016] [Indexed: 12/22/2022] Open
Abstract
As one of the most abundant RNA post-transcriptional modifications, N6-methyladenosine (m6A) is involved in a broad spectrum of biological and physiological processes ranging from mRNA splicing and stability to cell differentiation and reprogramming. However, experimental identification of m6A sites is expensive and laborious. Therefore, computational methods for reliable prediction of m6A sites from primary RNA sequences are urgently needed. In the current study, a new method called RAM-ESVM was developed for detecting m6A sites in the Saccharomyces cerevisiae transcriptome; it employs ensemble support vector machine classifiers and novel sequence features. The jackknife test results show that RAM-ESVM outperforms single support vector machine classifiers and other existing methods, indicating that it would be a useful computational tool for detecting m6A sites in S. cerevisiae. Furthermore, a web server named RAM-ESVM was constructed and is freely accessible at http://server.malab.cn/RAM-ESVM/.
|
19
|
Fondi M, Bosi E, Presta L, Natoli D, Fani R. Modelling microbial metabolic rewiring during growth in a complex medium. BMC Genomics 2016; 17:970. [PMID: 27881075 PMCID: PMC5121958 DOI: 10.1186/s12864-016-3311-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/30/2016] [Accepted: 11/17/2016] [Indexed: 11/21/2022] Open
Abstract
Background: In their natural environment, bacteria face a wide range of environmental conditions that change over time and that impose continuous rearrangements at all cellular levels (e.g. gene expression, metabolism). When facing a nutritionally rich environment, for example, microbes first use the preferred compound(s) and only later start metabolizing the other one(s). A systemic re-organization of the overall microbial metabolic network in response to a variation in the composition/concentration of the surrounding nutrients has been suggested, although the range and the extent of such modifications in organisms other than a few model microbes have scarcely been described up to now. Results: We used multi-step constraint-based metabolic modelling to simulate the growth in a complex medium over several time steps of the Antarctic model organism Pseudoalteromonas haloplanktis TAC125. As each of these phases is characterized by a specific set of amino acids to be used as carbon and energy source, our modelling framework describes the major consequences of nutrient switching at the system level. The model predicts that a deep metabolic reprogramming might be required to achieve optimal biomass production in different stages of growth (different medium composition), with at least half of the cellular metabolic network involved (more than 50% of the metabolic genes). Additionally, we show that our modelling framework is able to capture metabolic functional association and/or common regulatory features of the genes embedded in our reconstruction (e.g. the presence of common regulatory motifs). Finally, to explore the possibility of a sub-optimal biomass objective function (i.e. that cells use resources in alternative metabolic processes at the expense of optimal growth) we have implemented a MOMA-based approach (called nutritional-MOMA) and compared the outcomes with those obtained with Flux Balance Analysis (FBA). 
Growth simulations under this scenario revealed the deep impact of choosing among alternative objective functions on the resulting predictions of flux distribution. Conclusions: Here we provide a time-resolved, systems-level scheme of PhTAC125 metabolic re-wiring as a consequence of carbon source switching in a nutritionally complex medium. Our analyses suggest the presence of a potential efficient metabolic reprogramming machinery to continuously and promptly adapt to this nutritionally changing environment, consistent with adaptation to fast growth in a fairly, but probably inconstant and highly competitive, environment. Also, we show i) how functional partnership and co-regulation features can be predicted by integrating multi-step constraint-based metabolic modelling with fed-batch growth data and ii) that performing simulations under a sub-optimal objective function may lead to different flux distributions with respect to canonical FBA. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3311-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Marco Fondi
- Department of Biology, University of Florence, Via Madonna del Piano 6, I-50019, Sesto F.no, Italy.
| | - Emanuele Bosi
- Department of Biology, University of Florence, Via Madonna del Piano 6, I-50019, Sesto F.no, Italy
| | - Luana Presta
- Department of Biology, University of Florence, Via Madonna del Piano 6, I-50019, Sesto F.no, Italy
| | - Diletta Natoli
- Department of Biology, University of Florence, Via Madonna del Piano 6, I-50019, Sesto F.no, Italy
| | - Renato Fani
- Department of Biology, University of Florence, Via Madonna del Piano 6, I-50019, Sesto F.no, Italy
| |
|
20
|
Liu B, Zhang H, Zhou C, Li G, Fennell A, Wang G, Kang Y, Liu Q, Ma Q. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes. BMC Genomics 2016; 17:578. [PMID: 27507169 PMCID: PMC4977642 DOI: 10.1186/s12864-016-2982-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/05/2016] [Accepted: 07/29/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve more slowly than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing the false positives in motif prediction. Results: Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP3). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to the Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP3 consistently outperformed other popular motif finding tools. We have integrated MP3 into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. Conclusion: The performance evaluation indicated that MP3 is effective for predicting regulatory motifs in prokaryotic genomes. 
Its application may enhance progress in elucidating transcription regulation mechanisms, thus benefiting the genomic research community and prokaryotic genome researchers in particular. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2982-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, 250100, China
| | - Hanyuan Zhang
- Systems Biology and Biomedical Informatics (SBBI) Laboratory University of Nebraska-Lincoln, Lincoln, NE, 68588-0115, USA
| | - Chuan Zhou
- School of Mathematics, Shandong University, Jinan, 250100, China
| | - Guojun Li
- School of Mathematics, Shandong University, Jinan, 250100, China
| | - Anne Fennell
- Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, 57007, USA; BioSNTR, Brookings, SD, USA
| | - Guanghui Wang
- School of Mathematics, Shandong University, Jinan, 250100, China
| | - Yu Kang
- CAS Key Laboratory of Genome Sciences and information, Beijing Institute of Genomics of CAS, Beijing, 100101, People's Republic of China
| | - Qi Liu
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Qin Ma
- Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, 57007, USA; BioSNTR, Brookings, SD, USA
| |
|
21
|
Ordóñez-Robles M, Rodríguez-García A, Martín JF. Target genes of the Streptomyces tsukubaensis FkbN regulator include most of the tacrolimus biosynthesis genes, a phosphopantetheinyl transferase and other PKS genes. Appl Microbiol Biotechnol 2016; 100:8091-103. [PMID: 27357227 DOI: 10.1007/s00253-016-7696-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/21/2016] [Revised: 06/15/2016] [Accepted: 06/17/2016] [Indexed: 01/01/2023]
Abstract
Tacrolimus (FK506) is a 23-membered macrolide immunosuppressant in current clinical use. Understanding how the tacrolimus biosynthetic gene cluster is regulated is important to increase its industrial production. Here, we analysed the effect of the disruption of fkbN (encoding a LAL-type positive transcriptional regulator) on the whole transcriptome of the tacrolimus producer Streptomyces tsukubaensis using microarray technology. Transcription of fkbN in the wild type strain increases from 70 h of cultivation, reaching a maximum at 89 h, prior to the onset of tacrolimus biosynthesis. Disruption of fkbN in S. tsukubaensis does not affect growth but prevents tacrolimus biosynthesis. Inactivation of fkbN reduces the transcription of most of the fkb cluster genes, including some all (for allylmalonyl-CoA biosynthesis) genes, but does not affect expression of allMNPOS or fkbR (encoding a LysR-type regulator). Disruption of fkbN does not suppress transcription of the cistron tcs6-fkbQ-fkbN; thus, FkbN only weakly self-regulates its own expression. Interestingly, inactivation of FkbN downregulates the transcription of a 4'-phosphopantetheinyl transferase coding gene, whose product is involved in tacrolimus biosynthesis, and upregulates the transcription of a gene cluster containing a cpkA orthologous gene, which encodes a PKS involved in coelimycin P1 biosynthesis in Streptomyces coelicolor. We propose an information theory-based model for FkbN binding sequences. The consensus FkbN binding sequence consists of 14 nucleotides with dyad symmetry containing two conserved inverted repeats of 7 nt each. This FkbN target sequence is present in the promoters of FkbN-regulated genes.
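As an illustration of the two ideas named at the end of this abstract, the following is a minimal Python sketch of (a) a dyad-symmetry check, where the second half of an even-length site is the reverse complement of the first half, and (b) per-column Shannon information content of aligned sites. The abstract does not give the authors' actual model or any binding sequences, so the function names, the mismatch parameter, and the example 14-nt site are all hypothetical, and the information measure is the standard uncorrected one rather than the paper's.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_dyad_symmetry(site, max_mismatches=0):
    """Check whether the right half is the reverse complement of the left half.

    Only even-length sites are accepted; `max_mismatches` is a hypothetical
    tolerance for imperfect inverted repeats.
    """
    site = site.upper()
    if len(site) % 2:
        return False
    half = len(site) // 2
    left, right = site[:half], site[half:]
    mismatches = sum(a != b for a, b in zip(reverse_complement(left), right))
    return mismatches <= max_mismatches


def column_information(sites, position):
    """Information content in bits of one alignment column: 2 + sum(p * log2 p).

    No small-sample correction is applied.
    """
    column = [site[position].upper() for site in sites]
    info = 2.0
    for base in "ACGT":
        p = column.count(base) / len(column)
        if p > 0:
            info += p * math.log2(p)
    return info


# Made-up 14-nt site built from a 7-nt half and its reverse complement;
# it is NOT an FkbN binding site.
example_site = "GTCACGA" + reverse_complement("GTCACGA")
```

A fully conserved column scores 2 bits and a column with all four bases at equal frequency scores 0 bits, so summing the column scores over a 14-nt alignment gives one simple measure of how specific a set of candidate sites is.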
Affiliation(s)
- María Ordóñez-Robles
- Área de Microbiología, Departamento de Biología Molecular, Facultad de Ciencias Biológicas y Ambientales, Universidad de León, León, 24071, Spain
- Instituto de Biotecnología de León, INBIOTEC, Avda. Real no. 1, León, 24006, Spain
| | - Antonio Rodríguez-García
- Área de Microbiología, Departamento de Biología Molecular, Facultad de Ciencias Biológicas y Ambientales, Universidad de León, León, 24071, Spain
- Instituto de Biotecnología de León, INBIOTEC, Avda. Real no. 1, León, 24006, Spain
| | - Juan F Martín
- Área de Microbiología, Departamento de Biología Molecular, Facultad de Ciencias Biológicas y Ambientales, Universidad de León, León, 24071, Spain.
| |
|
22
|
Liu B, Zhou C, Li G, Zhang H, Zeng E, Liu Q, Ma Q. Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses. Sci Rep 2016; 6:23030. [PMID: 26975728 PMCID: PMC4792141 DOI: 10.1038/srep23030] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/07/2015] [Accepted: 02/22/2016] [Indexed: 12/18/2022] Open
Abstract
Regulons are the basic units of the response system in a bacterial cell, and each consists of a set of transcriptionally co-regulated operons. Regulon elucidation is the basis for studying the bacterial global transcriptional regulation network. In this study, we designed a novel co-regulation score between a pair of operons based on accurate operon identification and cis regulatory motif analyses, which can capture their co-regulation relationship much better than other scores. Taking full advantage of this discovery, we developed a new computational framework and built a novel graph model for regulon prediction. This model integrates the motif comparison and clustering and makes the regulon prediction problem substantially more solvable and accurate. To evaluate our prediction, a regulon coverage score was designed based on the documented regulons and their overlap with our prediction; and a modified Fisher Exact test was implemented to measure how well our predictions match the co-expressed modules derived from E. coli microarray gene-expression datasets collected under 466 conditions. The results indicate that our program consistently performed better than others in terms of the prediction accuracy. This suggests that our algorithms substantially improve the state-of-the-art, leading to a computational capability to reliably predict regulons for any bacteria.
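The evaluation step mentioned in this abstract compares predicted regulons with co-expression modules using a modified Fisher exact test. The modification is not described here, so the sketch below shows only the standard one-sided test (a hypergeometric tail probability) for the overlap between two sets of operons. The function name, its parameters, and the numbers in the usage note are illustrative assumptions, not values from the paper.

```python
from math import comb


def overlap_p_value(total, module_size, predicted_size, overlap):
    """One-sided Fisher exact (hypergeometric) p-value for set overlap.

    Probability of seeing at least `overlap` shared operons when
    `predicted_size` operons are drawn at random from `total`, of which
    `module_size` belong to the co-expression module.
    """
    upper = min(module_size, predicted_size)
    denominator = comb(total, predicted_size)
    tail = sum(
        comb(module_size, i) * comb(total - module_size, predicted_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denominator
```

For example, with a hypothetical universe of 10 operons, a module of 5, and a predicted regulon of 5 that matches the module exactly, `overlap_p_value(10, 5, 5, 5)` equals 1/252, since only one of the 252 possible draws reproduces the module.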
Affiliation(s)
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, China
| | - Chuan Zhou
- School of Mathematics, Shandong University, Jinan, Shandong, China
| | - Guojun Li
- School of Mathematics, Shandong University, Jinan, Shandong, China
| | - Hanyuan Zhang
- Systems Biology and Biomedical Informatics (SBBI) Laboratory University of Nebraska-Lincoln, Lincoln, NE 68588-0115, USA
| | - Erliang Zeng
- Department of Biology, University of South Dakota, Vermillion, SD 57069, USA; Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA; BioSNTR, Brookings, SD, USA
| | - Qi Liu
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Qin Ma
- Department of Plant Science, South Dakota State University, Brookings, SD, 57006, USA; BioSNTR, Brookings, SD, USA
| |
|
23
|
Kamps-Hughes N, Preston JL, Randel MA, Johnson EA. Genome-wide identification of hypoxia-induced enhancer regions. PeerJ 2015; 3:e1527. [PMID: 26713262 PMCID: PMC4690393 DOI: 10.7717/peerj.1527] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/18/2015] [Accepted: 12/01/2015] [Indexed: 12/12/2022] Open
Abstract
Here we present a genome-wide method for de novo identification of enhancer regions. This approach enables massively parallel empirical investigation of DNA sequences that mediate transcriptional activation and provides a platform for discovery of regulatory modules capable of driving context-specific gene expression. The method links fragmented genomic DNA to the transcription of randomer molecule identifiers and measures the functional enhancer activity of the library by massively parallel sequencing. We transfected a Drosophila melanogaster library into S2 cells in normoxia and hypoxia, and assayed 4,599,881 genomic DNA fragments in parallel. The locations of the enhancer regions strongly correlate with genes up-regulated after hypoxia and previously described enhancers. Novel enhancer regions were identified and integrated with RNAseq data and transcription factor motifs to describe the hypoxic response on a genome-wide basis as a complex regulatory network involving multiple stress-response pathways. This work provides a novel method for high-throughput assay of enhancer activity and the genome-scale identification of 31 hypoxia-activated enhancers in Drosophila.
Affiliation(s)
- Nick Kamps-Hughes
- Institute of Molecular Biology, University of Oregon, Eugene, OR, United States
| | - Jessica L Preston
- Institute of Molecular Biology, University of Oregon, Eugene, OR, United States
| | - Melissa A Randel
- Institute of Molecular Biology, University of Oregon, Eugene, OR, United States
| | - Eric A Johnson
- Institute of Molecular Biology, University of Oregon, Eugene, OR, United States
| |
|
24
|
Sass AM, Van Acker H, Förstner KU, Van Nieuwerburgh F, Deforce D, Vogel J, Coenye T. Genome-wide transcription start site profiling in biofilm-grown Burkholderia cenocepacia J2315. BMC Genomics 2015; 16:775. [PMID: 26462475 PMCID: PMC4603805 DOI: 10.1186/s12864-015-1993-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/06/2015] [Accepted: 10/06/2015] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Burkholderia cenocepacia is a soil-dwelling Gram-negative Betaproteobacterium with an important role as an opportunistic pathogen in humans. Infections with B. cenocepacia are very difficult to treat owing to the organism's high intrinsic resistance to most antibiotics. Biofilm formation further adds to this antibiotic resistance. B. cenocepacia harbours a large, multi-replicon genome with a high GC-content; the reference genome of strain J2315 includes 7374 annotated genes. This study aims to annotate transcription start sites and identify novel transcripts on a whole genome scale. METHODS: RNA extracted from B. cenocepacia J2315 biofilms was analysed by differential RNA-sequencing and the resulting dataset compared to data derived from conventional, global RNA-sequencing. Transcription start sites were annotated and further analysed according to their position relative to annotated genes. RESULTS: Four thousand ten transcription start sites were mapped over the whole B. cenocepacia genome and the primary transcription start sites of 2089 genes expressed in B. cenocepacia biofilms were defined. For 64 genes a start codon alternative to the annotated one was proposed. Substantial antisense transcription for 105 genes and two novel protein coding sequences were identified. The distribution of internal transcription start sites can be used to identify genomic islands in B. cenocepacia. A potassium pump strongly induced only under biofilm conditions was found and 15 non-coding small RNAs highly expressed in biofilms were discovered. CONCLUSIONS: Mapping transcription start sites across the B. cenocepacia genome added relevant information to the J2315 annotation. Genes and novel regulatory RNAs putatively involved in B. cenocepacia biofilm formation were identified. These findings will help in understanding regulation of B. cenocepacia biofilm formation.
Affiliation(s)
- Andrea M Sass
- Laboratory of Pharmaceutical Microbiology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium.
| | - Heleen Van Acker
- Laboratory of Pharmaceutical Microbiology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium.
| | - Konrad U Förstner
- Core Unit Systems Medicine, University of Würzburg, Würzburg, Germany.
| | | | - Dieter Deforce
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium.
| | - Jörg Vogel
- Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany.
| | - Tom Coenye
- Laboratory of Pharmaceutical Microbiology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium.
| |
|
25
|
Chou WC, Ma Q, Yang S, Cao S, Klingeman DM, Brown SD, Xu Y. Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum. Nucleic Acids Res 2015; 43:e67. [PMID: 25765651 PMCID: PMC4446414 DOI: 10.1093/nar/gkv177] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 10/15/2013] [Accepted: 02/22/2015] [Indexed: 12/31/2022] Open
Abstract
Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
Affiliation(s)
- Wen-Chi Chou
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, GA 30602, USA.,BioEnergy Science Center, TN 37831, USA
| | - Qin Ma
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, GA 30602, USA.,BioEnergy Science Center, TN 37831, USA
| | - Shihui Yang
- BioEnergy Science Center, TN 37831, USA.,Biosciences Division, Oak Ridge National Laboratory, TN 37831, USA.,National Bioenergy Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
| | - Sha Cao
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, GA 30602, USA
| | - Dawn M Klingeman
- BioEnergy Science Center, TN 37831, USA.,Biosciences Division, Oak Ridge National Laboratory, TN 37831, USA
| | - Steven D Brown
- BioEnergy Science Center, TN 37831, USA.,Biosciences Division, Oak Ridge National Laboratory, TN 37831, USA
| | - Ying Xu
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, GA 30602, USA.,BioEnergy Science Center, TN 37831, USA.,College of Computer Science and Technology and School of Public Health, Jilin University, Changchun, Jilin 130012, China
| |
|
26
|
Zhou C, Ma Q, Li G. Elucidation of operon structures across closely related bacterial genomes. PLoS One 2014; 9:e100999. [PMID: 24959722 PMCID: PMC4069176 DOI: 10.1371/journal.pone.0100999] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/01/2014] [Accepted: 06/01/2014] [Indexed: 11/30/2022] Open
Abstract
About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.
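The graph construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (each operon given as an ordered list of OGG identifiers, pooled over all genomes), the function names, and the simple degree-based rule for labelling a component as a path or a star are all assumptions made for illustration.

```python
from collections import defaultdict


def build_ogg_graph(operons):
    """Build an undirected graph whose nodes are OGG ids.

    `operons` is a hypothetical input: an iterable of operons pooled over all
    genomes, each an ordered list of OGG ids. Two OGGs are linked when their
    genes are immediate neighbours in at least one operon.
    """
    graph = defaultdict(set)
    for operon in operons:
        for ogg in operon:
            graph[ogg]  # register the node even if it has no neighbour
        for a, b in zip(operon, operon[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as a list of node sets (iterative DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def classify(graph, component):
    """Label a component by its degree sequence (assumed, simplified rule)."""
    n = len(component)
    if n == 1:
        return "singleton"
    degrees = sorted(len(graph[v]) for v in component)
    edges = sum(degrees) // 2
    if edges != n - 1:
        return "other"  # a connected component with a cycle is not a tree
    if degrees[-1] <= 2:
        return "path"
    if degrees[-1] == n - 1:
        return "star"  # one hub linked to every other node
    return "tree"
```

With this rule, a component of three nodes in a line is reported as a path, although it could equally be read as a star; a real analysis would need a more careful definition of the topological groups.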
Affiliation(s)
- Chuan Zhou
- School of Mathematics, Shandong University, Jinan, China
| | - Qin Ma
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
| | - Guojun Li
- School of Mathematics, Shandong University, Jinan, China
| |
|