1
Daniel Thomas S, Vijayakumar K, John L, Krishnan D, Rehman N, Revikumar A, Kandel Codi JA, Prasad TSK, S S V, Raju R. Machine Learning Strategies in MicroRNA Research: Bridging Genome to Phenome. OMICS: A JOURNAL OF INTEGRATIVE BIOLOGY 2024; 28:213-233. [PMID: 38752932 DOI: 10.1089/omi.2024.0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2024]
Abstract
MicroRNAs (miRNAs) have emerged as a prominent layer of regulation of gene expression. This article presents the salient and current aspects of machine learning (ML) tools and approaches from genome to phenome in miRNA research. First, we underline that the complexity in the analysis of miRNA function ranges from their modes of biogenesis to the target diversity in diverse biological conditions. Therefore, it is imperative to first ascertain the miRNA coding potential of genomes and understand the regulatory mechanisms of their expression. This knowledge enables the efficient classification of miRNA precursors and the identification of their mature forms and respective target genes. Second, and because one miRNA can target multiple mRNAs and vice versa, another challenge is the assessment of the miRNA-mRNA target interaction network. Furthermore, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) also contribute to this complexity. ML has been used to tackle these challenges at the high-dimensional data level. The present expert review covers more than 100 tools adopting various ML approaches pertaining to, for example, (1) miRNA promoter prediction, (2) precursor classification, (3) mature miRNA prediction, (4) miRNA target prediction, (5) miRNA-lncRNA and miRNA-circRNA interactions, (6) miRNA-mRNA expression profiling, (7) miRNA regulatory module detection, (8) miRNA-disease association, and (9) miRNA essentiality prediction. Taken together, we unpack, critically examine, and highlight the cutting-edge synergy of ML approaches and miRNA research so as to develop a dynamic and microlevel understanding of human health and diseases.
Affiliation(s)
- Sonet Daniel Thomas
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Krithika Vijayakumar
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Levin John
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Deepak Krishnan
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Niyas Rehman
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Amjesh Revikumar
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Kerala Genome Data Centre, Kerala Development and Innovation Strategic Council, Thiruvananthapuram, Kerala, India
- Jalaluddin Akbar Kandel Codi
- Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Vinodchandra S S
- Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
- Rajesh Raju
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
2
Uemura K, Ohyama T. Physical Peculiarity of Two Sites in Human Promoters: Universality and Diverse Usage in Gene Function. Int J Mol Sci 2024; 25:1487. [PMID: 38338773 PMCID: PMC10855393 DOI: 10.3390/ijms25031487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/12/2024] Open
Abstract
Since the discovery of physical peculiarities around transcription start sites (TSSs) and a site corresponding to the TATA box, research has revealed only the average features of these sites. Unsettled enigmas include the individual genes with these features and whether they relate to gene function. Herein, using 10 physical properties of DNA, including duplex DNA free energy, base stacking energy, protein-induced deformability, and stabilizing energy of Z-DNA, we clarified for the first time that approximately 97% of the promoters of 21,056 human protein-coding genes have distinctive physical properties around the TSS and/or position -27; of these, nearly 65% exhibited such properties at both sites. Furthermore, about 55% of the 21,056 genes had a minimum value of regional duplex DNA free energy within TSS-centered ±300 bp regions. Notably, distinctive physical properties within the promoters and free energies of the surrounding regions separated human protein-coding genes into five groups; each contained specific gene ontology (GO) terms. The group represented by immune response genes differed distinctly from the other four regarding the parameter of the free energies of the surrounding regions. A vital suggestion from this study is that physical-feature-based analyses of genomes may reveal new aspects of the organization and regulation of genes.
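To make the idea of a regional duplex free energy profile more concrete, the following is a minimal Python sketch, not the authors' pipeline. It averages nearest-neighbour step energies over a sliding window and reports the least stable window. The parameter table `NN_DG`, the window size, and all function names are assumptions introduced here for illustration; the energy values follow a commonly used unified nearest-neighbour table and should be checked against a published source before any real use.

```python
# Illustrative nearest-neighbour duplex free energies (kcal/mol, 37 °C).
# ASSUMPTION: values follow a commonly used unified table; verify before use.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def step_energies(seq):
    """Free energy of each overlapping dinucleotide step in seq."""
    seq = seq.upper()
    return [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]


def free_energy_profile(seq, window=15):
    """Mean step free energy for every window of `window` nucleotides."""
    steps = step_energies(seq)
    n = window - 1  # a window of w nucleotides contains w - 1 steps
    if n < 1 or len(steps) < n:
        return []
    return [sum(steps[i:i + n]) / n for i in range(len(steps) - n + 1)]


def least_stable_window(profile):
    """Index of the window with the highest (least negative) mean energy."""
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy sequence: an AT-rich stretch flanked by GC-rich DNA.
    toy = "GCGCGCGCGC" + "TATATATATA" + "GCGCGCGCGC"
    profile = free_energy_profile(toy, window=6)
    print("least stable window starts at", least_stable_window(profile))
```

In this toy sequence the least stable window falls inside the AT-rich stretch, which is the kind of local signature such profiles are meant to expose around a TSS or a TATA-like site.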
Affiliation(s)
- Kohei Uemura
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan;
- Takashi Ohyama
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan;
- Department of Biology, Faculty of Education and Integrated Arts and Sciences, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
3
Singh S, Kiran M, Somvanshi PR. Computational Inference of Gene Regulatory Network Using Genome-wide ChIP-X Data. Methods Mol Biol 2024; 2719:295-306. [PMID: 37803124 DOI: 10.1007/978-1-0716-3461-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
A gene regulatory network (GRN) is the architecture of transcription factors (TFs) and their target genes, which helps control gene expression as required by a phenotype during various environmental perturbations. Inferring the regulatory network from high-throughput data requires an algorithmic approach involving statistical analysis. Several interaction databases, such as JASPAR and SwissRegulon, provide information on TF-target pair interactions estimated from experimental and prediction procedures. These repositories are mainly used for predicting the complex structure of GRNs, either with or without gene expression data. Here we describe and discuss the step-wise procedures to extract the interaction data for a desired set of target-TF pairs from the JASPAR database, and use that information to infer the network with the igraph library. Further, we also mention the important parameters for analyzing the different properties of the network. The described procedure will be helpful in discerning the GRN based on the set of TF-gene pairs.
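As a rough illustration of going from TF-gene pairs to network properties, here is a minimal Python sketch. It is an assumption-laden stand-in rather than the chapter's procedure: it uses plain dictionaries instead of the igraph library so that it runs without dependencies, and the pair list, node names, and function names are all hypothetical.

```python
from collections import defaultdict


def build_network(pairs):
    """Build a directed TF -> target network from (tf, gene) pairs."""
    targets = defaultdict(set)     # TF   -> genes it regulates
    regulators = defaultdict(set)  # gene -> TFs that regulate it
    for tf, gene in pairs:
        targets[tf].add(gene)
        regulators[gene].add(tf)
    return targets, regulators


def degree_summary(targets, regulators):
    """Out-degree (number of targets) and in-degree (number of regulators)."""
    nodes = set(targets) | set(regulators)
    return {
        node: {
            "out": len(targets.get(node, ())),
            "in": len(regulators.get(node, ())),
        }
        for node in nodes
    }


if __name__ == "__main__":
    # Hypothetical TF-target pairs, e.g. as exported from an interaction database.
    pairs = [
        ("TF_A", "gene1"),
        ("TF_A", "gene2"),
        ("TF_B", "gene2"),
        ("TF_B", "TF_A"),  # a TF can itself be a target
    ]
    targets, regulators = build_network(pairs)
    for node, degrees in sorted(degree_summary(targets, regulators).items()):
        print(node, degrees)
```

Degree is only one of the network properties a graph library can report; the same edge list could be handed to igraph to compute centrality or connectivity measures.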
Affiliation(s)
- Samayaditya Singh
- Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Manjari Kiran
- Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Pramod R Somvanshi
- Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
4
Milito A, Aschern M, McQuillan JL, Yang JS. Challenges and advances towards the rational design of microalgal synthetic promoters in Chlamydomonas reinhardtii. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:3833-3850. [PMID: 37025006 DOI: 10.1093/jxb/erad100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 03/24/2023] [Indexed: 06/19/2023]
Abstract
Microalgae hold enormous potential to provide a safe and sustainable source of high-value compounds, acting as carbon-fixing biofactories that could help to mitigate rapidly progressing climate change. Bioengineering microalgal strains will be key to optimizing and modifying their metabolic outputs, and to render them competitive with established industrial biotechnology hosts, such as bacteria or yeast. To achieve this, precise and tuneable control over transgene expression will be essential, which would require the development and rational design of synthetic promoters as a key strategy. Among green microalgae, Chlamydomonas reinhardtii represents the reference species for bioengineering and synthetic biology; however, the repertoire of functional synthetic promoters for this species, and for microalgae generally, is limited in comparison to other commercial chassis, emphasizing the need to expand the current microalgal gene expression toolbox. Here, we discuss state-of-the-art promoter analyses, and highlight areas of research required to advance synthetic promoter development in C. reinhardtii. In particular, we exemplify high-throughput studies performed in other model systems that could be applicable to microalgae, and propose novel approaches to interrogating algal promoters. We lastly outline the major limitations hindering microalgal promoter development, while providing novel suggestions and perspectives for how to overcome them.
Affiliation(s)
- Alfonsina Milito
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
- Moritz Aschern
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
- Josie L McQuillan
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK
- Jae-Seong Yang
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
5
Sharma D, Sharma K, Mishra A, Siwach P, Mittal A, Jayaram B. Molecular dynamics simulation-based trinucleotide and tetranucleotide level structural and energy characterization of the functional units of genomic DNA. Phys Chem Chem Phys 2023; 25:7323-7337. [PMID: 36825435 DOI: 10.1039/d2cp04820e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Genomes of most organisms on earth are written in a universal language of life, made up of four units - adenine (A), thymine (T), guanine (G), and cytosine (C), and understanding the way they are put together has been a great challenge to date. Multiple efforts have been made to annotate this wonderfully engineered string of DNA using different methods but they lack a universal character. In this article, we have investigated the structural and energetic profiles of both prokaryotes and eukaryotes by considering two essential genomic sites, viz., the transcription start sites (TSS) and exon-intron boundaries. We have characterized these sites by mapping the structural and energy features of DNA obtained from molecular dynamics simulations, which considers all possible trinucleotide and tetranucleotide steps. For DNA, these physicochemical properties show distinct signatures at the TSS and intron-exon boundaries. Our results firmly convey the idea that DNA uses the same dialect for prokaryotes and eukaryotes and that it is worth going beyond sequence-level analyses to physicochemical space to determine the functional destiny of DNA sequences.
Affiliation(s)
- Dinesh Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Kopal Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
- Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Aditya Mittal
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India
6
Zhang T, Tang Q, Nie F, Zhao Q, Chen W. DeepLncPro: an interpretable convolutional neural network model for identifying long non-coding RNA promoters. Brief Bioinform 2022; 23:6754194. [PMID: 36209437 DOI: 10.1093/bib/bbac447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 09/14/2022] [Accepted: 09/17/2022] [Indexed: 12/14/2022] Open
Abstract
Long non-coding RNA (lncRNA) plays important roles in a series of biological processes. The transcription of lncRNA is regulated by its promoter. Hence, accurate identification of lncRNA promoters will help in understanding their regulatory mechanisms. Since experimental techniques remain time consuming for genome-wide promoter identification, developing computational tools to identify promoters is necessary. However, only a few computational methods have been proposed for lncRNA promoter prediction, and their performance still has room for improvement. In the present work, a convolutional neural network based model, called DeepLncPro, was proposed to identify lncRNA promoters in human and mouse. Comparative results demonstrated that DeepLncPro was superior to both state-of-the-art machine learning methods and existing models for identifying lncRNA promoters. Furthermore, DeepLncPro can extract and analyze transcription factor binding motifs from lncRNAs, which makes it an interpretable model. These results indicate that DeepLncPro can serve as a powerful tool for identifying lncRNA promoters. An open-source tool for DeepLncPro is provided at https://github.com/zhangtian-yang/DeepLncPro.
Affiliation(s)
- Tianyang Zhang
- School of Life Sciences, North China University of Science and Technology
- Qiang Tang
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine
- Fulei Nie
- School of Life Sciences, North China University of Science and Technology
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine
7
Evolutionary Invariant of the Structure of DNA Double Helix in RNAP II Core Promoters. Int J Mol Sci 2022; 23:ijms231810873. [PMID: 36142782 PMCID: PMC9504043 DOI: 10.3390/ijms231810873] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/25/2022] [Revised: 09/07/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022] Open
Abstract
Eukaryotic and archaeal RNA polymerase II (POL II) machinery is highly conserved, regardless of the extreme changes in promoter sequences in different organisms. The goal of our work is to find the cause of this conservatism. Representative sets of aligned promoter sequences from fifteen organisms belonging to different evolutionary stages were studied. Their textual profiles, as well as profiles of the indexes that characterize the secondary structure and the mechanical and physicochemical properties, were analyzed. An evolutionarily stable, extremely heterogeneous special secondary structure of POL II core promoters was revealed, which includes two singular regions: the hexanucleotide “INR” around the TSS and the octanucleotide “TATA element” about −28 bp upstream. Such structures may have developed at some stage of evolution. This structure turned out to be so well matched to pre-initiation complex formation and the subsequent initiation of transcription by the POL II machinery that, in the course of evolution, only those nucleotide sequences able to reproduce these structural properties were selected. The individual features of specific sequences representing the singular region of the promoter of each gene can affect the kinetics of DNA-protein complex formation and facilitate strand separation in double-stranded DNA at the TSS position.
8
Wang M, Li F, Wu H, Liu Q, Li S. PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest. Interdiscip Sci 2022; 14:697-711. [PMID: 35488998 DOI: 10.1007/s12539-022-00520-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2021] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
Abstract
Promoters, short DNA sequences, play vital roles in initiating gene transcription. However, it remains a challenge to identify promoters in a high-throughput manner using conventional experimental techniques. To this end, several computational predictors based on machine learning models have been developed, but their performance remains unsatisfactory. In this study, we proposed a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) was developed based on various deep features learned by a pre-trained deep learning network model and sequence-derived features. Feature selection based on XGBoost was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and an independent test demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
Affiliation(s)
- Miao Wang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China
- Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC, 3000, Australia
- Hao Wu
- School of Software, Shandong University, Jinan, 250100, Shandong, China
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China.
- Shuqin Li
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China.
9
Long K, Su D, Li X, Li H, Zeng S, Zhang Y, Zhong Z, Lin Y, Li X, Lu L, Jin L, Ma J, Tang Q, Li M. Identification of enhancers responsible for the coordinated expression of myosin heavy chain isoforms in skeletal muscle. BMC Genomics 2022; 23:519. [PMID: 35842589 PMCID: PMC9288694 DOI: 10.1186/s12864-022-08737-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/19/2022] [Accepted: 07/04/2022] [Indexed: 11/19/2022] Open
Abstract
Background: Skeletal muscles consist of fibers of differing contractility and metabolic properties, which are primarily determined by the content of myosin heavy chain (MYH) isoforms (MYH7, MYH2, MYH1, and MYH4). The regulation of Myh gene transcription depends on three-dimensional chromatin conformation interactions, but the mechanistic details remain to be determined. Results: In this study, we characterized the interaction profiles of Myh genes using 4C-seq (circular chromosome conformation capture coupled to high-throughput sequencing). The interaction profile of Myh genes changed between fast quadriceps and slow soleus muscles. Combining chromatin immunoprecipitation-sequencing (ChIP-seq) and transposase accessible chromatin with high-throughput sequencing (ATAC-seq), we found that a 38 kb intergenic region interacting simultaneously with the promoters of fast Myh genes controlled the coordinated expression of fast Myh genes. We also identified four active enhancers of Myh7, and revealed that binding of MYOG and MYOD increased the activity of Myh7 enhancers. Conclusions: This study provides new insight into the chromatin interactions that regulate Myh gene expression. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08737-9.
Affiliation(s)
- Keren Long
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China.
- Duo Su
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Xiaokai Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Hengkuan Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Sha Zeng
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Yu Zhang
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Zhining Zhong
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Yu Lin
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Xuemin Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Lu Lu
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Long Jin
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Jideng Ma
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Qianzi Tang
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Mingzhou Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China.
10
Wang Y, Peng Q, Mou X, Wang X, Li H, Han T, Sun Z, Wang X. A successful hybrid deep learning model aiming at promoter identification. BMC Bioinformatics 2022; 23:206. [PMID: 35641900 PMCID: PMC9158169 DOI: 10.1186/s12859-022-04735-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 05/16/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The zone adjacent to a transcription start site (TSS), namely, the promoter, is primarily involved in the process of DNA transcription initiation and regulation. As a result, proper promoter identification is critical for further understanding the mechanism of the networks controlling genomic regulation. A number of methodologies for the identification of promoters have been proposed. Nonetheless, due to the great heterogeneity existing in promoters, the results of these procedures are still unsatisfactory. In order to establish additional discriminative characteristics and properly recognize promoters, we developed the hybrid model for promoter identification (HMPI), a hybrid deep learning model that can characterize both the native sequences of promoters and the morphological outline of promoters at the same time. We developed the HMPI to combine a method called the PSFN (promoter sequence features network), which characterizes native promoter sequences and deduces sequence features, with a technique referred to as the DSPN (deep structural profiles network), which is specially structured to model the promoters in terms of their structural profile and to deduce their structural attributes. RESULTS The HMPI was applied to human, plant and Escherichia coli K-12 strain datasets, and the findings showed that the HMPI was successful at extracting the features of the promoter while greatly enhancing the promoter identification performance. In addition, after the improvements of synthetic sampling, transfer learning and label smoothing regularization, the improved HMPI models achieved good results in identifying subtypes of promoters on prokaryotic promoter datasets. 
CONCLUSIONS The results showed that the HMPI was successful at extracting the features of promoters while greatly enhancing the performance of identifying promoters on both eukaryotic and prokaryotic datasets, and the improved HMPI models are good at identifying subtypes of promoters on prokaryotic promoter datasets. The HMPI is additionally adaptable to different biological functional sequences, allowing for the addition of new features or models.
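Label smoothing regularization, one of the improvements named in the abstract, can be illustrated with a short Python sketch. This shows the generic technique under stated assumptions, not the HMPI implementation: the smoothing factor `epsilon`, the three-class example, and the function names are chosen here purely for illustration.

```python
import math


def smooth_labels(one_hot, epsilon=0.1):
    """Move a fraction epsilon of the target mass to a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


if __name__ == "__main__":
    # Hypothetical three-class promoter-subtype example.
    hard = [0.0, 1.0, 0.0]
    soft = smooth_labels(hard, epsilon=0.1)
    confident = [0.005, 0.99, 0.005]  # a very confident prediction
    print("smoothed target:", soft)
    print("loss with hard target:", cross_entropy(hard, confident))
    print("loss with smoothed target:", cross_entropy(soft, confident))
```

With a smoothed target, an extremely confident prediction is penalised more than with the hard target, which discourages over-confident outputs on small or imbalanced subtype datasets.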
Affiliation(s)
- Ying Wang
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Qinke Peng
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China.
- Xu Mou
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Xinyuan Wang
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Haozhou Li
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Tian Han
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Zhao Sun
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Xiao Wang
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
11
Sequence-based evaluation of promoter context for prediction of transcription start sites in Arabidopsis and rice. Sci Rep 2022; 12:6976. [PMID: 35484393 PMCID: PMC9050755 DOI: 10.1038/s41598-022-11169-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/25/2021] [Accepted: 04/20/2022] [Indexed: 11/20/2022] Open
Abstract
Genes are transcribed from transcription start sites (TSSs), and their position in a genome is strictly controlled to avoid mis-expression of undesired regions. In this study, we designed and developed a methodology for the evaluation of promoter context, which detects proximal promoter regions from −200 to −60 bp relative to a TSS, in Arabidopsis and rice genomes. The method positively evaluates spacer sequences and Regulatory Element Groups, but not core promoter elements like TATA boxes, and is able to predict the position of a TSS within a width of 200 bp. An important feature of the evaluation/prediction method is its independence of the core promoter elements, which was demonstrated by successful prediction of all the TATA, GA, and coreless types of promoters without notable differences in the accuracy of prediction. The positive relationship identified between the evaluation scores and gene expression levels suggests that this method is useful for the evaluation of promoter maturity.
12
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes. Nat Commun 2022; 13:2270. [PMID: 35477703 PMCID: PMC9046390 DOI: 10.1038/s41467-022-30017-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/24/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource, F3UTER (https://astx.shinyapps.io/F3UTER/). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
13
Vanaja A, Yella VR. Delineation of the DNA Structural Features of Eukaryotic Core Promoter Classes. ACS OMEGA 2022; 7:5657-5669. [PMID: 35224327 PMCID: PMC8867553 DOI: 10.1021/acsomega.1c04603] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/23/2021] [Accepted: 01/27/2022] [Indexed: 05/02/2023]
Abstract
Eukaryotic transcription is orchestrated from a DNA region known as the core promoter. Diverse and precisely positioned core promoter signals, viz., TATA-box, Inr, BREs, and Pause Button, are associated with a subset of genes and regulate their spatiotemporal expression. However, the core promoter architecture linked with these signals has not been investigated exhaustively for several species. In this study, we attempted to envisage the adaptive binding landscape of the transcription initiation machinery as a function of DNA structure. To this end, we deployed a set of k-mer based DNA structural estimates and regular expression models derived from experiments, molecular dynamics simulations, and theoretical frameworks, and high-throughput promoter data sets retrieved from the eukaryotic promoter database. We categorized protein-coding gene core promoters based on characteristic motifs at precise locations and analyzed the B-DNA structural properties and non-B-DNA structural motifs for 15 different eukaryotic genomes. We observed that Inr, BREd, and no-motif classes display common patterns of DNA sequence and structural environment. TATA-containing, BREu, and Pause Button classes show a deviant behavior, with the TATA class displaying varied axial and twisting flexibility while BREu and Pause Button leaned toward G-quadruplex motif enrichment. Intriguingly, DNA meltability and shape signals are conserved irrespective of the presence or absence of distinct core promoter motifs in the majority of species. Altogether, here we delineated the conserved DNA structural signals associated with several promoter classes that may contribute to the chromatin configuration, orchestration of transcription machinery, and DNA duplex melting during the transcription process.
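A regular expression model for a core promoter motif can be sketched in a few lines of Python. This is an illustrative example only: the IUPAC-to-regex translation is standard, but the consensus strings used here (`TATAWAWR` for a TATA-box and `YYANWYY` for an Inr) are commonly cited forms chosen as assumptions, not necessarily the models used in the study, and the function names and test sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regular-expression body."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(seq, consensus):
    """Return (start, matched_text) for every hit, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    pattern = re.compile("(?=(" + iupac_to_pattern(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical promoter fragment; consensus strings are assumptions.
    fragment = "GGCTATAAAAGGCC"
    print(find_motif(fragment, "TATAWAWR"))
    print(iupac_to_pattern("YYANWYY"))
```

Classifying promoters by motif then amounts to checking whether a hit falls at the expected offset relative to the TSS, which is a positional filter on the returned start coordinates.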
Affiliation(s)
- Akkinepally Vanaja
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
  - KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Venkata Rajesh Yella
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
  - Tel: +91-863-2399999, Extn-1021. Website: https://www.kluniversity.in/bt/faculty-list.aspx

14
Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023]
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia
- Geoffrey I Webb
  - Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou
- Lachlan J M Coin
- Jiangning Song
- Corresponding authors: Jiangning Song, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Lachlan J.M. Coin, Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia; Quan Zou, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Cangzhi Jia, School of Science, Dalian Maritime University, Dalian 116026, China.

15
Miskey C, Kesselring L, Querques I, Abrusán G, Barabas O, Ivics Z. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2807-2825. [PMID: 35188569 PMCID: PMC8934666 DOI: 10.1093/nar/gkac092] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/18/2021] [Revised: 01/24/2022] [Accepted: 02/08/2022] [Indexed: 11/14/2022]
Abstract
The Sleeping Beauty (SB) transposon system is a popular tool for genome engineering, but random integration into the genome carries a certain genotoxic risk in therapeutic applications. Here we investigate the role of amino acids H187, P247 and K248 in target site selection of the SB transposase. Structural modeling implicates these three amino acids located in positions analogous to amino acids with established functions in target site selection in retroviral integrases and transposases. Saturation mutagenesis of these residues in the SB transposase yielded variants with altered target site selection properties. Transposon integration profiling of several mutants reveals increased specificity of integrations into palindromic AT repeat target sequences in genomic regions characterized by high DNA bendability. The H187V and K248R mutants redirect integrations away from exons, transcriptional regulatory elements and nucleosomal DNA in the human genome, suggesting enhanced safety and thus utility of these SB variants in gene therapy applications.
Affiliation(s)
- Irma Querques
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
  - Department of Biochemistry, University of Zurich, Zurich 8057, Switzerland
- György Abrusán
  - Institute of Biochemistry, Biological Research Center of the Hungarian Academy of Sciences, Szeged 6726, Hungary
- Orsolya Barabas
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
  - Department of Molecular Biology, University of Geneva, Geneva 1211, Switzerland
- Zoltán Ivics
  - To whom correspondence should be addressed. Tel: +49 6103 77 6000; Fax: +49 6103 77 1280

16
Umarov R, Li Y, Arakawa T, Takizawa S, Gao X, Arner E. ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation. PLoS Comput Biol 2021; 17:e1009376. [PMID: 34491989 PMCID: PMC8448322 DOI: 10.1371/journal.pcbi.1009376] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/22/2021] [Revised: 09/17/2021] [Accepted: 08/23/2021] [Indexed: 11/19/2022]
Abstract
Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there are many attempts to develop computational promoter and enhancer identification methods, reliable tools to analyze long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than that in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other one for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated predicted regulatory regions made on the human genome are enriched for disease-associated variants, suggesting them to be potentially true regulatory elements rather than false positives. We validated high scoring "false positive" predictions using reporter assay and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
Affiliation(s)
- Ramzan Umarov
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
  - Corresponding author
- Yu Li
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong, People’s Republic of China
- Takahiro Arakawa
  - Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Satoshi Takizawa
  - Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Xin Gao
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, Thuwal, Saudi Arabia
  - Corresponding author
- Erik Arner
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
  - Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Corresponding author

17
Perdikopanis N, Georgakilas GK, Grigoriadis D, Pierros V, Kavakiotis I, Alexiou P, Hatzigeorgiou A. DIANA-miRGen v4: indexing promoters and regulators for more than 1500 microRNAs. Nucleic Acids Res 2021; 49:D151-D159. [PMID: 33245765 PMCID: PMC7778932 DOI: 10.1093/nar/gkaa1060] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/15/2020] [Revised: 10/16/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023]
Abstract
Deregulation of microRNA (miRNA) expression plays a critical role in the transition from a physiological to a pathological state. The accurate miRNA promoter identification in multiple cell types is a fundamental endeavor towards understanding and characterizing the underlying mechanisms of both physiological as well as pathological conditions. DIANA-miRGen v4 (www.microrna.gr/mirgenv4) provides cell type specific miRNA transcription start sites (TSSs) for over 1500 miRNAs retrieved from the analysis of >1000 cap analysis of gene expression (CAGE) samples corresponding to 133 tissues, cell lines and primary cells available in FANTOM repository. MiRNA TSS locations were associated with transcription factor binding site (TFBSs) annotation, for >280 TFs, derived from analyzing the majority of ENCODE ChIP-Seq datasets. For the first time, clusters of cell types having common miRNA TSSs are characterized and provided through a user friendly interface with multiple layers of customization. DIANA-miRGen v4 significantly improves our understanding of miRNA biogenesis regulation at the transcriptional level by providing a unique integration of high-quality annotations for hundreds of cell specific miRNA promoters with experimentally derived TFBSs.
Affiliation(s)
- Nikos Perdikopanis
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Electrical and Computer Engineering, University of Thessaly, Volos 38221, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens 15784, Greece
- Georgios K Georgakilas
  - Central European Institute of Technology, Masaryk University, Kamenice 735/5, 62500 Brno, Czech Republic
- Dimitris Grigoriadis
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
- Vasilis Pierros
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Electrical and Computer Engineering, University of Thessaly, Volos 38221, Greece
- Ioannis Kavakiotis
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
- Panagiotis Alexiou
  - Central European Institute of Technology, Masaryk University, Kamenice 735/5, 62500 Brno, Czech Republic
- Artemis Hatzigeorgiou
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Electrical and Computer Engineering, University of Thessaly, Volos 38221, Greece
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece

18
Zheng Q, Chen T, Zhou W, Xie L, Su H. Gene prediction by the noise-assisted MEMD and wavelet transform for identifying the protein coding regions. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2020.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]

19
Zhu Y, Li F, Xiang D, Akutsu T, Song J, Jia C. Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks. Brief Bioinform 2020; 22:5998831. [PMID: 33227813 DOI: 10.1093/bib/bbaa299] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 08/24/2020] [Revised: 10/01/2020] [Accepted: 10/07/2020] [Indexed: 12/26/2022]
Abstract
A promoter is a region in the DNA sequence that defines where the transcription of a gene by RNA polymerase initiates, which is typically located proximal to the transcription start site (TSS). How to correctly identify the gene TSS and the core promoter is essential for our understanding of the transcriptional regulation of genes. As a complement to conventional experimental methods, computational techniques with easy-to-use platforms as essential bioinformatics tools can be effectively applied to annotate the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter), for identifying three specific types of promoters, i.e. promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is developed based on an up-to-date, species-specific dataset which includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. A convolutional neural network coupled with capsule layers is proposed to train and optimize the prediction model of Depicter. Extensive benchmarking and independent tests demonstrate that Depicter achieves an improved predictive performance compared with several state-of-the-art methods. The webserver of Depicter is implemented and freely accessible at https://depicter.erc.monash.edu/.
Affiliation(s)
- Yan Zhu
  - School of Science, Dalian Maritime University, China
- Fuyi Li
  - Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Cangzhi Jia
  - College of Science, Dalian Maritime University

20
Zhang WY, Xu J, Wang J, Zhou YK, Chen W, Du PF. KNIndex: a comprehensive database of physicochemical properties for k-tuple nucleotides. Brief Bioinform 2020; 22:5956158. [PMID: 33147622 DOI: 10.1093/bib/bbaa284] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/22/2020] [Revised: 09/17/2020] [Accepted: 09/26/2020] [Indexed: 01/12/2023]
Abstract
With the development of high-throughput sequencing technology, the genomic sequences increased exponentially over the last decade. In order to decode these new genomic data, machine learning methods were introduced for genome annotation and analysis. Due to the requirement of most machine learning methods, the biological sequences must be represented as fixed-length digital vectors. In this representation procedure, the physicochemical properties of k-tuple nucleotides are important information. However, the values of the physicochemical properties of k-tuple nucleotides are scattered in different resources. To facilitate the studies on genomic sequences, we developed the first comprehensive database, namely KNIndex (https://knindex.pufengdu.org), for depositing and visualizing physicochemical properties of k-tuple nucleotides. Currently, the KNIndex database contains 182 properties including one for mononucleotide (DNA), 169 for dinucleotide (147 for DNA and 22 for RNA) and 12 for trinucleotide (DNA). KNIndex database also provides a user-friendly web-based interface for the users to browse, query, visualize and download the physicochemical properties of k-tuple nucleotides. With the built-in conversion and visualization functions, users are allowed to display DNA/RNA sequences as curves of multiple physicochemical properties. We wish that the KNIndex will facilitate the related studies in computational biology.
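Editorial illustration (not part of the cited abstract): the idea of turning a sequence into a property curve and then into a fixed-length vector can be sketched roughly as follows. The property table here is a made-up placeholder, not data from KNIndex, and the function names `property_curve` and `fixed_length_vector` are hypothetical; averaging is only one possible way to reach a fixed length.

```python
from itertools import product

# Hypothetical placeholder values, NOT taken from KNIndex: each dinucleotide
# gets an arbitrary number so that the encoding step can be demonstrated.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
TOY_PROPERTY = {dn: i / 15 for i, dn in enumerate(DINUCLEOTIDES)}


def property_curve(seq, table, k=2):
    """Slide a window of width k over seq and look up each k-tuple's value."""
    seq = seq.upper()
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def fixed_length_vector(seq, table, k=2):
    """Average the curve so sequences of any length give a same-sized vector
    (one entry per property table; a single table is used here)."""
    curve = property_curve(seq, table, k)
    return [sum(curve) / len(curve)]


curve = property_curve("ACGTAC", TOY_PROPERTY)        # one value per dinucleotide
vector = fixed_length_vector("ACGTAC", TOY_PROPERTY)  # length-1 feature vector
```

With real property tables, one such averaged value per property would be concatenated into the feature vector passed to a learning method.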
Affiliation(s)
- Wen-Ya Zhang
  - College of Intelligence and Computing, Tianjin University
- Junhai Xu
  - College of Intelligence and Computing, Tianjin University
- Jun Wang
  - College of Intelligence and Computing, Tianjin University
- Yuan-Ke Zhou
  - College of Intelligence and Computing, Tianjin University
- Wei Chen
  - School of Life Sciences, North China University of Science and Technology
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University

21
Zrimec J. Multiple plasmid origin-of-transfer regions might aid the spread of antimicrobial resistance to human pathogens. Microbiologyopen 2020; 9:e1129. [PMID: 33111499 PMCID: PMC7755788 DOI: 10.1002/mbo3.1129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/23/2020] [Revised: 09/21/2020] [Accepted: 09/21/2020] [Indexed: 12/12/2022]
Abstract
Antimicrobial resistance poses a great danger to humanity, in part due to the widespread horizontal gene transfer of plasmids via conjugation. Modeling of plasmid transfer is essential to uncovering the fundamentals of resistance transfer and for the development of predictive measures to limit the spread of resistance. However, a major limitation in the current understanding of plasmids is the incomplete characterization of the conjugative DNA transfer mechanisms, which conceals the actual potential for plasmid transfer in nature. Here, we consider that the plasmid-borne origin-of-transfer substrates encode specific DNA structural properties that can facilitate finding these regions in large datasets and develop a DNA structure-based alignment procedure for typing the transfer substrates that outperforms sequence-based approaches. Thousands of putative DNA transfer substrates are identified, showing that plasmid mobility can be twofold higher and span almost twofold more host species than is currently known. Over half of all putative mobile plasmids contain the means for mobilization by conjugation systems belonging to different mobility groups, which can hypothetically link previously confined host ranges across ecological habitats into a robust plasmid transfer network. This hypothetical network is found to facilitate the transfer of antimicrobial resistance from environmental genetic reservoirs to human pathogens, which might be an important driver of the observed rapid resistance development in humans and thus an important point of focus for future prevention measures.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden

22
Mishra A, Dhanda S, Siwach P, Aggarwal S, Jayaram B. A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics. Bioinformatics 2020; 36:2375-2384. [PMID: 31909789 DOI: 10.1093/bioinformatics/btz941] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/24/2019] [Revised: 11/08/2019] [Accepted: 01/02/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Despite conservation in general architecture of promoters and protein-DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in the whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions and the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. RESULTS: Here, we introduce SEProm, a novel tool that is developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared to five most recent, diverged and current best available tools, SEProm performs much better, predicting promoters with an 'F-value' of 82.04 and 'Precision' of 81.08. The next best 'F-value' was obtained with PromPredict (72.14) followed by BProm (68.37). On the basis of 'Precision' value, the next best 'Precision' was observed for Pepper (75.39) followed by PromPredict (72.01). SEProm maintained the lead even when comparison was done on two test organisms (not involved in training for SEProm). AVAILABILITY AND IMPLEMENTATION: The software is freely available with easy to follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
- Sahil Dhanda
  - Supercomputing Facility for Bioinformatics & Computational Biology
- Priyanka Siwach
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Department of Biotechnology, Chaudhary Devi Lal University, Sirsa 125055, India
- Shruti Aggarwal
  - Supercomputing Facility for Bioinformatics & Computational Biology
- B Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
  - Department of Chemistry, Indian Institute of Technology, New Delhi 110016, India

23
Tang Q, Nie F, Kang J, Chen W. ncPro-ML: An integrated computational tool for identifying non-coding RNA promoters in multiple species. Comput Struct Biotechnol J 2020; 18:2445-2452. [PMID: 33005306 PMCID: PMC7509369 DOI: 10.1016/j.csbj.2020.09.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/14/2020] [Revised: 08/30/2020] [Accepted: 09/01/2020] [Indexed: 02/07/2023]
Abstract
A computational method for identifying non-coding promoters was proposed for the first time. A high-quality dataset was built to train and test the models for identifying non-coding promoters. A user-friendly web server was developed to recognize non-coding promoters.
The promoter is located near the transcription start sites and regulates transcription initiation of the gene. Accurate identification of promoters is essential for understanding the mechanism of gene regulation. Since experimental methods are costly and ineffective, developing efficient and accurate computational tools to identify promoters is necessary. Although a series of methods have been proposed for identifying promoters, none of them is able to identify the promoters of non-coding RNA (ncRNA). In the present work, a new method called ncPro-ML was proposed to identify the promoter of ncRNA in Homo sapiens and Mus musculus, in which different kinds of sequence encoding schemes were used to convert DNA sequences into feature vectors. To test the length effect, for each species, datasets including sequences with different lengths were built. The results demonstrated that ncPro-ML achieved the best performance based on the dataset with the sequence length of 221 nucleotides for human and mouse. The performances of ncPro-ML were also satisfying from both independent dataset test and cross-species test. The results indicate that the proposed predictor can serve as a powerful tool for the discovery of ncRNA promoters. In addition, a web-server for ncPro-ML was developed, which can be freely accessed at http://www.bio-bigdata.cn/ncPro-ML/.
Affiliation(s)
- Qiang Tang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Fulei Nie
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063210, China
  - School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Juanjuan Kang
  - Affiliated Foshan Maternity & Child Healthcare Hospital, Southern Medical University (Foshan Maternity & Child Healthcare Hospital), Foshan 528000, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063210, China
  - School of Public Health, North China University of Science and Technology, Tangshan 063210, China
  - Corresponding author: Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.

24
Tang H, Wu Y, Deng J, Chen N, Zheng Z, Wei Y, Luo X, Keasling JD. Promoter Architecture and Promoter Engineering in Saccharomyces cerevisiae. Metabolites 2020; 10:metabo10080320. [PMID: 32781665 PMCID: PMC7466126 DOI: 10.3390/metabo10080320] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 06/30/2020] [Revised: 07/30/2020] [Accepted: 08/04/2020] [Indexed: 12/23/2022]
Abstract
Promoters play an essential role in the regulation of gene expression for fine-tuning genetic circuits and metabolic pathways in Saccharomyces cerevisiae (S. cerevisiae). However, native promoters in S. cerevisiae have several limitations which hinder their applications in metabolic engineering. These limitations include an inadequate number of well-characterized promoters, poor dynamic range, and insufficient orthogonality to endogenous regulations. Therefore, it is necessary to perform promoter engineering to create synthetic promoters with better properties. Here, we review recent advances related to promoter architecture, promoter engineering and synthetic promoter applications in S. cerevisiae. We also provide a perspective of future directions in this field with an emphasis on the recent advances of machine learning based promoter designs.
Affiliation(s)
- Hongting Tang
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
- Yanling Wu
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
- Jiliang Deng
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
- Nanzhu Chen
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
- Zhaohui Zheng
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
- Yongjun Wei
  - School of Pharmaceutical Sciences, Key Laboratory of Advanced Drug Preparation Technologies, Ministry of Education, Zhengzhou University, Zhengzhou 450001, China
- Xiaozhou Luo
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
  - Corresponding author
- Jay D. Keasling
  - Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Chinese Academy of Sciences, Shenzhen 518055, China
  - Joint BioEnergy Institute, Emeryville, CA 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA 94720, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
  - Corresponding author

25
Cloning and promoter analysis of palladin 90-kDa, 140-kDa, and 200-kDa isoforms involved in skeletal muscle cell maturation. BMC Res Notes 2020; 13:321. [PMID: 32620172 PMCID: PMC7333403 DOI: 10.1186/s13104-020-05152-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2020] [Accepted: 06/24/2020] [Indexed: 11/10/2022]
Abstract
Objective: Palladin is a ubiquitous phosphoprotein expressed in vertebrate cells that works as a scaffolding protein. Several isoforms arising from alternative splicing of the palladin gene are involved in the formation, maturation, migration, and contraction of mesenchymal and muscle cells. Recent studies have linked palladin to the invasive spread of cancer and to myogenesis. However, since its discovery, the promoter region of the palladin gene has never been studied. The objective of this study was to predict, identify, and measure the activity of the promoter regions of the palladin gene. Results: By using promoter prediction programs, we successfully identified the transcription start sites for the Palld isoforms and revealed the presence of a variety of transcriptional regulatory elements, including TATA box, GATA, MyoD, myogenin, MEF, Nkx2-5, and Tcf3, in the upstream promoter regions. The transcriptome profiling approach confirmed the active role of the predicted transcription factors in the mouse genome. This study supplies the missing piece in the characterization of the palladin gene and contributes to understanding the complexity and involvement of palladin regulatory factors in gene transcription.
|
26
|
Amin R, Rahman CR, Ahmed S, Sifat MHR, Liton MNK, Rahman MM, Khan MZH, Shatabda S. iPromoter-BnCNN: a novel branched CNN-based predictor for identifying and classifying sigma promoters. Bioinformatics 2020; 36:4869-4875. [DOI: 10.1093/bioinformatics/btaa609] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/25/2019] [Revised: 05/19/2020] [Accepted: 06/24/2020] [Indexed: 11/14/2022]
Abstract
Motivation
A promoter is a short region of DNA that is responsible for initiating transcription of specific genes. Computational tools for the automatic identification of promoters are in high demand. Promoters can be of different types according to their function. They may show both intra- and interclass variation and similarity in terms of consensus sequences. Accurate classification of the various types of sigma promoters remains a challenge.
Results
We present iPromoter-BnCNN for identification and accurate classification of six types of promoters: σ24, σ28, σ32, σ38, σ54, and σ70. It is a CNN-based classifier that combines local features related to monomer nucleotide sequence, trimer nucleotide sequence, dimer structural properties, and trimer structural properties through the use of parallel branching. We conducted experiments on a benchmark dataset and compared the classifier with six state-of-the-art tools, showing superior performance in 5-fold cross-validation. Moreover, we tested our classifier on an independent test dataset.
Availability and implementation
Our proposed tool, the iPromoter-BnCNN web server, is freely available at http://103.109.52.8/iPromoter-BnCNN. The runnable source code can be found at https://colab.research.google.com/drive/1yWWh7BXhsm8U4PODgPqlQRy23QGjF2DZ.
Supplementary information
Supplementary data are available at Bioinformatics online.
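The abstract above mentions monomer and trimer nucleotide sequence features as inputs to separate CNN branches. As a rough illustration only, the sketch below shows one common way such sequence encodings can be built; the function names, the example sequence, and the choice of one-hot rows and integer trimer indices are assumptions for illustration and are not taken from the iPromoter-BnCNN source code.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 64 trimers in lexicographic order, each mapped to an integer index.
TRIMER_INDEX = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=3))}


def one_hot_monomers(seq):
    """Return one 4-element one-hot row per nucleotide (A, C, G, T order)."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def trimer_indices(seq):
    """Return the index (0..63) of every overlapping trimer in the sequence."""
    seq = seq.upper()
    return [TRIMER_INDEX[seq[i:i + 3]] for i in range(len(seq) - 2)]


# Hypothetical example sequence, chosen only to exercise the functions.
example = "TTGACA"
monomer_features = one_hot_monomers(example)
trimer_features = trimer_indices(example)
```

This sketch assumes clean A/C/G/T input: an ambiguous base such as N would produce an all-zero monomer row and raise a `KeyError` in the trimer lookup, so real data would need an explicit policy for such characters.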
Affiliation(s)
- Ruhul Amin
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Chowdhury Rafeed Rahman
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Md Habibur Rahman Sifat
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Md Nazmul Khan Liton
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Md Moshiur Rahman
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Md Zahid Hossain Khan
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka 1207, Bangladesh
| |
|
27
|
Simó-Mirabet P, Perera E, Calduch-Giner JA, Pérez-Sánchez J. Local DNA methylation helps to regulate muscle sirtuin 1 gene expression across seasons and advancing age in gilthead sea bream (Sparus aurata). Front Zool 2020; 17:15. [PMID: 32467713 PMCID: PMC7227224 DOI: 10.1186/s12983-020-00361-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/10/2020] [Accepted: 04/30/2020] [Indexed: 12/15/2022]
Abstract
Background: Sirtuins (SIRTs) are master regulators of metabolism, and their expression patterns in gilthead sea bream (GSB) reveal different tissue metabolic capabilities and changes in energy status. Since little is known about their transcriptional regulation, the aim of this work was to study for the first time in fish the effect of age and season on sirt gene expression, correlating expression patterns with local changes in DNA methylation in liver and white skeletal muscle (WSM). Methods: Gene organization of the seven sirts was analyzed by BLAT searches in the IATS-CSIC genomic database (www.nutrigroup-iats.org/seabreamdb/). The presence of CpG islands (CGIs) was mapped by means of MethPrimer software. DNA methylation analyses were performed by bisulfite pyrosequencing. A PCR array was designed for the simultaneous gene expression profiling of sirts and related markers (cs, cpt1a, pgc1α, ucp1, and ucp3) in the liver and WSM of one- and three-year-old fish during winter and summer. Results: The occurrence of CGIs was evidenced in the sirt1 and sirt3 promoters. This latter CGI remained hypomethylated regardless of tissue, age and season. Conversely, DNA methylation of sirt1 at certain CpG positions within the promoter varied with age and season in the WSM. Among them, changes at several SP1 binding sites were negatively correlated with the decrease in sirt1 expression in summer and in younger fish. Changes in sirt1 regulation match well with variations in feed intake and energy metabolism, as judged by the concurrent changes in the analyzed markers. This was supported by discriminant analyses, which identified sirt1 as a highly responsive element to age- and season-mediated changes in energy metabolism in WSM. Conclusions: The gene organization of SIRTs is highly conserved in vertebrates. GSB sirt family members have CGI- and non-CGI promoters, and the presence of CGIs at the sirt1 promoter agrees with its ubiquitous expression. Gene expression analyses support that sirts, especially sirt1, are reliable markers of age- and season-dependent changes in energy metabolism. Correlation analyses suggest the involvement of DNA methylation in the regulation of sirt1 expression, but the low methylation levels suggest the contribution of other putative mechanisms in the transcriptional regulation of sirt1.
Affiliation(s)
- Paula Simó-Mirabet
- Nutrigenomics and Fish Growth Endocrinology Group, Institute of Aquaculture Torre de la Sal, IATS-CSIC, 12595 Ribera de Cabanes s/n, Castellón, Spain
| | - Erick Perera
- Nutrigenomics and Fish Growth Endocrinology Group, Institute of Aquaculture Torre de la Sal, IATS-CSIC, 12595 Ribera de Cabanes s/n, Castellón, Spain
| | - Josep Alvar Calduch-Giner
- Nutrigenomics and Fish Growth Endocrinology Group, Institute of Aquaculture Torre de la Sal, IATS-CSIC, 12595 Ribera de Cabanes s/n, Castellón, Spain
| | - Jaume Pérez-Sánchez
- Nutrigenomics and Fish Growth Endocrinology Group, Institute of Aquaculture Torre de la Sal, IATS-CSIC, 12595 Ribera de Cabanes s/n, Castellón, Spain
| |
|
28
|
Wang TY, Guo X. Expression vector cassette engineering for recombinant therapeutic production in mammalian cell systems. Appl Microbiol Biotechnol 2020; 104:5673-5688. [PMID: 32372203 DOI: 10.1007/s00253-020-10640-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/11/2020] [Revised: 04/13/2020] [Accepted: 04/20/2020] [Indexed: 12/16/2022]
Abstract
Human tissue plasminogen activator was the first recombinant therapeutic protein to be successfully produced in Chinese hamster ovary cells and approved for clinical use, in 1986. Since then, more and more therapeutic proteins have been manufactured in mammalian cells, and the technologies for recombinant protein production in this expression system have developed rapidly, with the optimization of both upstream and downstream processes. One of the most promising strategies is optimization of the expression vector cassette. In this review paper, these approaches and developments are summarized, and future strategies for using expression cassettes to produce recombinant therapeutic proteins in mammalian cells are discussed.
Affiliation(s)
- Tian-Yun Wang
- Department of Biochemistry and Molecular Biology, Xinxiang Medical University, Xinxiang, 453003, Henan, China.
- International Joint Research Laboratory for Recombinant Pharmaceutical Protein Expression System of Henan, Xinxiang Medical University, Xinxiang, 453003, Henan, China.
| | - Xiao Guo
- International Joint Research Laboratory for Recombinant Pharmaceutical Protein Expression System of Henan, Xinxiang Medical University, Xinxiang, 453003, Henan, China
- Periodicals Publishing House, Xinxiang Medical University, Xinxiang, Henan, China
| |
|
29
|
Georgakilas GK, Perdikopanis N, Hatzigeorgiou A. Solving the transcription start site identification problem with ADAPT-CAGE: a Machine Learning algorithm for the analysis of CAGE data. Sci Rep 2020; 10:877. [PMID: 31965016 PMCID: PMC6972925 DOI: 10.1038/s41598-020-57811-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/15/2019] [Accepted: 12/18/2019] [Indexed: 11/20/2022]
Abstract
Cap Analysis of Gene Expression (CAGE) has emerged as a powerful experimental technique for assisting in the identification of transcription start sites (TSSs). There is strong evidence that CAGE also identifies capping sites at various other locations along transcribed loci, such as splicing byproducts, alternative isoforms, and capped molecules overlapping introns and exons. We present ADAPT-CAGE, a Machine Learning framework that is trained to distinguish between CAGE signal derived from TSSs and transcriptional noise. ADAPT-CAGE provides highly accurate, experimentally derived TSSs on a genome-wide scale. It has been specifically designed for flexibility and ease of use, requiring only aligned CAGE data and the underlying genomic sequence. When compared to existing algorithms, ADAPT-CAGE exhibits improved performance on every benchmark that we designed based on both annotation- and experimentally-driven strategies. This performance boost brings ADAPT-CAGE into the spotlight as a computational framework that is able to assist in the refinement of gene regulatory networks and in the incorporation of accurate information on gene expression regulators and alternative promoter usage in both physiological and pathological conditions.
Affiliation(s)
- Georgios K Georgakilas
- Hellenic Pasteur Institute, Athens, 11521, Greece; Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece; Central European Institute of Technology, Masaryk University, Kamenice 735/5, 62500, Brno, Czech Republic
| | - Nikos Perdikopanis
- Hellenic Pasteur Institute, Athens, 11521, Greece; Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece; Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
| | - Artemis Hatzigeorgiou
- Hellenic Pasteur Institute, Athens, 11521, Greece; Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
| |
|
30
|
Moisseyev G, Park K, Cui A, Freitas D, Rajagopal D, Konda AR, Martin-Olenski M, Mcham M, Liu K, Du Q, Schnable JC, Moriyama EN, Cahoon EB, Zhang C. RGPDB: database of root-associated genes and promoters in maize, soybean, and sorghum. Database: The Journal of Biological Databases and Curation 2020; 2020:5851117. [PMID: 32500918 PMCID: PMC7273057 DOI: 10.1093/database/baaa038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/22/2019] [Revised: 03/02/2020] [Accepted: 05/06/2020] [Indexed: 12/21/2022]
Abstract
Root-associated genes play an important role in plants. Despite the fact that there have been studies on root biology, information on genes that are specifically expressed or upregulated in roots is poorly collected. There exist very few databases dedicated to genes and promoters associated with root biology, preventing effective root-related studies. Therefore, we analyzed multiple types of omics data to identify root-associated genes in maize, soybean, and sorghum and constructed a comprehensive online database of these genes and their promoter sequences. This database creates a pivotal platform capable of stimulating and facilitating further studies on manipulating root growth and development.
Affiliation(s)
- Gleb Moisseyev
- Young Nebraska Scientists Program, University of Nebraska (EPSCoR), Lincoln, NE 68588, USA
| | - Kiyoul Park
- Department of Biochemistry, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | - Alix Cui
- Young Nebraska Scientists Program, University of Nebraska (EPSCoR), Lincoln, NE 68588, USA
| | - Daniel Freitas
- Young Nebraska Scientists Program, University of Nebraska (EPSCoR), Lincoln, NE 68588, USA
| | - Divith Rajagopal
- Young Nebraska Scientists Program, University of Nebraska (EPSCoR), Lincoln, NE 68588, USA
| | - Anji Reddy Konda
- Department of Biochemistry, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | | | - Mackenzie Mcham
- Young Nebraska Scientists Program, University of Nebraska (EPSCoR), Lincoln, NE 68588, USA
| | - Kan Liu
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | - Qian Du
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | - James C Schnable
- Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | - Etsuko N Moriyama
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | - Edgar B Cahoon
- Department of Biochemistry, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| | - Chi Zhang
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE 68588, USA
| |
|
31
|
Lenzini L, Di Patti F, Livi R, Fondi M, Fani R, Mengoni A. A Method for the Structure-Based, Genome-Wide Analysis of Bacterial Intergenic Sequences Identifies Shared Compositional and Functional Features. Genes (Basel) 2019; 10:genes10100834. [PMID: 31652625 PMCID: PMC6826451 DOI: 10.3390/genes10100834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 11/16/2022]
Abstract
In this paper, we propose a computational strategy for performing genome-wide analyses of intergenic sequences in bacterial genomes. Following the approach of a previous paper, in which a method for genome-wide analysis of eukaryotic intergenic sequences was proposed, we developed a tool that implements similar concepts for bacterial genomes. This allows us to (i) classify intergenic sequences into clusters characterized by specific global structural features and (ii) draw possible relations with their functional features.
Affiliation(s)
- Leonardo Lenzini
- Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy.
| | - Francesca Di Patti
- Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy.
| | - Roberto Livi
- Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy.
- Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy.
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy.
| | - Marco Fondi
- Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
| | - Renato Fani
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy.
- Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
| | - Alessio Mengoni
- Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
| |
|
32
|
Cencini M, Pigolotti S. Energetic funnel facilitates facilitated diffusion. Nucleic Acids Res 2019; 46:558-567. [PMID: 29216364 PMCID: PMC5778461 DOI: 10.1093/nar/gkx1220] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/14/2017] [Accepted: 11/24/2017] [Indexed: 01/25/2023]
Abstract
Transcription factors (TFs) are able to associate with their binding sites on DNA faster than the physical limit posed by diffusion. Such high association rates can be achieved by alternating between three-dimensional diffusion and one-dimensional sliding along the DNA chain, a mechanism dubbed facilitated diffusion. By studying a collection of TF binding sites of Escherichia coli from the RegulonDB database and of Bacillus subtilis from DBTBS, we reveal a funnel in the binding energy landscape around the target sequences. We show that such a funnel is linked to the presence of gradients of AT in the base composition of the DNA region around the binding sites. An extensive computational study of the stochastic sliding process along the energetic landscapes obtained from the database shows that the funnel can significantly increase the probability that TFs find their target sequences when sliding in their proximity. We demonstrate that this enhancement leads to a speed-up of the association process.
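The AT gradient mentioned in this abstract can be pictured with a sliding-window profile of AT content along a region flanking a binding site. The sketch below is only an illustration of that idea: the window size, function names, and flanking sequence are hypothetical, and it computes base composition, not the binding energy landscape used in the paper.

```python
def at_fraction(window):
    """Fraction of A or T bases in a non-empty DNA window."""
    window = window.upper()
    return sum(base in "AT" for base in window) / len(window)


def at_profile(seq, window=5):
    """AT fraction of every overlapping window along the sequence."""
    return [at_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# Hypothetical flank that becomes AT-rich toward its right end.
flank = "GCGCGCATGCATATATATAT"
profile = at_profile(flank, window=5)
```

A profile that rises or falls steadily toward the site would correspond to the kind of compositional gradient the abstract describes; any real analysis would average such profiles over many aligned binding sites.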
Affiliation(s)
- Massimo Cencini
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, via dei Taurini 19, 00185 Rome, Italy
| | - Simone Pigolotti
- Biological Complexity Unit, Okinawa Institute of Science and Technology and Graduate University, Onna, Okinawa 904-0495, Japan; Max Planck Institute for the Physics of Complex Systems, Nöthnitzerstraße 38, 01187 Dresden, Germany; Departament de Fisica, Universitat Politecnica de Catalunya Edif. GAIA, Rambla Sant Nebridi 22, 08222 Terrassa, Barcelona, Spain
| |
|
33
|
Liu B, Han L, Liu X, Wu J, Ma Q. Computational Prediction of Sigma-54 Promoters in Bacterial Genomes by Integrating Motif Finding and Machine Learning Strategies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1211-1218. [PMID: 29993815 DOI: 10.1109/tcbb.2018.2816032] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 06/08/2023]
Abstract
The sigma factor, a subunit of the RNA polymerase holoenzyme, is a critical factor in gene transcriptional regulation. It recognizes specific DNA sites and brings the core enzyme of RNA polymerase to the upstream regions of target genes. Therefore, predicting the promoters for a particular sigma factor is essential for interpreting functional genomic data and observations. This paper develops a new method to predict sigma-54 promoters in bacterial genomes. The new method integrates motif finding and machine learning strategies to capture the intrinsic features of sigma-54 promoters. Experiments on an E. coli benchmark test set show that our method distinguishes sigma-54 promoters well from surrounding or randomly selected DNA sequences. Applications to three other bacterial genomes indicate the potential robustness and applicability of our method to a large number of bacterial genomes. The source code of our method can be freely downloaded at https://github.com/maqin2001/PromotePredictor.
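To make the motif-finding half of this description more concrete, the following is a minimal sketch of a generic motif-scoring step: building a log-odds position weight matrix (PWM) from aligned sites and scanning a sequence for the best match. It is not the method from the paper or its repository. The aligned sites, the uniform background, the pseudocount, and all function names are assumptions made for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM from equal-length aligned sites, uniform 0.25 background."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned sites; not taken from any real promoter collection.
sites = ["TGGCAC", "TGGCAT", "TGGTAC", "CGGCAC"]
pwm = build_pwm(sites)
top_score, top_pos = best_hit(pwm, "AAATGGCACTTT")
```

In a combined pipeline of the kind the abstract outlines, scores like `top_score` could serve as one input feature to a classifier, alongside other sequence-derived features.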
|
34
|
Lai HY, Zhang ZY, Su ZD, Su W, Ding H, Chen W, Lin H. iProEP: A Computational Predictor for Predicting Promoter. Molecular Therapy. Nucleic Acids 2019; 17:337-346. [PMID: 31299595 PMCID: PMC6616480 DOI: 10.1016/j.omtn.2019.05.028] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 02/23/2019] [Revised: 05/18/2019] [Accepted: 05/19/2019] [Indexed: 11/29/2022]
Abstract
A promoter is a fundamental DNA element located around the transcription start site (TSS) that can regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters. However, the performance of these methods still needs to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with a position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). The Minimum Redundancy Maximum Relevance (mRMR) algorithm and an incremental feature selection strategy were then adopted to find optimal feature subsets. A support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with areas under the receiver operating characteristic curve (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (http://lin-group.cn/server/iProEP/).
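The feature representation named in this abstract builds on k-tuple nucleotide composition. The sketch below shows only the plain k-tuple frequency vector, a deliberate simplification: PseKNC additionally incorporates sequence-order correlation terms based on physicochemical properties, which are omitted here. The function name and the example sequence are hypothetical.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Normalized frequency of every k-tuple, in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Hypothetical short sequence; dimers are TA, AT, TA, AA, AT.
features = ktuple_composition("TATAAT", k=2)
```

A vector like `features` (16 values for k = 2) is the sort of fixed-length input that feature selection and an SVM can then operate on.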
Affiliation(s)
- Hong-Yan Lai
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhao-Yue Zhang
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhen-Dong Su
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Su
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China; Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China.
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
|
35
|
Variation of gene expression in plants is influenced by gene architecture and structural properties of promoters. PLoS One 2019; 14:e0212678. [PMID: 30908494 PMCID: PMC6433290 DOI: 10.1371/journal.pone.0212678] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/26/2018] [Accepted: 02/07/2019] [Indexed: 12/03/2022]
Abstract
In higher eukaryotes, gene architecture and structural properties of promoters have emerged as significant factors influencing variation in number of transcripts (expression level) and specificity of gene expression in a tissue (expression breadth), which eventually shape the phenotype. In this study, transcriptome data of different tissue types at various developmental stages of A. thaliana, O. sativa, S. bicolor and Z. mays have been used to understand the relationship between properties of gene components and its expression. Our findings indicate that in plants, among all gene architecture and structural properties of promoters, compactness of genes in terms of intron content is significantly linked to gene expression level and breadth, whereas in human an exactly opposite scenario is seen. In plants, for the first time we have carried out a quantitative estimation of effect of a particular trait on expression level and breadth, by using multiple regression analysis and it confirms that intron content of primary transcript (as %) is a powerful determinant of expression breadth. Similarly, further regression analysis revealed that among structural properties of the promoters, stability is negatively linked to expression breadth, while DNase1 sensitivity strongly governs gene expression breadth in monocots and gene expression level in dicots. In addition, promoter regions of tissue specific genes are found to be enriched with TATA box and Y-patch motifs. Finally, multi copy orthologous genes in plants are found to be longer, highly regulated and tissue specific.
|
36
|
Dao FY, Lv H, Wang F, Ding H. Recent Advances on the Machine Learning Methods in Identifying DNA Replication Origins in Eukaryotic Genomics. Front Genet 2018; 9:613. [PMID: 30619452 PMCID: PMC6295579 DOI: 10.3389/fgene.2018.00613] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/21/2018] [Accepted: 11/21/2018] [Indexed: 01/01/2023]
Abstract
The site at which DNA replication initiates is called the origin of replication (ORI). It is regulated by a set of regulatory proteins and plays important roles in the basic biochemical processes of cell growth and division in all living organisms. Therefore, the study of ORIs is essential for understanding the cell-division cycle and gene expression regulation, so that new strategies against genetic diseases can be developed using knowledge of DNA replication. Accurate identification of ORIs will thus provide key clues for DNA replication research and clinical medicine. Although conventional experiments can provide accurate results, they are time-consuming and costly. In contrast, bioinformatics-based methods can overcome these shortcomings. In particular, with the accumulation of DNA sequences in the post-genomic era, high-throughput tools that identify ORIs from sequence information are highly desirable. In this review, we summarize the current progress in computational prediction of eukaryotic ORIs, including the collection of benchmark datasets, the application of machine learning-based techniques, the results obtained by these methods, and the construction of web servers. Finally, we give future perspectives on ORI prediction. The review provides readers with a comprehensive background on ORI prediction based on machine learning methods, which will help researchers study DNA replication in depth and develop drug therapies for genetic defects.
Affiliation(s)
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
37
|
Mishra A, Siwach P, Misra P, Jayaram B, Bansal M, Olson WK, Thayer KM, Beveridge DL. Toward a Universal Structural and Energetic Model for Prokaryotic Promoters. Biophys J 2018; 115:1180-1189. [PMID: 30172386 DOI: 10.1016/j.bpj.2018.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/15/2018] [Revised: 07/28/2018] [Accepted: 08/02/2018] [Indexed: 01/04/2023]
Abstract
With almost no consensus promoter sequence in prokaryotes, recruitment of RNA polymerase (RNAP) to precise transcriptional start sites (TSSs) has remained an unsolved puzzle. Uncovering the underlying mechanism is critical for understanding the principle of gene regulation. We attempted to search the hidden code in ∼16,500 promoters of 12 prokaryotes representing two kingdoms in their structure and energetics. Twenty-eight fundamental parameters of DNA structure including backbone angles, basepair axis, and interbasepair and intrabasepair parameters were used, and information was extracted from x-ray crystallography data. Three parameters (solvation energy, hydrogen-bond energy, and stacking energy) were selected for creating energetics profiles using in-house programs. DNA of promoter regions was found to be inherently designed to undergo a change in every parameter undertaken for the study, in all prokaryotes. The change starts from some distance upstream of TSSs and continues past some distance from TSS, hence giving a signature state to promoter regions. These signature states might be the universal hidden codes recognized by RNAP. This observation was reiterated when randomly selected promoter sequences (with little sequence conservation) were subjected to structure generation; all developed into very similar three-dimensional structures quite distinct from those of conventional B-DNA and coding sequences. Fine structural details at important motifs (viz. -11, -35, and -75 positions relative to TSS) of promoters reveal novel to our knowledge and pointed insights for RNAP interaction at these locations; it could be correlated with how some particular structural changes at the -11 region may allow insertion of RNAP amino acids in interbasepair space as well as facilitate the flipping out of bases from the DNA duplex.
Affiliation(s)
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
  - Supercomputing Facility for Bioinformatics & Computational Biology; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Pallavi Misra
  - Supercomputing Facility for Bioinformatics & Computational Biology
- Bhyravabhotla Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India
- Manju Bansal
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Wilma K Olson
  - Department of Chemistry & Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers, Piscataway, New Jersey
- Kelly M Thayer
  - Department of Chemistry, Vassar College, Poughkeepsie, New York
- David L Beveridge
  - Departments of Chemistry, Molecular Biology, and Biochemistry and Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut
|
38
|
Lara J, Teka MA, Sims S, Xia GL, Ramachandran S, Khudyakov Y. HCV adaptation to HIV coinfection. INFECTION GENETICS AND EVOLUTION 2018; 65:216-225. [PMID: 30075255 DOI: 10.1016/j.meegid.2018.07.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2018] [Revised: 07/25/2018] [Accepted: 07/30/2018] [Indexed: 02/07/2023]
Abstract
Human immunodeficiency virus (HIV) infection is rising as a leading cause of morbidity and mortality among hepatitis C virus (HCV)-infected patients. Both viruses interact in co-infected hosts, which may affect their intra-host evolution, potentially leading to differing genetic composition of viral populations in co-infected (CIP) and mono-infected (MIP) patients. Here, we investigate genetic differences between intra-host variants of the HCV hypervariable region 1 (HVR1) sampled from CIP and MIP. Nucleotide (nt) sequences of intra-host HCV HVR1 variants (N = 28,622) obtained from CIP (N = 112) and MIP (n = 176) were represented using 148 physical-chemical (PhyChem) indexes of DNA nt dimers. Significant (p < .0001) differences in the means and frequency distributions of 7 PhyChem properties were found between HVR1 variants from both groups. Linear projection analysis of 29 PhyChem features extracted from such PhyChem properties showed that the CIP and MIP HVR1 variants have a distinct distribution in the modeled 2D-space, with only ~1.3% of PhyChem profiles (N = 6782), shared by all HVR1 variants, being found in both groups. Probabilistic neural network (PNN) and naïve Bayesian (NB) classifiers trained on the PhyChem features accurately classified HVR1 variants by the group in cross-validation experiments (AUROC ≥ 0.96). Similarly, both models showed a high accuracy (AUROC ≥ 0.95) when evaluated on a test dataset of HVR1 sequences obtained from 10 patients, data from whom were not used for model building. Both models performed at the expected lower accuracy on randomly labeled datasets in cross-validation experiments (AUROC = 0.50). The random-label trained PNN showed a similar drop in accuracy on the test dataset (AUROC = 0.48), indicating that the detected associations were unlikely due to random correlations. 
Marked differences in genetic composition of HCV HVR1 variants sampled from CIP and MIP suggest differing intra-host HCV evolution in the presence of HIV infection. PhyChem features identified here may be used for detection of HIV infection from intra-host HCV variants alone in co-infected patients, thus facilitating monitoring for HIV introduction to high-risk populations with high HCV prevalence.
Affiliation(s)
- James Lara
  - Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Mahder A Teka
  - Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Seth Sims
  - Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Guo-Liang Xia
  - Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Sumathi Ramachandran
  - Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Yury Khudyakov
  - Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
|
39
|
He W, Jia C, Duan Y, Zou Q. 70ProPred: a predictor for discovering sigma70 promoters based on combining multiple features. BMC SYSTEMS BIOLOGY 2018; 12:44. [PMID: 29745856 PMCID: PMC5998878 DOI: 10.1186/s12918-018-0570-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Indexed: 12/12/2022]
Abstract
BACKGROUND A promoter is an important regulatory sequence element that is responsible for the initiation of gene transcription. In prokaryotes, σ70 promoters regulate the transcription of most genes. Promoter recognition has been a crucial part of gene structure recognition. It is also the core issue in constructing gene transcriptional regulation networks. With the successful completion of genome sequencing for an increasing number of microbial species, the accurate identification of σ70 promoter regions in DNA sequences is not easy. RESULTS In order to improve the prediction accuracy of sigma70 promoters in prokaryotes, a promoter recognition model, 70ProPred, was established. In this work, two sequence-based features, position-specific trinucleotide propensity based on single-stranded characteristic (PSTNPss) and electron-ion potential values for trinucleotides (PseEIIP), were assessed to build the best prediction model. It was found that 79 features of PSTNPss combined with 64 features of PseEIIP obtained the best performance for sigma70 promoter identification, with an accuracy of 95.56% and a Matthews correlation coefficient (MCC) of 0.90. CONCLUSION The jackknife tests showed that 70ProPred outperforms the existing sigma70 promoter prediction approaches in terms of accuracy and stability. Additionally, this approach can also be extended to predict promoters of other species. To assist experimental biologists, an online web server for the proposed method was established, which is freely available at http://server.malab.cn/70ProPred/ .
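The 64 PseEIIP features mentioned in this abstract can be illustrated with a short sketch. It assumes the formulation commonly used in the sequence-feature literature: each trinucleotide's value is the sum of the electron-ion interaction pseudopotential (EIIP) values of its three nucleotides, weighted by the normalized frequency of that trinucleotide in the sequence. The per-nucleotide EIIP constants and the function name `pse_eiip` are assumptions introduced here for illustration; the abstract does not give these details, and this is not the 70ProPred implementation.

```python
from itertools import product

# Per-nucleotide EIIP values as commonly tabulated in the literature
# (assumed here; not stated in the abstract).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# All 64 trinucleotides in a fixed lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def pse_eiip(seq):
    """Return a 64-dimensional PseEIIP-style feature vector for a DNA sequence.

    Each feature is (sum of EIIP values of the trinucleotide) multiplied by
    (normalized frequency of that trinucleotide among overlapping 3-mers).
    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = {t: 0 for t in TRINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [
        sum(EIIP[n] for n in t) * counts[t] / total
        for t in TRINUCLEOTIDES
    ]


if __name__ == "__main__":
    features = pse_eiip("TTGACAATTAATCATCGGCTCGTATAATGTGTGGA")  # made-up sequence
    print(len(features))
```

A vector like this would typically be concatenated with other features (here, the PSTNPss features) before being passed to a classifier.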
Affiliation(s)
- Wenying He
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300072 China
- Cangzhi Jia
  - Department of Mathematics, Dalian Maritime University, Dalian, 116026 China
- Yucong Duan
  - College of Information and Technology, Hainan University, Haikou, 570228 China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300072 China
|
40
|
Abstract
Transcription is an intricate mechanism and is orchestrated at the promoter region. The cognate motifs in the promoters are observed in only a subset of total genes across different domains of life. Hence, sequence-motif based promoter prediction may not be a holistic approach for whole genomes. Conversely, the DNA structural property, duplex stability is a characteristic of promoters and can be used to delineate them from other genomic sequences. In this study, we have used a DNA duplex stability based algorithm ‘PromPredict’ for promoter prediction in a broad range of eukaryotes, representing various species of yeast, worm, fly, fish, and mammal. Efficiency of the software has been tested in promoter regions of 48 eukaryotic systems. PromPredict achieves recall values, which range from 68 to 92% in various eukaryotes. PromPredict performs well in mammals, although their core promoter regions are GC rich. ‘PromPredict’ has also been tested for its ability to predict promoter regions for various transcript classes (coding and non-coding), TATA-containing and TATA-less promoters as well as on promoter sequences belonging to different gene expression variability categories. The results support the idea that differential DNA duplex stability is a potential predictor of promoter regions in various genomes.
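The duplex-stability idea in this abstract can be sketched as a sliding-window average of nearest-neighbor free energies, where less negative values mark less stable (more easily melted) DNA. The dinucleotide values below are the widely quoted unified nearest-neighbor free energies at 37 °C (kcal/mol), and the 15-nt window and the function names are assumptions made for illustration only; the abstract does not specify PromPredict's parameters, and this is not the PromPredict algorithm itself.

```python
# Nearest-neighbor duplex free energies (kcal/mol, 37 degrees C), keyed by the
# 5'->3' dinucleotide on one strand. Assumed values for illustration.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def stability_profile(seq, window=15):
    """Average free energy per dinucleotide step in each window of `window` nt.

    Element i of the result describes the window starting at base i.
    Raises KeyError if the sequence contains characters other than A, C, G, T.
    Returns an empty list if the sequence is shorter than the window.
    """
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n_steps = window - 1  # a window of w bases contains w - 1 dinucleotide steps
    return [
        sum(steps[i:i + n_steps]) / n_steps
        for i in range(len(steps) - n_steps + 1)
    ]


def least_stable_window(seq, window=15):
    """Start index of the window with the highest (least negative) average energy."""
    profile = stability_profile(seq, window)
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    toy = "G" * 20 + "A" * 20 + "G" * 20  # made-up sequence with an AT-rich core
    print(least_stable_window(toy))
```

In a promoter-prediction setting, a profile like this would be compared between regions upstream and downstream of a candidate site; the thresholds for such a comparison are not given in the abstract and are left out here.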
|
41
|
Boroni M, Sammeth M, Gava SG, Jorge NAN, Macedo AM, Machado CR, Mourão MM, Franco GR. Landscape of the spliced leader trans-splicing mechanism in Schistosoma mansoni. Sci Rep 2018; 8:3877. [PMID: 29497070 PMCID: PMC5832876 DOI: 10.1038/s41598-018-22093-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/12/2017] [Accepted: 02/12/2018] [Indexed: 11/09/2022]
Abstract
Spliced leader dependent trans-splicing (SLTS) has been described as an important RNA regulatory process that occurs in different organisms, including the trematode Schistosoma mansoni. We identified more than seven thousand putative SLTS sites in the parasite, comprising genes with a wide spectrum of functional classes, which underlines SLTS as a ubiquitous mechanism in the parasite. Also, SLTS gene expression levels span several orders of magnitude, showing that SLTS frequency is not determined by the expression level of the target gene, but by the presence of particular gene features facilitating or hindering the trans-splicing mechanism. Our in-depth investigation of SLTS events demonstrates widespread alternative trans-splicing (ATS) acceptor sites occurring in different regions along the entire gene body, highlighting another important role of SLTS in generating alternative RNA isoforms in the parasite, besides polycistron resolution. Particularly for introns where SLTS directly competes for the same acceptor substrate with cis-splicing, we identified for the first time additional and important features that might determine the type of splicing. Our study substantially extends the current knowledge of RNA processing by SLTS in S. mansoni, and provides a basis for future studies on the trans-splicing mechanism in other eukaryotes.
Affiliation(s)
- Mariana Boroni
  - Laboratório de Genética Bioquímica, Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
  - Laboratório de Bioinformática e Biologia Computacional, Coordenação de Pesquisa, Instituto Nacional de Câncer José Alencar Gomes da Silva, Rio de Janeiro, 20231-050, Brazil
- Michael Sammeth
  - Bioinformatics in Transcriptomics and Functional Genomics (BITFUN), Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, 21941-901, Brazil
  - Laboratório Nacional de Computação Científica, Petrópolis, 25651-075, Brazil
- Sandra Grossi Gava
  - Grupo de Helmintologia e Malacologia Médica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, 30190-009, Brazil
- Natasha Andressa Nogueira Jorge
  - Laboratório de Bioinformática e Biologia Computacional, Coordenação de Pesquisa, Instituto Nacional de Câncer José Alencar Gomes da Silva, Rio de Janeiro, 20231-050, Brazil
- Andréa Mara Macedo
  - Laboratório de Genética Bioquímica, Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Carlos Renato Machado
  - Laboratório de Genética Bioquímica, Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Marina Moraes Mourão
  - Grupo de Helmintologia e Malacologia Médica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, 30190-009, Brazil
- Glória Regina Franco
  - Laboratório de Genética Bioquímica, Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
|
42
|
Awasthi A, Nain V, Puria R. MYOD and HAND transcription factors have conserved recognition sites in mTOR promoter: insights from in silico analysis. Interdiscip Sci 2018; 11:329-335. [PMID: 29411313 DOI: 10.1007/s12539-018-0284-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/05/2017] [Revised: 01/02/2018] [Accepted: 01/24/2018] [Indexed: 11/28/2022]
Abstract
mTOR regulates multiple cellular processes that are critical for proper maintenance of cell growth and development. However, the mechanisms and factors responsible for transcriptional regulation of mTOR are only partially known. To identify different transcription factor binding sites in the promoter region of mTOR, we performed in silico phylogenetic footprinting analysis of a diverse set of human orthologs. A phylogenetic tree for the orthologs was generated to establish the evolutionary relationships among them. Conserved binding sites among the species were predicted with the tool MEME. The predicted conserved sites were further analyzed for binding of transcription factors with the MatInspector program. Predicted TFs were then integrated with known physical interactions and coexpression data to decipher the important transcriptional regulators of mTOR signaling. Our study suggests that the motifs AGGCGGG (+15 to +21) and GGCGGC (+60 to +65) are highly conserved across the species and are recognition sequences for HAND and MYOD transcription factors, respectively. These two transcription factors also show direct physical interaction in the protein-protein interaction map, indicating their regulatory role in the expression of mTOR for control of myogenesis. Our study provides novel clues on differential regulation of mTOR under diverse environmental conditions.
Affiliation(s)
- Ankita Awasthi
  - School of Biotechnology, Gautam Buddha University, Gautam Budh Nagar, Greater Noida, 201312, India
- Vikrant Nain
  - School of Biotechnology, Gautam Buddha University, Gautam Budh Nagar, Greater Noida, 201312, India
- Rekha Puria
  - School of Biotechnology, Gautam Buddha University, Gautam Budh Nagar, Greater Noida, 201312, India
|
43
|
Magana-Mora A, Kalkatawi M, Bajic VB. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA. BMC Genomics 2017; 18:620. [PMID: 28810905 PMCID: PMC5558757 DOI: 10.1186/s12864-017-4033-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/21/2017] [Accepted: 08/07/2017] [Indexed: 01/06/2023]
Abstract
BACKGROUND Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3'-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. RESULTS In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. CONCLUSIONS The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/ .
Affiliation(s)
- Arturo Magana-Mora
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Manal Kalkatawi
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Vladimir B Bajic
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
|
44
|
Shahmuradov IA, Umarov RK, Solovyev VV. TSSPlant: a new tool for prediction of plant Pol II promoters. Nucleic Acids Res 2017; 45:e65. [PMID: 28082394 PMCID: PMC5416875 DOI: 10.1093/nar/gkw1353] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 09/18/2016] [Revised: 12/16/2016] [Accepted: 12/27/2016] [Indexed: 11/22/2022]
Abstract
Our current knowledge of eukaryotic promoters indicates that they have a complex architecture, often composed of numerous functional motifs. Most known promoters include multiple and in some cases mutually exclusive transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue type, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences of a wide spectrum of plant genomes. The tool was developed by using large promoter collections from ppdb and PlantProm DB. It utilizes eighteen significant compositional and signal features of plant promoter sequences selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy compared to the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80 and F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
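The two metrics quoted in this abstract, the Matthews correlation coefficient (MCC) and the F1-score, are both computed from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the confusion-matrix counts in the usage example are hypothetical and are not taken from the TSSPlant evaluation.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention
    for the otherwise undefined case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def f1_score(tp, fp, fn):
    """F1-score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0
    return 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical counts: 50 true promoters and 50 non-promoters,
    # with 5 errors in each class.
    tp, fp, tn, fn = 45, 5, 45, 5
    print(mcc(tp, fp, tn, fn))    # 0.8
    print(f1_score(tp, fp, fn))   # 0.9
```

Unlike the F1-score, MCC also accounts for true negatives, which is why promoter-prediction papers often report both when positive and negative sets differ in size.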
Affiliation(s)
- Ilham A. Shahmuradov
  - King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
  - Institute of Molecular Biology and Biotechnologies, ANAS, 2 Matbuat strasse, Baku AZ1073, Azerbaijan
- Ramzan Kh. Umarov
  - King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
|
45
|
Yella VR, Bansal M. DNA structural features of eukaryotic TATA-containing and TATA-less promoters. FEBS Open Bio 2017; 7:324-334. [PMID: 28286728 PMCID: PMC5337902 DOI: 10.1002/2211-5463.12166] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 10/21/2016] [Accepted: 11/16/2016] [Indexed: 01/26/2023]
Abstract
Eukaryotic genes can be broadly classified as TATA‐containing and TATA‐less based on the presence of TATA box in their promoters. Experiments on both classes of genes have revealed a disparity in the regulation of gene expression and cellular functions between the two classes. In this study, we report characteristic differences in promoter sequences and associated structural properties of the two categories of genes in six different eukaryotes. We have analyzed three structural features, DNA duplex stability, bendability, and curvature along with the distribution of A‐tracts, G‐quadruplex motifs, and CpG islands. The structural feature analyses reveal that while the two classes of gene promoters are distinctly different from each other, the properties are also distinguishable across the six organisms.
Affiliation(s)
- Venkata Rajesh Yella
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India; Present address: Department of Biotechnology, K L University, Vaddeswaram, Guntur 522502, India
- Manju Bansal
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
|
46
|
Il'icheva IA, Khodikov MV, Poptsova MS, Nechipurenko DY, Nechipurenko YD, Grokhovsky SL. Structural features of DNA that determine RNA polymerase II core promoter. BMC Genomics 2016; 17:973. [PMID: 27884105 PMCID: PMC5123417 DOI: 10.1186/s12864-016-3292-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/08/2016] [Accepted: 11/15/2016] [Indexed: 01/02/2023]
Abstract
BACKGROUND The general structure and action of all eukaryotic and archaeal RNA polymerase machinery are astonishingly similar despite the diversity of core promoter sequences in different species. The goal of our work is to find common characteristics of a DNA region that define it as a promoter for RNA polymerase II (Pol II). RESULTS The profiles of a large number of physical and structural characteristics, averaged over representative sets of the Pol II minimal core promoters of evolutionarily divergent species from animals, plants and unicellular fungi, were analysed. In addition to the characteristics defined at the base-pair steps, we, for the first time, use profiles of the ultrasonic cleavage and DNase I cleavage indexes, which are informative about the internal properties of each complementary strand. CONCLUSIONS DNA of the core promoters of metazoans and Schizosaccharomyces pombe has a similar structural organization. Its mechanical and 3D structural characteristics have singular properties at the positions of the TATA-box. The minor groove is broadened and conformational motion is decreased in that region. Special characteristics of conformational behavior are revealed in metazoans in the region that connects the end of the TATA-box and the transcription start site (TSS). The intensities of conformational motions in the complementary strands change periodically in opposite phases. They are most noticeable in mammals. Such conformational features are lacking in the core promoters of S. pombe. The profiles of Saccharomyces cerevisiae core promoters differ significantly: their singular region is shifted down, thus pointing to the uniqueness of their structural organization. The results obtained may be useful in genetic engineering for artificial modulation of promoter strength.
Affiliation(s)
- Irina A Il'icheva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Mingian V Khodikov
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Yury D Nechipurenko
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Sergei L Grokhovsky
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
|
47
|
Kumar A, Manivelan V, Bansal M. Structural features of DNA are conserved in the promoter region of orthologous genes across different strains of Helicobacter pylori. FEMS Microbiol Lett 2016; 363:fnw207. [DOI: 10.1093/femsle/fnw207] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Accepted: 08/25/2016] [Indexed: 12/19/2022]
|
48
|
Lu Y, Gan Y, Guan J, Zhou S. An integrative analysis of nucleosome occupancy and positioning using diverse sequence dependent properties. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.11.107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
49
|
Chen W, Feng P, Ding H, Lin H, Chou KC. Using deformation energy to analyze nucleosome positioning in genomes. Genomics 2016; 107:69-75. [DOI: 10.1016/j.ygeno.2015.12.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 10/12/2015] [Revised: 12/06/2015] [Accepted: 12/22/2015] [Indexed: 12/28/2022]
|
50
|
Promoter and Terminator Discovery and Engineering. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2016; 162:21-44. [PMID: 27277391 DOI: 10.1007/10_2016_8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/10/2022]
Abstract
Control of gene expression is crucial to optimize metabolic pathways and synthetic gene networks. Promoters and terminators are stretches of DNA upstream and downstream (respectively) of genes that control both the rate at which the gene is transcribed and the rate at which mRNA is degraded. As a result, both of these elements control net protein expression from a synthetic construct. Thus, it is highly important to discover and engineer promoters and terminators with desired characteristics. This chapter highlights various approaches taken to catalogue these important synthetic elements. Specifically, early strategies have focused largely on semi-rational techniques such as saturation mutagenesis to diversify native promoters and terminators. Next, in an effort to reduce the length of the synthetic biology design cycle, efforts in the field have turned towards the rational design of synthetic promoters and terminators. In this vein, we cover recently developed methods such as hybrid engineering, high throughput characterization, and thermodynamic modeling which allow finer control in the rational design of novel promoters and terminators. Emphasis is placed on the methodologies used and this chapter showcases the utility of these methods across multiple host organisms.
|