1
Ye DX, Yu JW, Li R, Hao YD, Wang TY, Yang H, Ding H. The Prediction of Recombination Hotspot Based on Automated Machine Learning. J Mol Biol 2024:168653. [PMID: 38871176 DOI: 10.1016/j.jmb.2024.168653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Revised: 05/12/2024] [Accepted: 06/06/2024] [Indexed: 06/15/2024]
Abstract
Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods has encountered challenges related to insufficient feature extraction and limited generalization capabilities. This paper focused on recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing this problem. To address these deficiencies, an automated machine learning approach was used to construct a recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, delved into the impact of individual features on the results, and investigated the reasons behind misclassification of samples. Finally, an application for recombination hotspot prediction was established to give researchers easy access to the necessary information and tools. The research outcomes of this paper underscore the enormous potential of automated machine learning methods in gene sequence prediction.
Affiliation(s)
- Dong-Xin Ye
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jun-Wen Yu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rui Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu-Duo Hao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Tian-Yu Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
- Hui Ding
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
2
Savinkova LK, Sharypova EB, Kolchanov NA. On the Role of TATA Boxes and TATA-Binding Protein in Arabidopsis thaliana. Plants (Basel, Switzerland) 2023; 12:1000. [PMID: 36903861 PMCID: PMC10005294 DOI: 10.3390/plants12051000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 01/13/2023] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract
For transcription initiation by RNA polymerase II (Pol II), all eukaryotes require assembly of basal transcription machinery on the core promoter, a region located approximately in the locus spanning a transcription start site (-50; +50 bp). Although Pol II is a complex multi-subunit enzyme conserved among all eukaryotes, it cannot initiate transcription without the participation of many other proteins. Transcription initiation on TATA-containing promoters requires the assembly of the preinitiation complex; this process is triggered by an interaction of TATA-binding protein (TBP, a component of the general transcription factor TFIID (transcription factor II D)) with a TATA box. The interaction of TBP with various TATA boxes in plants, in particular Arabidopsis thaliana, has hardly been investigated, except for a few early studies that addressed the role of a TATA box and substitutions in it in plant transcription systems. This is despite the fact that the interaction of TBP with TATA boxes and their variants can be used to regulate transcription. In this review, we examine the roles of some general transcription factors in the assembly of the basal transcription complex, as well as functions of TATA boxes of the model plant A. thaliana. We review examples showing not only the involvement of TATA boxes in the initiation of transcription machinery assembly but also their indirect participation in plant adaptation to environmental conditions in responses to light and other phenomena. Examples of an influence of the expression levels of A. thaliana TBP1 and TBP2 on morphological traits of the plants are also examined. We summarize available functional data on these two early players that trigger the assembly of transcription machinery. This information will deepen the understanding of the mechanisms underlying transcription by Pol II in plants and will help to utilize the functions of the interaction of TBP with TATA boxes in practice.
3
Ferreira TMM, Ferreira Filho JA, Leão AP, de Sousa CAF, Souza MTJ. Structural and functional analysis of stress-inducible genes and their promoters selected from young oil palm (Elaeis guineensis) under salt stress. BMC Genomics 2022; 23:735. [PMCID: PMC9620643 DOI: 10.1186/s12864-022-08926-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 10/04/2022] [Indexed: 11/10/2022]
Abstract
Background: Soil salinity is a problem in more than 100 countries across all continents. It is one of the abiotic stresses that threaten agriculture the most, negatively affecting crops and reducing productivity. Transcriptomics is a technology applied to characterize the transcriptome in a cell, tissue, or organism at a given time via RNA-Seq, also known as full-transcriptome shotgun sequencing. This technology allows the identification of most genes expressed at a particular stage; different isoforms are separated and transcript expression levels are measured. Once determined by this technology, the expression profile of a gene must be validated by another technique, such as quantitative real-time PCR (qRT-PCR). This study aimed to select, annotate, and validate stress-inducible genes, and their promoters, differentially expressed in the leaves of oil palm (Elaeis guineensis) plants under saline stress. Results: The transcriptome analysis led to the selection of 14 genes that underwent structural and functional annotation and had their expression validated using qRT-PCR. When compared, the RNA-Seq and qRT-PCR profiles of those genes showed some inconsistencies. The structural and functional annotation of the proteins coded by the selected genes showed that some of them are orthologs of genes reported as conferring resistance to salinity in other species. Some code for proteins related to the transport of salt into and out of cells, transcriptional regulatory activity, and the opening and closing of stomata. The annotation analysis performed on the promoter sequences revealed 22 distinct types of cis-acting elements, 14 of which are known to be involved in abiotic stress. Conclusion: This study has helped validate a process for the accurate selection of genes responsive to salt stress with a specific and predefined expression profile, together with their promoter sequences. Its results can also be used in molecular-genetics-assisted breeding programs. In addition, the identified genes offer a window of opportunity for strategies that try to relieve the damage caused by salt stress in many economically important glycophyte crops.
Affiliation(s)
- Thalita Massaro Malheiros Ferreira
- Graduate Program of Plant Biotechnology, Federal University of Lavras, 37200-000 Lavras, MG CP 3037, Brazil
- Jaire Alves Ferreira Filho
- Brazilian Agricultural Research Corporation, Embrapa Agroenergy, 70770-901 Brasília, DF Brazil
- André Pereira Leão
- Brazilian Agricultural Research Corporation, Embrapa Agroenergy, 70770-901 Brasília, DF Brazil
- Manoel Teixeira Jr. Souza
- Graduate Program of Plant Biotechnology, Federal University of Lavras, 37200-000 Lavras, MG CP 3037, Brazil; Brazilian Agricultural Research Corporation, Embrapa Agroenergy, 70770-901 Brasília, DF Brazil
4
Ebeed HT. Genome-wide analysis of polyamine biosynthesis genes in wheat reveals gene expression specificity and involvement of STRE and MYB-elements in regulating polyamines under drought. BMC Genomics 2022; 23:734. [PMID: 36309637 PMCID: PMC9618216 DOI: 10.1186/s12864-022-08946-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/10/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND: Polyamines (PAs) are considered promising biostimulants that have diverse key roles during growth and stress responses in plants. Nevertheless, the molecular basis of these roles is still not completely understood, and the transcription of the biosynthesis pathway in various wheat tissues has not been analysed under normal or stress conditions. This research presents the findings of genome-wide analyses of genes implicated in PA biosynthesis in wheat (ADC, arginine decarboxylase; ODC, ornithine decarboxylase; AIH, agmatine iminohydrolase; NPL1, nitrilase-like protein 1; SAMDC, S-adenosylmethionine decarboxylase; SPDS, spermidine synthase; SPMS, spermine synthase; and ACL5, thermospermine synthase). RESULTS: In total, thirty PA biosynthesis genes were identified. Gene structure, subcellular compartmentation and promoters were analysed. Furthermore, gene expression was investigated experimentally in roots, shoot axis, leaves, and spike tissues of adult wheat plants under control and drought conditions. Results revealed structural similarity within each gene family and identified two new motifs that were conserved in SPDS, SPMS and ACL5. Analysis of the promoter elements revealed conserved elements (STRE, CAAT-box, TATA-box, and MYB TF) in all promoters and highly conserved CREs in >80% of promoters (G-Box, ABRE, TGACG-motif, CGTCA-motif, as1, and MYC). Quantification of PAs revealed higher levels of putrescine (Put) in the leaves and higher spermidine (Spd) in the other tissues. However, no spermine (Spm) was detected in the roots. Drought stress elevated the Put level in the roots and the Spm level in the leaves, shoots and roots, whereas it decreased Put in spikes and elevated the total PA levels in all tissues. Interestingly, PA biosynthesis genes showed tissue specificity, and some homoeologs of the same gene family showed differential gene expression during wheat development. Additionally, gene expression analysis showed that ODC is the Put biosynthesis path under drought stress in roots. CONCLUSION: The information gained by this research offers important insights into the transcriptional regulation of PA biosynthesis in wheat that would result in more successful and consistent plant production.
Affiliation(s)
- Heba Talat Ebeed
- Botany and Microbiology Department, Faculty of Science, Damietta University, Damietta, 34517, Egypt.
5
Bai H, Li QZ, Qi YC, Zhai YY, Jin W. The prediction of tumor and normal tissues based on the DNA methylation values of ten key sites. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2022; 1865:194841. [PMID: 35798200 DOI: 10.1016/j.bbagrm.2022.194841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Revised: 05/28/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
Abnormal DNA methylation can alter gene expression to promote or inhibit tumorigenesis in colon adenocarcinoma (COAD). However, finding the important genes and key sites of abnormal DNA methylation that result in the occurrence of COAD is still a difficult task. Here, we studied the effects of DNA methylation in 12 types of genomic features on the changes of gene expression in COAD, and identified 10 important COAD-related genes and the key abnormal DNA methylation sites. The effects of the important genes on prognosis were verified by survival analysis. Moreover, the important genes were shown to participate in cancer pathways and to be hub genes in a co-expression network. Based on the DNA methylation levels at the ten sites, a least diversity increment algorithm for predicting tumor tissues and normal tissues in seventeen cancer types is proposed. Good results are obtained in the jackknife test. For example, the predictive accuracies are 94.17%, 91.28%, 89.04% and 88.89%, respectively, for COAD, rectum adenocarcinoma, pancreatic adenocarcinoma and cholangiocarcinoma. Finally, by computing the enrichment score of infiltrating immunocytes and the activity of immune pathways, we found that the genes are highly correlated with the immune microenvironment.
Affiliation(s)
- Hui Bai
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qian-Zhong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot 010070, China.
- Ye-Chen Qi
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Yuan-Yuan Zhai
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Wen Jin
- Inner Mongolia key laboratory of gene regulation of the metabolic disease, Department of Clinical Medical Research Center, Inner Mongolia People's Hospital, Hohhot 010010, China
6
A study of strong nucleosomes in the human genome. iScience 2022; 25:104593. [PMID: 35789840 PMCID: PMC9249913 DOI: 10.1016/j.isci.2022.104593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 03/03/2022] [Accepted: 06/08/2022] [Indexed: 11/30/2022]
Abstract
Micrococcal nuclease (MNase) is widely used to map nucleosomes. However, nucleosomes are highly dynamic and susceptible to experimental conditions, resulting in extreme variability across nucleosome maps, which complicates the generation of accurate nucleosome organization data. We mapped nucleosomes from different individuals using improved MNase-seq. The improvements included setting different digestion levels (low, medium, high) and naked DNA correction to remove the noise caused by experimental manipulation and comparing maps to obtain the accurate position and occupancy of strong nucleosomes (SNs) in the whole genome. In addition, the characteristics of SNs were further excavated. SNs were enriched in Alu elements and near the centromere of Chr12. SNs contain some specific sequences, and the GC content of SNs is different from that of dynamic nucleosomes. The findings suggest that nucleosome location in the genome and the DNA sequence may affect nucleosome stability.
- Naked DNA correction improved the accuracy of nucleosome map in partial digestion
- Level of MNase digestion has effects on nucleosome organization
- A type of strong nucleosomes (SNs) exist across different nucleosome maps
- Nucleosome stability may be related to its location and the DNA sequence
7
Genome-wide analysis of the CAD gene family reveals two bona fide CAD genes in oil palm. 3 Biotech 2022; 12:149. [PMID: 35747504 PMCID: PMC9209623 DOI: 10.1007/s13205-022-03208-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2022] [Accepted: 05/21/2022] [Indexed: 11/01/2022]
Abstract
Cinnamyl alcohol dehydrogenase (CAD) is the key enzyme for lignin biosynthesis in plants. In this study, genome-wide analysis was performed to identify CAD genes in oil palm (Elaeis guineensis). Phylogenetic analysis was then conducted to select the bona fide EgCADs. The bona fide EgCAD genes and their respective 5' flanking regions were cloned and analysed. Their expression profiles were evaluated in various organs using RT-PCR. Seven EgCAD genes (EgCAD1-7) were identified and divided into four phylogenetic groups. EgCAD1 and EgCAD2 display high sequence similarities with other bona fide CADs and possess all the signature motifs of the bona fide CAD. They also display similar 3D protein structures. Gene expression analysis showed that EgCAD1 was expressed most abundantly in the root tissues, while EgCAD2 was expressed constitutively in all the tissues studied. EgCAD1 possesses only one transcription start site, while EgCAD2 has five. Interestingly, a TC microsatellite was found in the 5' flanking region of EgCAD2. The 5' flanking regions of EgCAD1 and EgCAD2 contain lignin-associated regulatory elements, i.e. AC-elements, and other defence-related motifs, including W-box, GT-1 motif and CGTCA-motif. Altogether, these results imply that EgCAD1 and EgCAD2 are bona fide CADs involved in lignin biosynthesis during the normal development of oil palm and in response to stresses. Our findings shed some light on the roles of the bona fide CAD genes in oil palm and pave the way for manipulating lignin content in oil palm through a genetic approach.
8
iProm-Zea: A two-layer model to identify plant promoters and their types using convolutional neural network. Genomics 2022; 114:110384. [PMID: 35533969 DOI: 10.1016/j.ygeno.2022.110384] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/14/2022] [Revised: 04/18/2022] [Accepted: 05/02/2022] [Indexed: 01/14/2023]
Abstract
A promoter is a short DNA sequence near the start codon, responsible for initiating the transcription of a specific gene in the genome. The accurate recognition of promoters is important for achieving a better understanding of transcriptional regulation. Because of their importance in the process of biological transcriptional regulation, there is an urgent need to develop in silico tools to identify promoters and their types in a timely and accurate manner. A number of prediction methods have been developed in this regard; however, almost all of them are merely used for identifying promoters and their strength or sigma types. The TATA box region in a TATA promoter influences post-transcriptional processes; therefore, in the current study, we developed a two-layer predictor called "iProm-Zea" using a convolutional neural network (CNN) to identify TATA and TATA-less promoters. The first layer can be used to identify a given DNA sequence as a promoter or non-promoter. The second layer can be used to identify whether the recognized promoter is a TATA promoter. To find an optimal feature encoding scheme and model, we employed four feature encoding schemes on different machine learning and CNN algorithms, and based on the evaluation results, we selected a one-hot encoding scheme and a CNN model for iProm-Zea. The 5-fold cross-validation results demonstrated that the constructed predictor showed great potential for identifying promoters and classifying them as TATA and TATA-less promoters. Furthermore, we performed cross-species analysis of iProm-Zea to evaluate its performance in other species. Moreover, to make it easier for experimental scientists to obtain the results they need, we established a freely accessible and user-friendly web server at http://nsclbio.jbnu.ac.kr/tools/iProm-Zea/.
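As an illustration of the one-hot encoding scheme named in the abstract above, the following minimal sketch maps a DNA sequence to one binary row per nucleotide. The function name `one_hot_encode`, the A/C/G/T column order, and the all-zero row for ambiguous bases are assumptions made for illustration; none of them is taken from iProm-Zea.

```python
# Illustrative one-hot encoding of a DNA sequence (not the iProm-Zea implementation).
# Assumed column order: A, C, G, T. Any other symbol (e.g. N) becomes an all-zero row.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return a list of 4-element rows, one row per nucleotide in `sequence`."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = ALPHABET.find(base)  # -1 when the base is not A, C, G or T
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print each base of a short example next to its encoded row.
    for base, row in zip("TATAN", one_hot_encode("TATAN")):
        print(base, row)
```

A matrix of this shape (sequence length by 4) is the kind of input a convolutional model typically consumes; how iProm-Zea itself handles ambiguous bases or padding is not stated in the abstract.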
9
Genome-Wide Prediction of Transcription Start Sites in Conifers. Int J Mol Sci 2022; 23:ijms23031735. [PMID: 35163661 PMCID: PMC8836283 DOI: 10.3390/ijms23031735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2021] [Revised: 01/30/2022] [Accepted: 02/01/2022] [Indexed: 02/04/2023]
Abstract
The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.
10
Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023]
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
- Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lachlan J M Coin
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
Corresponding authors: Jiangning Song, Lachlan J.M. Coin, Quan Zou, Cangzhi Jia
11
To JPC, Davis IW, Marengo MS, Shariff A, Baublite C, Decker K, Galvão RM, Gao Z, Haragutchi O, Jung JW, Li H, O'Brien B, Sant A, Elich TD. Expression Elements Derived From Plant Sequences Provide Effective Gene Expression Regulation and New Opportunities for Plant Biotechnology Traits. Frontiers in Plant Science 2021; 12:712179. [PMID: 34745155 PMCID: PMC8569612 DOI: 10.3389/fpls.2021.712179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2021] [Accepted: 09/15/2021] [Indexed: 06/13/2023]
Abstract
Plant biotechnology traits provide a means to increase crop yields, manage weeds and pests, and sustainably contribute to addressing the needs of a growing population. One of the key challenges in developing new traits for plant biotechnology is the availability of expression elements for efficacious and predictable transgene regulation. Recent advances in genomics, transcriptomics, and computational tools have enabled the generation of new expression elements in a variety of model organisms. In this study, new expression element sequences were computationally generated for use in crops, starting from native Arabidopsis and maize sequences. These elements include promoters, 5' untranslated regions (5' UTRs), introns, and 3' UTRs. The expression elements were demonstrated to drive effective transgene expression in stably transformed soybean plants across multiple tissue types and developmental stages. The expressed transcripts were characterized to demonstrate the molecular function of these expression elements. The data show that the promoters precisely initiate transcripts, the introns are effectively spliced, and the 3' UTRs enable predictable processing of transcript 3' ends. Overall, our results indicate that these new expression elements can recapitulate key functional properties of natural sequences and provide opportunities for optimizing the expression of genes in future plant biotechnology traits.
Affiliation(s)
- Jennifer P. C. To
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Ian W. Davis
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Matthew S. Marengo
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Aabid Shariff
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Pairwise Plants, Durham, NC, United States
- Keith Decker
- Bayer Crop Science, Chesterfield, MO, United States
- Rafaelo M. Galvão
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Zhihuan Gao
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Olivia Haragutchi
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Jee W. Jung
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Duke University, Office for Translation and Commercialization, Durham, NC, United States
- Hong Li
- Bayer Crop Science, Chesterfield, MO, United States
- Brent O'Brien
- Bayer Crop Science, Chesterfield, MO, United States
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- Anagha Sant
- Bayer Crop Science, Chesterfield, MO, United States
- Tedd D. Elich
- GrassRoots Biotechnology, Durham, NC, United States
- Monsanto Company, Research Triangle Park, Durham, NC, United States
- LifeEDIT Therapeutics, Durham, NC, United States
| |
Collapse
|
12
|
Min X, Lu F, Li C. Sequence-Based Deep Learning Frameworks on Enhancer-Promoter Interactions Prediction. Curr Pharm Des 2021; 27:1847-1855. [PMID: 33234095 DOI: 10.2174/1381612826666201124112710] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2020] [Revised: 07/29/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation, which tightly controls gene expression. Identification of EPIs can help us better decipher gene regulation and understand disease mechanisms. However, experimental methods to identify EPIs are constrained by funds, time, and manpower, while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promise in classification and have been used to identify EPIs. In this survey, we specifically focus on sequence-based deep learning methods and conduct a comprehensive review of the literature. First, we briefly introduce existing sequence-based frameworks for EPI prediction and their technical details. After that, we elaborate on the datasets, pre-processing methods, and evaluation strategies. Finally, we conclude with the challenges these methods face and suggest several future opportunities. We hope this review will provide a useful reference for further studies on enhancer-promoter interactions.
Affiliation(s)
- Xiaoping Min
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Fengqing Lu
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Chunyan Li
  - Graduate School, Yunnan Minzu University, Kunming 650504, China

13
Hata T, Satoh S, Takada N, Matsuo M, Obokata J. Kozak Sequence Acts as a Negative Regulator for De Novo Transcription Initiation of Newborn Coding Sequences in the Plant Genome. Mol Biol Evol 2021; 38:2791-2803. [PMID: 33705557 PMCID: PMC8233501 DOI: 10.1093/molbev/msab069] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open
Abstract
The manner in which newborn coding sequences and their transcriptional competency emerge during the process of gene evolution remains unclear. Here, we experimentally simulated eukaryotic gene origination processes by mimicking horizontal gene transfer events in the plant genome. We mapped the precise position of the transcription start sites (TSSs) of hundreds of newly introduced promoterless firefly luciferase (LUC) coding sequences in the genome of Arabidopsis thaliana cultured cells. The systematic characterization of the LUC-TSSs revealed that 80% of them occurred under the influence of endogenous promoters, while the remainder underwent de novo activation in the intergenic regions, starting from pyrimidine-purine dinucleotides. These de novo TSSs obeyed unexpected rules; they predominantly occurred ∼100 bp upstream of the LUC inserts and did not overlap with Kozak-containing putative open reading frames (ORFs). These features were the output of the immediate responses to the sequence insertions, rather than a bias in the screening of the LUC gene function. Regarding the wild-type genic TSSs, they appeared to have evolved to lack any ORFs in their vicinities. Therefore, the repulsion by the de novo TSSs of Kozak-containing ORFs described above might be the first selection gate for the occurrence and evolution of TSSs in the plant genome. Based on these results, we characterized the de novo type of TSS identified in the plant genome and discuss its significance in genome evolution.
Affiliation(s)
- Takayuki Hata
  - Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Sakyo-ku, Kyoto, Kyoto, Japan
  - Faculty of Agriculture, Setsunan University, Hirakata, Osaka, Japan
- Soichirou Satoh
  - Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Sakyo-ku, Kyoto, Kyoto, Japan
- Naoto Takada
  - Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Sakyo-ku, Kyoto, Kyoto, Japan
- Mitsuhiro Matsuo
  - Faculty of Agriculture, Setsunan University, Hirakata, Osaka, Japan
- Junichi Obokata
  - Faculty of Agriculture, Setsunan University, Hirakata, Osaka, Japan

14
iPTT(2L)-CNN: A Two-Layer Predictor for Identifying Promoters and Their Types in Plant Genomes by Convolutional Neural Network. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:6636350. [PMID: 33488763 PMCID: PMC7803414 DOI: 10.1155/2021/6636350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2020] [Revised: 12/13/2020] [Accepted: 12/16/2020] [Indexed: 11/18/2022]
Abstract
A promoter is a short DNA sequence near the start codon, responsible for initiating transcription of a specific gene in the genome. The accurate recognition of promoters has great significance for a better understanding of transcriptional regulation. Because of their importance in the process of biological transcriptional regulation, there is an urgent need to develop in silico tools to identify promoters and their types quickly and accurately. A number of prediction methods have been developed in this regard; however, almost all of them were merely used for identifying promoters and their strength or sigma types. Because the TATA box region in TATA promoters influences posttranscriptional processes, in the current study, we developed a two-layer predictor called iPTT(2L)-CNN by using the convolutional neural network (CNN) for identifying TATA and TATA-less promoters. The first layer can be used to identify a given DNA sequence as a promoter or nonpromoter. The second layer is used to identify whether the recognized promoter is a TATA promoter or not. The 5-fold cross-validation and independent testing results demonstrate that the constructed predictor is promising for identifying promoters and classifying TATA and TATA-less promoters. Furthermore, to make it easier for most experimental scientists to get the results they need, a user-friendly web server has been established at http://www.jci-bioinfo.cn/iPPT(2L)-CNN.

15
Identification and Functional Characterization of a Soybean (Glycine max) Thioesterase that Acts on Intermediates of Fatty Acid Biosynthesis. PLANTS 2019; 8:plants8100397. [PMID: 31597241 PMCID: PMC6843456 DOI: 10.3390/plants8100397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/26/2019] [Revised: 09/21/2019] [Accepted: 10/02/2019] [Indexed: 11/16/2022]
Abstract
(1) Background: Plants possess many acyl-acyl carrier protein (acyl-ACP) thioesterases (TEs) with unique specificity. One such TE is methylketone synthase 2 (MKS2), an enzyme with a single-hotdog-fold structure found in several tomato species that hydrolyzes 3-ketoacyl-ACPs to give free 3-ketoacids. (2) Methods: In this study, we identified and characterized a tomato MKS2 homolog gene, namely, GmMKS2, in the genome of soybean (Glycine max). (3) Results: GmMKS2 underwent alternative splicing to produce three alternative transcripts, but only one encodes a protein with thioesterase activity when recombinantly expressed in Escherichia coli. Heterologous expression of the main transcript of GmMKS2, GmMKS2-X2, in E. coli generated various types of fatty acids, including 3-ketoacids (with 3-ketotetradecenoic acid (14:1) being the most abundant), cis-Δ5-dodecanoic acid, and 3-hydroxyacids, suggesting that GmMKS2 acts as an acyl-ACP thioesterase. In plants, the GmMKS2-X2 transcript level was found to be higher in the roots than in the other examined organs. In silico analysis revealed a substantial enrichment of putative cis-regulatory elements related to disease-resistance responses and abiotic stress responses in the promoter of this gene. (4) Conclusions: GmMKS2 showed broad substrate specificities toward a wide range of acyl-ACPs that varied in terms of chain length, oxidation state, and saturation degree. Our results suggest that GmMKS2 might have a stress-related physiological function in G. max.

16
Rantasalo A, Landowski CP, Kuivanen J, Korppoo A, Reuter L, Koivistoinen O, Valkonen M, Penttilä M, Jäntti J, Mojzita D. A universal gene expression system for fungi. Nucleic Acids Res 2019; 46:e111. [PMID: 29924368 PMCID: PMC6182139 DOI: 10.1093/nar/gky558] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 04/03/2018] [Accepted: 06/07/2018] [Indexed: 12/02/2022] Open
Abstract
Biotechnological production of fuels, chemicals and proteins is dependent on efficient production systems, typically genetically engineered microorganisms. New genome editing methods are making it increasingly easy to introduce new genes and functionalities in a broad range of organisms. However, engineering of all these organisms is hampered by the lack of suitable gene expression tools. Here, we describe a synthetic expression system (SES) that is functional in a broad spectrum of fungal species without the need for host-dependent optimization. The SES consists of two expression cassettes, the first providing a weak but constitutive level of a synthetic transcription factor (sTF), and the second enabling strong expression of the target gene, tunable at will, via an sTF-dependent promoter. We validated the SES functionality in six yeast and two filamentous fungi species in which high (levels beyond organism-specific promoters) as well as adjustable expression levels of heterologous and native genes were demonstrated. The SES is an unprecedentedly broadly functional gene expression regulation method that enables significantly improved engineering of fungi. Importantly, the SES makes it possible to bring novel eukaryotic microbes into use for basic research and various biotechnological applications.
Affiliation(s)
- Anssi Rantasalo
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Joosu Kuivanen
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Annakarin Korppoo
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Lauri Reuter
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Outi Koivistoinen
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Mari Valkonen
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Merja Penttilä
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Jussi Jäntti
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland
- Dominik Mojzita
  - VTT Technical Research Centre of Finland, Espoo, P.O. Box 1000, FI-02044 VTT, Finland

17
Chaudhary S, Jabre I, Reddy ASN, Staiger D, Syed NH. Perspective on Alternative Splicing and Proteome Complexity in Plants. TRENDS IN PLANT SCIENCE 2019; 24:496-506. [PMID: 30852095 DOI: 10.1016/j.tplants.2019.02.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 10/22/2018] [Revised: 01/28/2019] [Accepted: 02/08/2019] [Indexed: 05/02/2023]
Abstract
Alternative splicing (AS) generates multiple transcripts from the same gene; however, the contribution of AS to proteome complexity remains elusive in plants. AS is prevalent under stress conditions in plants, but it is counterintuitive why plants would invest in protein synthesis under declining energy supply. We propose that plants employ AS not only to potentially increase proteomic complexity, but also to buffer against the stress-responsive transcriptome to reduce the metabolic cost of translating all AS transcripts. To maximise efficiency under stress, plants may make fewer proteins with disordered domains via AS to diversify substrate specificity and maintain sufficient regulatory capacity. Furthermore, we suggest that chromatin state-dependent AS engenders short/long-term stress memory to mediate reproducible transcriptional response in the future.
Affiliation(s)
- Saurabh Chaudhary
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK; These authors contributed equally to this work
- Ibtissam Jabre
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK; These authors contributed equally to this work
- Anireddy S N Reddy
  - Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523-1878, USA
- Dorothee Staiger
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Naeem H Syed
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK

18
Liu G, Liu GJ, Tan JX, Lin H. DNA physical properties outperform sequence compositional information in classifying nucleosome-enriched and -depleted regions. Genomics 2018; 111:1167-1175. [PMID: 30055231 DOI: 10.1016/j.ygeno.2018.07.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/05/2018] [Revised: 07/07/2018] [Accepted: 07/15/2018] [Indexed: 12/15/2022]
Abstract
The nucleosome is the fundamental structural unit of eukaryotic chromatin and plays an essential role in the epigenetic regulation of cellular processes, such as DNA replication, recombination, and transcription. Hence, it is important to identify nucleosome positions in the genome. Our previous model based on DNA deformation energy, in which a set of DNA physical descriptors was used, performed well in predicting nucleosome dyad positions and occupancy. In this study, we established a machine-learning model for predicting nucleosome occupancy in order to further verify the physical descriptors. Results showed that (1) our model outperformed several other sequence compositional information-based models, indicating a stronger dependence of nucleosome positioning on DNA physical properties; (2) nucleosome-enriched and -depleted regions have distinct features in terms of DNA physical descriptors like sequence-dependent flexibility and equilibrium structure parameters; (3) gene transcription start sites and termination sites can be well characterized with the distribution patterns of the physical descriptors, indicating the regulatory role of DNA physical properties in gene transcription. In addition, we developed a web server for the model, which is freely accessible at http://lin-group.cn/server/iNuc-force/.
Affiliation(s)
- Guoqing Liu
  - The School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Guo-Jun Liu
  - School of Natural Sciences and Mathematics, Ural Federal University, Ekaterinburg 620000, Russia
- Jiu-Xin Tan
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

19
Zhang Q, Wang S, Pan Y, Su D, Lu Q, Zuo Y, Yang L. Characterization of proteins in different subcellular localizations for Escherichia coli K12. Genomics 2018; 111:1134-1141. [PMID: 30026105 DOI: 10.1016/j.ygeno.2018.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/02/2018] [Revised: 07/07/2018] [Accepted: 07/11/2018] [Indexed: 10/28/2022]
Abstract
Comprehensive knowledge of protein subcellular localization is an important step toward understanding the function of proteins. Recent advances in systems biology have allowed us to develop more accurate methods for characterizing proteins at the subcellular localization level. In this study, an analysis method was developed to characterize the topological properties and biological properties of the cytoplasmic proteins, inner membrane proteins, outer membrane proteins and periplasmic proteins in Escherichia coli (E. coli). Statistically significant differences were found in all topological properties and biological properties among proteins in different subcellular localizations. In addition, an investigation was carried out to analyze the differences in the compositions of the 20 amino acids across the four protein categories. We also found that there were significant differences in the compositions of all 20 amino acids. These findings may be helpful for understanding the comprehensive relationship between protein subcellular localization and biological function.
Affiliation(s)
- Qi Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yi Pan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation, Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China

20
Kaur A, Pati PK, Pati AM, Nagpal AK. In-silico analysis of cis-acting regulatory elements of pathogenesis-related proteins of Arabidopsis thaliana and Oryza sativa. PLoS One 2017; 12:e0184523. [PMID: 28910327 PMCID: PMC5598985 DOI: 10.1371/journal.pone.0184523] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 02/14/2017] [Accepted: 08/27/2017] [Indexed: 01/24/2023] Open
Abstract
Pathogenesis-related (PR) proteins are a family of low-molecular-weight proteins induced in plants under various biotic and abiotic stresses. They play an important role in plant defense mechanisms. PRs have a wide range of functions, acting as hydrolases, peroxidases, chitinases, antifungals, protease inhibitors, etc. In the present study, an attempt has been made to analyze promoter regions of PR1, PR2, PR5, PR9, PR10 and PR12 of Arabidopsis thaliana and Oryza sativa. Analysis of cis-element distribution revealed the functional multiplicity of PRs and provides insight into their gene regulation. CpG islands are observed only in rice PRs, which indicates that monocot genomes contain more GC-rich motifs than dicot genomes. Tandem repeats were also observed in the 5' UTR of PR genes. Thus, the present study provides an understanding of the regulation of PR genes and their versatile roles in plants.
Affiliation(s)
- Amritpreet Kaur
  - Department of Botanical and Environmental Sciences, Guru Nanak Dev University, Amritsar, Punjab, India
- Pratap Kumar Pati
  - Department of Biotechnology, Guru Nanak Dev University, Amritsar, Punjab, India
- Aparna Maitra Pati
  - Planning Project Monitoring and Evaluation Cell, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Avinash Kaur Nagpal
  - Department of Botanical and Environmental Sciences, Guru Nanak Dev University, Amritsar, Punjab, India

21
Shahmuradov IA, Umarov RK, Solovyev VV. TSSPlant: a new tool for prediction of plant Pol II promoters. Nucleic Acids Res 2017; 45:e65. [PMID: 28082394 PMCID: PMC5416875 DOI: 10.1093/nar/gkw1353] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 09/18/2016] [Revised: 12/16/2016] [Accepted: 12/27/2016] [Indexed: 11/22/2022] Open
Abstract
Our current knowledge of eukaryotic promoters indicates a complex architecture that is often composed of numerous functional motifs. Most known promoters include multiple and in some cases mutually exclusive transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue type, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences of a wide spectrum of plant genomes. The tool was developed by using large promoter collections from ppdb and PlantProm DB. It utilizes eighteen significant compositional and signal features of plant promoter sequences selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy compared to the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80 and F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
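For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) and F1-score are conventionally computed from confusion-matrix counts. The function names and the example counts are illustrative assumptions; they are not taken from the TSSPlant paper, which does not specify its evaluation code here.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a promoter / non-promoter classifier.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike F1, MCC also uses the true-negative count, which is one reason the two metrics can rank predictors differently on imbalanced promoter datasets.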
Affiliation(s)
- Ilham A. Shahmuradov
  - King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
  - Institute of Molecular Biology and Biotechnologies, ANAS, 2 Matbuat strasse, Baku AZ1073, Azerbaijan
- Ramzan Kh. Umarov
  - King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia

22
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022] Open
Abstract
We analyzed the functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying the presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, an area rich in regulatory regions.
Affiliation(s)
- Tatiana V Tatarinova
  - Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
  - Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
  - Vavilov Institute of General Genetics, Moscow, Russia
  - F1 Genomics, San Diego, CA, USA
  - School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
  - International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
  - International Rice Research Institute, Los Baños, Laguna 4031, Philippines

23
Nascent RNA sequencing reveals distinct features in plant transcription. Proc Natl Acad Sci U S A 2016; 113:12316-12321. [PMID: 27729530 DOI: 10.1073/pnas.1603217113] [Citation(s) in RCA: 118] [Impact Index Per Article: 14.8] [Indexed: 01/01/2023] Open
Abstract
Transcriptional regulation of gene expression is a major mechanism used by plants to confer phenotypic plasticity, and yet compared with other eukaryotes or bacteria, little is known about the design principles. We generated an extensive catalog of nascent and steady-state transcripts in Arabidopsis thaliana seedlings using global nuclear run-on sequencing (GRO-seq), 5'GRO-seq, and RNA-seq and reanalyzed published maize data to capture characteristics of plant transcription. De novo annotation of nascent transcripts accurately mapped start sites and unstable transcripts. Examining the promoters of coding and noncoding transcripts identified comparable chromatin signatures, a conserved "TGT" core promoter motif and unreported transcription factor-binding sites. Mapping of engaged RNA polymerases showed a lack of enhancer RNAs, promoter-proximal pausing, and divergent transcription in Arabidopsis seedlings and maize, which are commonly present in yeast and humans. In contrast, Arabidopsis and maize genes accumulate RNA polymerases in proximity of the polyadenylation site, a trend that coincided with longer genes and CpG hypomethylation. Lack of promoter-proximal pausing and a higher correlation of nascent and steady-state transcripts indicate Arabidopsis may regulate transcription predominantly at the level of initiation. Our findings provide insight into plant transcription and eukaryotic gene expression as a whole.

24
Su WX, Li QZ, Zhang LQ, Fan GL, Wu CY, Yan ZH, Zuo YC. Gene expression classification using epigenetic features and DNA sequence composition in the human embryonic stem cell line H1. Gene 2016; 592:227-234. [DOI: 10.1016/j.gene.2016.07.059] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/18/2016] [Revised: 06/20/2016] [Accepted: 07/23/2016] [Indexed: 01/01/2023]

25
Lis M, Walther D. The orientation of transcription factor binding site motifs in gene promoter regions: does it matter? BMC Genomics 2016; 17:185. [PMID: 26939991 PMCID: PMC4778318 DOI: 10.1186/s12864-016-2549-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/19/2015] [Accepted: 02/27/2016] [Indexed: 12/23/2022] Open
Abstract
Background: Gene expression is to a large degree regulated by the specific binding of protein transcription factors to cis-regulatory transcription factor binding sites in gene promoter regions. Despite the identification of hundreds of binding site sequence motifs, the question as to whether motif orientation matters with regard to the gene expression regulation of the respective downstream genes appears surprisingly underinvestigated. Results: We pursued a statistical approach by probing 293 reported non-palindromic transcription factor binding site and ten core promoter motifs in Arabidopsis thaliana for evidence of any relevance of motif orientation based on mapping statistics and effects on the co-regulation of gene expression of the respective downstream genes. Although positional intervals closer to the transcription start site (TSS) were found with increased frequencies of motifs exhibiting orientation preference, a corresponding effect with regard to gene expression regulation as evidenced by increased co-expression of genes harboring the favored orientation in their upstream sequence could not be established. Furthermore, we identified an intrinsic orientational asymmetry of sequence regions close to the TSS as the likely source of the identified motif orientation preferences. By contrast, motif presence irrespective of orientation was found associated with pronounced effects on gene expression co-regulation, validating the pursued approach. Inspecting motif pairs revealed statistically preferred orientational arrangements, but no consistent effect with regard to arrangement-dependent gene expression regulation was evident.
Conclusions: Our results suggest that for the motifs considered here, either no specific orientation rendering them functional across all their instances exists, with orientational requirements instead depending on gene-locus specific additional factors, or that the binding orientation of transcription factors may generally not be relevant, but rather the event of binding itself.
Affiliation(s)
- Monika Lis
  - Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany
- Dirk Walther
  - Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany

26
Xie KB, Zhou X, Zhang TH, Zhang BL, Chen LM, Chen GX. Systematic discovery and characterization of stress-related microRNA genes in Oryza sativa. Biologia (Bratisl) 2015. [DOI: 10.1515/biolog-2015-0001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
|
27
|
Yang L, Wang J, Lv Y, Hao D, Zuo Y, Li X, Jiang W. Characterization of TATA-containing genes and TATA-less genes in S. cerevisiae by network topologies and biological properties. Genomics 2014; 104:562-71. [DOI: 10.1016/j.ygeno.2014.10.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/11/2014] [Revised: 10/01/2014] [Accepted: 10/04/2014] [Indexed: 01/11/2023]
|
28
|
Prediction of DNase I hypersensitive sites by using pseudo nucleotide compositions. ScientificWorldJournal 2014; 2014:740506. [PMID: 25215331 PMCID: PMC4152949 DOI: 10.1155/2014/740506] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/11/2014] [Accepted: 08/03/2014] [Indexed: 11/19/2022] Open
Abstract
DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowledge about the locations of DHS is helpful for deciphering the function of noncoding genomic regions. With the acceleration of genome sequencing in the postgenomic age, it is highly desirable to develop cost-effective computational methods to identify DHS. In the present work, a support vector machine based model was proposed to identify DHS by using the pseudo dinucleotide composition. In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification.
|
29
|
Variation and constraints in species-specific promoter sequences. J Theor Biol 2014; 363:357-66. [PMID: 25149367 DOI: 10.1016/j.jtbi.2014.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/26/2014] [Revised: 07/30/2014] [Accepted: 08/04/2014] [Indexed: 11/24/2022]
Abstract
A vast literature is nowadays devoted to the search for correlations between transcription-related functions and the composition of sequences upstream of the Transcription Start Site. Little is known about the possible functional effects of nucleotide distributions on the conformational landscape of DNA in such regions. We have used suitable statistical indicators for identifying sequences that may play an important role in regulating transcription processes. In particular, we have analyzed base composition, periodicity and information content in sets of aligned promoters clustered according to functional information in order to gain insight into the main structural differences between promoters regulating genes with different functions. Our results show that when we select promoters according to some biological information, in a single species, at least in vertebrates, we observe structurally different classes of sequences. The highly variable and differentiated gene expression patterns may explain the great extent of structural differentiation observed in complex organisms. In fact, although our analysis is focused on Homo sapiens, we also provide a comparison with other species, selected at different positions in the phylogenetic tree.
|
30
|
Zuo Y, Zhang P, Liu L, Li T, Peng Y, Li G, Li Q. Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome. Chromosome Res 2014; 22:321-34. [PMID: 24728765 DOI: 10.1007/s10577-014-9414-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/25/2014] [Revised: 03/24/2014] [Accepted: 03/26/2014] [Indexed: 12/15/2022]
Abstract
A growing number of reported results on nucleosome positioning and histone modifications show that DNA structure plays a well-established role in splicing. In this study, a set of DNA geometric flexibility parameters derived from molecular dynamics (MD) simulations was introduced to discuss the structural organization around splice sites at the DNA level. The obtained profiles of specific flexibility/stiffness around splice sites indicated that DNA physical-geometry deformation could be used as an alternative way to describe the splicing junction region. Using structural flexibility as a discriminatory parameter, we developed a hybrid computational model for predicting potential splicing sites. Better prediction performance was achieved when the benchmark dataset was evaluated. Our results showed that the mechanical deformability of a splice junction is closely correlated with both the splice site strength and structural information in its flanking sequences.
Affiliation(s)
- Yongchun Zuo
- The Key Laboratory of National Education Ministry for Mammalian Reproductive Biology and Biotechnology, Inner Mongolia University, Hohhot, 010021, China,
|
31
|
Zuo YC, Chen W, Fan GL, Li QZ. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids 2012; 44:573-80. [PMID: 22851052 DOI: 10.1007/s00726-012-1374-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 11/28/2011] [Accepted: 07/17/2012] [Indexed: 11/25/2022]
Abstract
The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We have used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor classifier to classify thermophilic and mesophilic proteins, and the resulting KNN-ID classifier was successfully developed to predict thermophilic proteins. Instead of extracting features from protein sequences as done previously, our approach was based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences was first calculated to quantitatively measure the similarity level between one given sequence and the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID proposed in this study outperforms the other methods. The improved predictive performance indicated that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy was discussed further. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm .
Affiliation(s)
- Yong-Chun Zuo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
|
32
|
Predicting nucleosome binding motif set and analyzing their distributions around functional sites of human genes. Chromosome Res 2012; 20:685-98. [DOI: 10.1007/s10577-012-9305-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/23/2012] [Revised: 07/13/2012] [Accepted: 07/17/2012] [Indexed: 01/30/2023]
|
33
|
Wiludda C, Schulze S, Gowik U, Engelmann S, Koczor M, Streubel M, Bauwe H, Westhoff P. Regulation of the photorespiratory GLDPA gene in C(4) flaveria: an intricate interplay of transcriptional and posttranscriptional processes. Plant Cell 2012; 24:137-51. [PMID: 22294620 PMCID: PMC3289567 DOI: 10.1105/tpc.111.093872] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 11/16/2011] [Revised: 12/23/2011] [Accepted: 01/12/2012] [Indexed: 05/05/2023]
Abstract
The mitochondrial Gly decarboxylase complex (GDC) is a key component of the photorespiratory pathway that occurs in all photosynthetically active tissues of C(3) plants but is restricted to bundle sheath cells in C(4) species. GDC is also required for general cellular C(1) metabolism. In the Asteracean C(4) species Flaveria trinervia, a single functional GLDP gene, GLDPA, encodes the P-subunit of GDC, a decarboxylating Gly dehydrogenase. GLDPA promoter reporter gene fusion studies revealed that this promoter is active in bundle sheath cells and the vasculature of transgenic Flaveria bidentis (C(4)) and the Brassicacean C(3) species Arabidopsis thaliana, suggesting the existence of an evolutionarily conserved gene regulatory system in the bundle sheath. Here, we demonstrate that GLDPA gene regulation is achieved by an intricate interplay of transcriptional and posttranscriptional mechanisms. The GLDPA promoter is composed of two tandem promoters, P(R2) and P(R7), that together ensure a strong bundle sheath expression. While the proximal promoter (P(R7)) is active in the bundle sheath and vasculature, the distal promoter (P(R2)) drives uniform expression in all leaf chlorenchyma cells and the vasculature. An intron in the 5' untranslated leader of P(R2)-derived transcripts is inefficiently spliced and apparently suppresses the output of P(R2) by eliciting RNA decay.
Affiliation(s)
- Christian Wiludda
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
- Stefanie Schulze
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
- Udo Gowik
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
- Sascha Engelmann
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
- Maria Koczor
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
- Monika Streubel
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
- Hermann Bauwe
- Universität Rostock, Abteilung Pflanzenphysiologie, 18059 Rostock, Germany
- Peter Westhoff
- Heinrich-Heine-Universität Düsseldorf, Institut für Entwicklungs- und Molekularbiologie der Pflanzen, 40225 Duesseldorf, Germany
|
34
|
Xing Y, Zhao X, Cai L. Prediction of nucleosome occupancy in Saccharomyces cerevisiae using position-correlation scoring function. Genomics 2011; 98:359-66. [DOI: 10.1016/j.ygeno.2011.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/19/2011] [Revised: 07/16/2011] [Accepted: 07/26/2011] [Indexed: 10/17/2022]
|