1. Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. Genomics, Proteomics & Bioinformatics 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Qiwei Lian
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
2. Jankovic B, Gojobori T. From shallow to deep: some lessons learned from application of machine learning for recognition of functional genomic elements in human genome. Hum Genomics 2022; 16:7. [PMID: 35180894 PMCID: PMC8855580 DOI: 10.1186/s40246-022-00376-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/26/2021] [Accepted: 01/02/2022] [Indexed: 11/25/2022]
Abstract
Identification of genomic signals as indicators for functional genomic elements is one of the areas that received early and widespread application of machine learning methods. With time, the methods applied grew in variety and generally exhibited a tendency to improve their ability to identify some major genomic and transcriptomic signals. The evolution of machine learning in genomics followed a similar path to applications of machine learning in other fields. These were impacted in a major way by three dominant developments, namely an enormous increase in availability and quality of data, a significant increase in computational power available to machine learning applications, and finally, new machine learning paradigms, of which deep learning is the most well-known example. It is not easy in general to distinguish factors leading to improvements in results of applications of machine learning. This is even more so in the field of genomics, where the advent of next-generation sequencing and the increased ability to perform functional analysis of raw data have had a major effect on the applicability of machine learning in OMICS fields. In this paper, we survey the results from a subset of published work in application of machine learning in the recognition of genomic signals and regions in the human genome and summarize some lessons learnt from this endeavor. There is no doubt that significant progress has been made both in terms of accuracy and reliability of models. Questions remain, however, whether the progress has been sufficient and what these developments bring to the field of genomics in general and human genomics in particular. Improving usability, interpretability and accuracy of models remains an important open challenge for current and future research in application of machine learning and more generally of artificial intelligence methods in genomics.
Affiliation(s)
- Boris Jankovic
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Takashi Gojobori
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  - Division of Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
3. Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
4. Characterization and functional analysis of Cshsp19.0 encoding a small heat shock protein in Chilo suppressalis (Walker). Int J Biol Macromol 2021; 188:924-931. [PMID: 34352319 DOI: 10.1016/j.ijbiomac.2021.07.186] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2021] [Revised: 07/27/2021] [Accepted: 07/29/2021] [Indexed: 11/22/2022]
Abstract
Small heat shock proteins (sHSPs) function as ATP-independent chaperones that preserve cellular proteostasis under stressful conditions. In this study, Cshsp19.0, which encodes a new small heat shock protein, was isolated and characterized from Chilo suppressalis (Walker) to better understand the contribution of sHSPs to insect development and stress tolerance. The full-length Cshsp19.0 cDNA was 697 bp and encoded a 19.0 kDa protein with an isoelectric point of 5.95. Phylogenetic analysis and amino acid alignments indicated that Cshsp19.0 is a member of the sHSP family. Cshsp19.0 was expressed at maximal levels in foreguts and showed the least amount of expression in fat bodies. Expression analysis in different developmental stages of C. suppressalis revealed that Cshsp19.0 was most highly expressed in 1st instar larvae. Furthermore, Cshsp19.0 was upregulated when insects were exposed to heat and cold stress for a 2-h period. There were significant differences in the male and female pupae in response to humidity; Cshsp19.0 expression increased in male pupae as RH increased, whereas the inverse pattern was observed in female pupae. Larvae exhibited a lower rate of survival when Cshsp19.0 was silenced by a nanomaterial-promoted RNAi method. The results confirm that Cshsp19.0 functions to increase environmental stress tolerance and regulates physiological activities in C. suppressalis.
5. Steinhaus R, Proft S, Schuelke M, Cooper DN, Schwarz JM, Seelow D. MutationTaster2021. Nucleic Acids Res 2021; 49:W446-W451. [PMID: 33893808 PMCID: PMC8262698 DOI: 10.1093/nar/gkab266] [Citation(s) in RCA: 117] [Impact Index Per Article: 39.0] [Received: 02/24/2021] [Revised: 03/26/2021] [Accepted: 04/01/2021] [Indexed: 01/13/2023]
Abstract
Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many sources of data that only became available after the last release (such as gnomAD and ExAC pLI scores) and changed the splice site prediction model. To more easily assess the relevance of detected known disease mutations to the clinical phenotype of the patient, MutationTaster now provides information on the diseases they cause. Further changes represent a major overhaul of the interfaces to increase user-friendliness whilst many changes under the hood have been designed to accelerate the processing of uploaded VCF files. We also offer an API for the rapid automated query of smaller numbers of variants from within other software. MutationTaster2021 integrates our disease mutation search engine, MutationDistiller, to prioritise variants from VCF files using the patient's clinical phenotype. The novel version is available at https://www.genecascade.org/MutationTaster2021/. This website is free and open to all users and there is no login requirement.
Affiliation(s)
- Robin Steinhaus
  - Berliner Institut für Gesundheitsforschung in der Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany
- Sebastian Proft
  - Berliner Institut für Gesundheitsforschung in der Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany
- Markus Schuelke
  - Klinik für Pädiatrie m.S. Neurologie, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany
  - NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 10117 Berlin, Germany
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XW, UK
- Jana Marie Schwarz
  - Klinik für Pädiatrie m.S. Neurologie, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany
- Dominik Seelow
  - Berliner Institut für Gesundheitsforschung in der Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany
6. Shkurin A, Hughes TR. Known sequence features can explain half of all human gene ends. NAR Genom Bioinform 2021; 3:lqab042. [PMID: 34104882 PMCID: PMC8176999 DOI: 10.1093/nargab/lqab042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 04/14/2021] [Accepted: 05/10/2021] [Indexed: 11/15/2022]
Abstract
Cleavage and polyadenylation (CPA) sites define eukaryotic gene ends. CPA sites are associated with five key sequence recognition elements: the upstream UGUA, the polyadenylation signal (PAS), and U-rich sequences; the CA/UA dinucleotide where cleavage occurs; and GU-rich downstream elements (DSEs). Currently, it is not clear whether these sequences are sufficient to delineate CPA sites. Additionally, numerous other sequences and factors have been described, often in the context of promoting alternative CPA sites and preventing cryptic CPA site usage. Here, we dissect the contributions of individual sequence features to CPA using standard discriminative models. We show that models comprised only of the five primary CPA sequence features give highest probability scores to constitutive CPA sites at the ends of coding genes, relative to the entire pre-mRNA sequence, for 41% of all human genes. U1-hybridizing sequences provide a small boost in performance. The addition of all known RBP RNA binding motifs to the model, however, increases this figure to 49%, and suggests an involvement of both known and suspected CPA regulators as well as potential new factors in delineating constitutive CPA sites. To our knowledge, this high effectiveness of established features to predict human gene ends has not previously been documented.
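The five sequence elements named in this abstract can be turned into simple per-site features. The sketch below is only an illustration and is not the discriminative model used by the authors: the window sizes, the function names `rich_fraction` and `cpa_features`, and the choice of AAUAAA/AUUAAA as PAS hexamers are assumptions made for the example.

```python
# Hypothetical window sizes (assumptions for illustration, not taken from the paper).
UGUA_WINDOW = 100   # nt upstream of the cleavage site searched for UGUA
PAS_WINDOW = 40     # nt upstream searched for a PAS hexamer and U-richness
DSE_WINDOW = 40     # nt downstream searched for GU-rich content
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")  # two common PAS variants (assumed set)


def rich_fraction(region, alphabet):
    """Fraction of bases in `region` that belong to `alphabet` (0.0 if empty)."""
    if not region:
        return 0.0
    return sum(base in alphabet for base in region) / len(region)


def cpa_features(rna, site):
    """Return simple features around a candidate cleavage site.

    `site` is the 0-based index of the last nucleotide before cleavage.
    DNA input is accepted and converted to RNA (T -> U).
    """
    rna = rna.upper().replace("T", "U")
    upstream = rna[max(0, site - UGUA_WINDOW): site + 1]
    pas_region = rna[max(0, site - PAS_WINDOW): site + 1]
    downstream = rna[site + 1: site + 1 + DSE_WINDOW]
    return {
        "has_ugua": "UGUA" in upstream,
        "has_pas": any(hexamer in pas_region for hexamer in PAS_HEXAMERS),
        "u_rich_upstream": rich_fraction(pas_region, "U"),
        "ca_or_ua_at_site": rna[max(0, site - 1): site + 1] in ("CA", "UA"),
        "gu_rich_downstream": rich_fraction(downstream, "GU"),
    }
```

A feature dictionary like this could feed any standard classifier; presence flags and fractions are used here only because they are easy to inspect.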
Affiliation(s)
- Aleksei Shkurin
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Terrence Donnelly Centre for Cellular & Biomolecular Research, Toronto, ON M5S 3E1, Canada
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Terrence Donnelly Centre for Cellular & Biomolecular Research, Toronto, ON M5S 3E1, Canada
7. Yu H, Dai Z. SANPolyA: a deep learning method for identifying Poly(A) signals. Bioinformatics 2020; 36:2393-2400. [PMID: 31904817 DOI: 10.1093/bioinformatics/btz970] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/29/2019] [Revised: 12/05/2019] [Accepted: 01/01/2020] [Indexed: 12/21/2022]
Abstract
MOTIVATION: Polyadenylation plays a regulatory role in transcription. The recognition of polyadenylation signal (PAS) motif sequence is an important step in polyadenylation. In the past few years, some statistical machine learning-based and deep learning-based methods have been proposed for PAS identification. Although these methods predict PAS with success, there is room for improvement in PAS identification. RESULTS: In this study, we proposed a deep neural network-based computational method, called SANPolyA, for identifying PAS in human and mouse genomes. SANPolyA requires no manually crafted sequence features. We compared our method SANPolyA with several previous PAS identification methods on several PAS benchmark datasets. Our results showed that SANPolyA outperforms the state-of-the-art methods. SANPolyA also showed good performance on leave-one-motif-out evaluation. AVAILABILITY AND IMPLEMENTATION: https://github.com/yuht4/SANPolyA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhiming Dai
  - School of Data and Computer Science
  - Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-Sen University, Guangzhou 510006, China
8. Arefeen A, Xiao X, Jiang T. DeepPASTA: deep neural network based polyadenylation site analysis. Bioinformatics 2020; 35:4577-4585. [PMID: 31081512 DOI: 10.1093/bioinformatics/btz283] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/21/2018] [Revised: 03/22/2019] [Accepted: 04/16/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites. RESULTS: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. frequently used) polyA site of a gene in a specific tissue and relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and tissue-specific relative and absolute dominant polyA site prediction. AVAILABILITY AND IMPLEMENTATION: https://github.com/arefeen/DeepPASTA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ashraful Arefeen
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Xinshu Xiao
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095, USA
- Tao Jiang
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
  - Institute of Integrative Genome Biology, University of California, Riverside, CA 92521, USA
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
9. Xia Z, Li Y, Zhang B, Li Z, Hu Y, Chen W, Gao X. DeeReCT-PolyA: a robust and generic deep learning method for PAS identification. Bioinformatics 2020; 35:2371-2379. [PMID: 30500881 PMCID: PMC6612895 DOI: 10.1093/bioinformatics/bty991] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/10/2018] [Revised: 11/06/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023]
Abstract
Motivation: Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signals (PASs) identification is not only desired for the purpose of better transcripts’ end annotation, but can also help us gain a deeper insight of the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most of them are PAS motif- and human-specific, which leads to high risks of overfitting, low generalization power, and inability to reveal the connections between the underlying mechanisms of different mammals. Results: In this work, we propose a robust, PAS motif agnostic, and highly interpretable and transferable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features. We show that our single model trained over all human PAS motifs not only outperforms the state-of-the-art methods trained on specific motifs, but can also be generalized well to two mouse datasets. Moreover, we further increase the prediction accuracy by transferring the deep learning model trained on the data of one species to the data of a different species. Several novel underlying poly(A) patterns are revealed through the visualization of important oligomers and positions in our trained models. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively compare the sequence logos between human and mouse datasets. Availability and implementation: https://github.com/likesum/DeeReCT-PolyA. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhihao Xia
  - Department of Computer Science and Engineering (CSE), Washington University in St Louis, St Louis, MO, USA
- Yu Li
  - Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
- Bin Zhang
  - Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Zhongxiao Li
  - Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
- Yuhui Hu
  - Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Wei Chen
  - Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Xin Gao
  - Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
10. Chahid A, Albalawi F, Alotaiby TN, Al-Hameed MH, Alshebeili S, Laleg-Kirati TM. QuPWM: Feature Extraction Method for Epileptic Spike Classification. IEEE J Biomed Health Inform 2020; 24:2814-2824. [PMID: 32054592 DOI: 10.1109/jbhi.2020.2972286] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Epilepsy is a neurological disorder ranked as the second most serious neurological disease known to humanity, after stroke. Inter-ictal spiking is an abnormal neuronal discharge after an epileptic seizure. This abnormal activity can originate from one or more cranial lobes, often travels from one lobe to another, and interferes with normal activity from the affected lobe. The common practice for inter-ictal spike detection in brain signals is visual scanning of the recordings, which is a subjective and very time-consuming task. Motivated by that, this article focuses on using machine learning for epileptic spike classification in magnetoencephalography (MEG) signals. First, we used the Position Weight Matrix (PWM) method combined with a uniform quantizer to generate useful features from the time domain and the frequency domain through a Fast Fourier Transform (FFT) of the framed raw MEG signals. Second, the extracted features are fed to standard classifiers for inter-ictal spike classification. The proposed technique shows great potential in spike classification and in reducing the feature vector size. Specifically, the proposed technique achieved average sensitivity up to 87% and specificity up to 97% using 5-fold cross-validation applied to a balanced dataset. These samples are extracted from nine epileptic subjects using a sliding frame of size 95 sample points with a step size of 8 sample points.
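The combination of a uniform quantizer with a position weight matrix described in this abstract can be illustrated with a small pure-Python sketch. It is not the published QuPWM implementation: the function names `quantize`, `build_pwm` and `log_score`, the pseudocount, and the log-probability scoring are assumptions chosen for the example.

```python
import math


def quantize(frame, levels, lo, hi):
    """Map each sample to an integer level in [0, levels - 1] using uniform bins."""
    width = (hi - lo) / levels
    quantized = []
    for x in frame:
        idx = int((x - lo) / width)
        quantized.append(min(max(idx, 0), levels - 1))  # clamp out-of-range samples
    return quantized


def build_pwm(frames, levels, pseudocount=1.0):
    """Per-position probability of each level across equally long quantized frames."""
    length = len(frames[0])
    pwm = [[pseudocount] * levels for _ in range(length)]
    for frame in frames:
        for pos, level in enumerate(frame):
            pwm[pos][level] += 1.0
    for row in pwm:
        total = sum(row)
        for k in range(levels):
            row[k] /= total
    return pwm


def log_score(frame, pwm):
    """Sum of log-probabilities of the observed level at each position."""
    return sum(math.log(pwm[pos][level]) for pos, level in enumerate(frame))
```

In such a scheme one matrix would typically be built per class (for example spike and non-spike frames), and the scores of a new frame against each matrix would serve as a compact feature vector for a standard classifier.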
11. Chang YW, Zhang XX, Lu MX, Du YZ, Zhu-Salzman K. Molecular Cloning and Characterization of Small Heat Shock Protein Genes in the Invasive Leaf Miner Fly, Liriomyza trifolii. Genes (Basel) 2019; 10:genes10100775. [PMID: 31623413 PMCID: PMC6826454 DOI: 10.3390/genes10100775] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/23/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 11/26/2022]
Abstract
Small heat shock proteins (sHSPs) comprise numerous proteins with diverse structure and function. As molecular chaperones, they play essential roles in various biological processes, especially under thermal stresses. In this study, we identified three sHSP-encoding genes, LtHSP19.5, LtHSP20.8 and LtHSP21.7b from Liriomyza trifolii, an important insect pest of ornamental and vegetable crops worldwide. Putative proteins encoded by these genes all contain a conserved α-crystallin domain that is typical of the sHSP family. Their expression patterns during temperature stresses and at different insect development stages were studied by reverse-transcription quantitative PCR (RT-qPCR). In addition, the expression patterns were compared with those of LtHSP21.3 and LtHSP21.7, two previously published sHSPs. When pupae were exposed to temperatures ranging from −20 to 45 °C for 1 h, all LtsHSPs were strongly induced by either heat or cold stresses, but the magnitude was lower under the low temperature range than high temperatures. Developmentally regulated differential expression was also detected, with pupae and prepupae featuring the highest expression of sHSPs. Results suggest that LtsHSPs play a role in the development of the invasive leaf miner fly and may facilitate insect adaptation to climate change.
Affiliation(s)
- Ya-Wen Chang
  - College of Horticulture and Plant Protection, Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
- Xiao-Xiang Zhang
  - College of Horticulture and Plant Protection, Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
- Ming-Xing Lu
  - College of Horticulture and Plant Protection, Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
- Yu-Zhou Du
  - College of Horticulture and Plant Protection, Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
  - Joint International Research Laboratory of Agriculture and Agri-Product Safety, The Ministry of Education, Yangzhou University, Yangzhou 225009, China
- Keyan Zhu-Salzman
  - Department of Entomology, Texas A&M University, College Station, TX 77843, USA
12. Doulazmi M, Cros C, Dusart I, Trembleau A, Dubacq C. Alternative polyadenylation produces multiple 3' untranslated regions of odorant receptor mRNAs in mouse olfactory sensory neurons. BMC Genomics 2019; 20:577. [PMID: 31299892 PMCID: PMC6624953 DOI: 10.1186/s12864-019-5927-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/20/2019] [Accepted: 06/23/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Odorant receptor genes constitute the largest gene family in mammalian genomes and this family has been extensively studied in several species, but to date far less attention has been paid to the characterization of their mRNA 3' untranslated regions (3'UTRs). Given the increasing importance of UTRs in the understanding of RNA metabolism, and the growing interest in alternative polyadenylation especially in the nervous system, we aimed at identifying the alternative isoforms of odorant receptor mRNAs generated through 3'UTR variation. RESULTS: We implemented a dedicated pipeline using IsoSCM instead of Cufflinks to analyze RNA-Seq data from whole olfactory mucosa of adult mice and obtained an extensive description of the 3'UTR isoforms of odorant receptor mRNAs. To validate our bioinformatics approach, we exhaustively analyzed the 3'UTR isoforms produced from 2 pilot genes, using molecular approaches including northern blot and RNA ligation mediated polyadenylation test. Comparison between datasets further validated the pipeline and confirmed the alternative polyadenylation patterns of odorant receptors. Qualitative and quantitative analyses of the annotated 3' regions demonstrate that 1) Odorant receptor 3'UTRs are longer than previously described in the literature; 2) More than 77% of odorant receptor mRNAs are subject to alternative polyadenylation, hence generating at least 2 detectable 3'UTR isoforms; 3) Splicing events in 3'UTRs are restricted to a limited subset of odorant receptor genes; and 4) Comparison between male and female data shows no sex-specific differences in odorant receptor 3'UTR isoforms. CONCLUSIONS: We demonstrated for the first time that odorant receptor genes are extensively subject to alternative polyadenylation. This ground-breaking change to the landscape of 3'UTR isoforms of Olfr mRNAs opens new avenues for investigating their respective functions, especially during the differentiation of olfactory sensory neurons.
Affiliation(s)
- Mohamed Doulazmi
  - CNRS, Institut de Biologie Paris Seine, Biological adaptation and ageing, B2A, Sorbonne Université, F-75005 Paris, France
- Cyril Cros
  - CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
  - Present Address: Columbia University, New York, NY 10027 USA
- Isabelle Dusart
  - CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
- Alain Trembleau
  - CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
- Caroline Dubacq
  - CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
13. Albalawi F, Chahid A, Guo X, Albaradei S, Magana-Mora A, Jankovic BR, Uludag M, Van Neste C, Essack M, Laleg-Kirati TM, Bajic VB. Hybrid model for efficient prediction of poly(A) signals in human genomic DNA. Methods 2019; 166:31-39. [PMID: 30991099 DOI: 10.1016/j.ymeth.2019.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/15/2018] [Revised: 03/12/2019] [Accepted: 04/01/2019] [Indexed: 12/15/2022]
Abstract
Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves understanding of gene regulation mechanisms and recognition of the 3'-end of transcribed gene regions, where premature or alternate transcription ends may lead to various diseases. Although different methods and tools for in-silico prediction of genomic signals have been proposed, the correct identification of PAS in genomic DNA remains challenging due to a vast number of non-relevant hexamers identical to PAS hexamers. In this study, we developed a novel method for PAS recognition. The method is implemented in a hybrid PAS recognition model (HybPAS), which is based on deep neural networks (DNNs) and logistic regression models (LRMs). One such model is developed for each of the 12 most frequent human PAS hexamers. DNN models performed best for eight PAS types (including the two most frequent PAS hexamers), while LRMs performed best for the remaining four PAS types. The new models use different combinations of signal processing-based, statistical, and sequence-based features as input. The results obtained on human genomic data show that HybPAS outperforms the well-tuned state-of-the-art Omni-PolyA models, reducing the classification error for different PAS hexamers by up to 57.35% for 10 out of 12 PAS types, with Omni-PolyA models being better for two PAS types. For the most frequent PAS types, 'AATAAA' and 'ATTAAA', HybPAS reduced the error rate by 35.14% and 34.48%, respectively. On average, HybPAS reduces the error by 30.29%. HybPAS is implemented partly in Python and partly in MATLAB and is available at https://github.com/EMANG-KAUST/PolyA_Prediction_LRM_DNN.
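The difficulty this abstract describes, that most hexamers identical to a PAS hexamer are not functional signals, can be illustrated with a minimal sketch that lists every candidate occurrence in a sequence. The function name `find_pas_candidates` and the toy sequence are hypothetical, and only the two hexamers named in the abstract are included by default. This is not the HybPAS feature extraction, only the candidate-finding step that any such classifier would need before deciding which hits are true signals.

```python
from typing import Dict, Iterable, List

# Only the two hexamers named in the abstract; the other ten are not listed there.
DEFAULT_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_candidates(seq: str,
                        hexamers: Iterable[str] = DEFAULT_HEXAMERS) -> Dict[str, List[int]]:
    """Return 0-based start positions of every (possibly overlapping) hexamer match."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {h: [] for h in hexamers}
    for hexamer, positions in hits.items():
        start = seq.find(hexamer)
        while start != -1:
            positions.append(start)
            # Advance by one base so that overlapping matches are also reported.
            start = seq.find(hexamer, start + 1)
    return hits


# Hypothetical toy sequence with three candidate sites.
toy = "ccAATAAAggATTAAAttAATAAA"
candidates = find_pas_candidates(toy)
```

Every position returned is only a candidate; deciding which of them are real signals is the classification problem the paper addresses.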
Affiliation(s)
- Fahad Albalawi
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Taif University, Electrical Engineering, Taif 21944, Saudi Arabia
| | - Abderrazak Chahid
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Xingang Guo
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Somayah Albaradei
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Arturo Magana-Mora
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Saudi Aramco, EXPEC-ARC, Drilling Technology Team, Dhahran 31311, Saudi Arabia
| | - Boris R Jankovic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Mahmut Uludag
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Christophe Van Neste
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Ghent University, Center for Medical Genetics Ghent (CMGG), B-9000 Ghent, Belgium
| | - Magbubah Essack
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Taous-Meriem Laleg-Kirati
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia.
| | - Vladimir B Bajic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia.
|
14
|
Flynn LL, Mitrpant C, Pitout IL, Fletcher S, Wilton SD. Antisense Oligonucleotide-Mediated Terminal Intron Retention of the SMN2 Transcript. MOLECULAR THERAPY. NUCLEIC ACIDS 2018; 11:91-102. [PMID: 29858094 PMCID: PMC5854547 DOI: 10.1016/j.omtn.2018.01.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/20/2017] [Revised: 01/25/2018] [Accepted: 01/25/2018] [Indexed: 12/21/2022]
Abstract
The severe childhood disease spinal muscular atrophy (SMA) arises from the homozygous loss of the survival motor neuron 1 gene (SMN1). SMN2, a homologous gene potentially encoding an identical protein, can partially compensate for the loss of SMN1; however, the exclusion of a critical exon in the coding region during mRNA maturation results in insufficient levels of functional protein. The rate of transcription is known to influence the alternative splicing of gene transcripts, with a fast transcription rate correlating with an increase in alternative splicing. Conversely, a slower transcription rate is more likely to result in the inclusion of all exons in the transcript. Targeting SMN2 with antisense oligonucleotides to influence the processing of terminal exon 8 could be a way to slow transcription and induce the inclusion of exon 7. Interestingly, following oligomer treatment of SMA patient fibroblasts, we observed the inclusion of exon 7, as well as intron 7, in the transcript. Because the normal termination codon is located in exon 7, this exon/intron 7-SMN2 transcript should encode the normal protein and only carry a longer 3′ UTR. Further studies showed the extra 3′ UTR length contained a number of regulatory motifs that modify transcript and protein regulation, leading to translational repression of SMN. Although unlikely to provide therapeutic benefit for SMA patients, this novel technique for gene regulation could provide another avenue for the repression of undesirable gene expression in a variety of other diseases.
Affiliation(s)
- Loren L Flynn
- Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia; Perron Institute for Neurological and Translational Science, Perth, WA, Australia
| | - Chalermchai Mitrpant
- Perron Institute for Neurological and Translational Science, Perth, WA, Australia; Department of Biochemistry, Mahidol University, Bangkok, Thailand
| | - Ianthe L Pitout
- Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia; Perron Institute for Neurological and Translational Science, Perth, WA, Australia
| | - Sue Fletcher
- Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia; Perron Institute for Neurological and Translational Science, Perth, WA, Australia
| | - Steve D Wilton
- Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia; Perron Institute for Neurological and Translational Science, Perth, WA, Australia.
|
15
|
Detection of subclonal L1 transductions in colorectal cancer by long-distance inverse-PCR and Nanopore sequencing. Sci Rep 2017; 7:14521. [PMID: 29109480 PMCID: PMC5673974 DOI: 10.1038/s41598-017-15076-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/06/2017] [Accepted: 10/20/2017] [Indexed: 02/07/2023] Open
Abstract
Long interspersed nuclear elements-1 (L1s) are a large family of retrotransposons. Retrotransposons are repetitive sequences that are capable of autonomous mobility via a copy-and-paste mechanism. In most copy events, only the L1 sequence is inserted; however, L1s can also mobilize the flanking non-repetitive region by a process known as 3' transduction. L1 insertions can contribute to genome plasticity and cause potentially tumorigenic genomic instability. However, detecting the activity of a particular source L1 and identifying new insertions stemming from it is a challenging task with current methodological approaches. We developed a long-distance inverse PCR (LDI-PCR) based approach to monitor the mobility of active L1 elements based on their 3' transduction activity. LDI-PCR requires no prior knowledge of the insertion target region. By applying LDI-PCR in conjunction with Nanopore sequencing (Oxford Nanopore Technologies) on one L1 reported to be particularly active in human cancer genomes, we detected 14 out of 15 3' transductions previously identified by whole genome sequencing in two different colorectal tumour samples. In addition, we discovered 25 novel highly subclonal insertions. Furthermore, the long sequencing reads produced by LDI-PCR/Nanopore sequencing enabled the identification of both the 5' and 3' junctions and revealed detailed insertion sequence information.
|
16
|
Szkop KJ, Nobeli I. Untranslated Parts of Genes Interpreted: Making Heads or Tails of High-Throughput Transcriptomic Data via Computational Methods: Computational methods to discover and quantify isoforms with alternative untranslated regions. Bioessays 2017; 39. [PMID: 29052251 DOI: 10.1002/bies.201700090] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/28/2017] [Revised: 09/12/2017] [Indexed: 01/07/2023]
Abstract
In this review we highlight the importance of defining the untranslated parts of transcripts, and present a number of computational approaches for the discovery and quantification of alternative transcription start and poly-adenylation events in high-throughput transcriptomic data. The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by the position at which transcription starts and ends at a genomic locus. Although the extent of alternative transcription starts and alternative poly-adenylation sites has been revealed by sequencing methods focused on the ends of transcripts, the application of these methods is not yet widely adopted by the community. We suggest that computational methods applied to standard high-throughput technologies are a useful, albeit less accurate, alternative to the expertise-demanding 5' and 3' sequencing and they are the only option for analysing legacy transcriptomic data. We review these methods here, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.
Affiliation(s)
- Krzysztof J Szkop
- Institute of Structural and Molecular Biology, Department of Biological Sciences Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
| | - Irene Nobeli
- Institute of Structural and Molecular Biology, Department of Biological Sciences Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
|
17
|
Magana-Mora A, Kalkatawi M, Bajic VB. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA. BMC Genomics 2017; 18:620. [PMID: 28810905 PMCID: PMC5558757 DOI: 10.1186/s12864-017-4033-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/21/2017] [Accepted: 08/07/2017] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3'-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. RESULTS In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. CONCLUSIONS The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/ .
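The 35.37% figure is read here as a relative reduction in classification error rather than a drop in percentage points. A minimal sketch of that arithmetic follows; the function name and the two error rates are hypothetical, since the abstract does not give the underlying baseline and new error rates.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Percent by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical values: a baseline error rate of 20% lowered to 13%.
# In percentage points the drop is 7; relative to the baseline it is about 35%.
reduction = relative_error_reduction(0.20, 0.13)
```

The distinction matters when comparing tools: the same relative reduction corresponds to very different absolute gains depending on how high the baseline error was.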
Affiliation(s)
- Arturo Magana-Mora
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Manal Kalkatawi
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Vladimir B Bajic
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
|
18
|
VanBelzen DJ, Malik AS, Henthorn PS, Kornegay JN, Stedman HH. Mechanism of Deletion Removing All Dystrophin Exons in a Canine Model for DMD Implicates Concerted Evolution of X Chromosome Pseudogenes. Mol Ther Methods Clin Dev 2017; 4:62-71. [PMID: 28344992 PMCID: PMC5363321 DOI: 10.1016/j.omtm.2016.12.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/27/2016] [Accepted: 12/07/2016] [Indexed: 01/19/2023]
Abstract
Duchenne muscular dystrophy (DMD) is a lethal, X-linked, muscle-wasting disorder caused by mutations in the large, 2.4-Mb dystrophin gene. The majority of DMD-causing mutations are sporadic, multi-exon, frameshifting deletions, with the potential for variable immunological tolerance to the dystrophin protein from patient to patient. While systemic gene therapy holds promise in the treatment of DMD, immune responses to vectors and transgenes must first be rigorously evaluated in informative preclinical models to ensure patient safety. A widely used canine model for DMD, golden retriever muscular dystrophy, expresses detectable amounts of near full-length dystrophin due to alternative splicing around an intronic point mutation, thereby confounding the interpretation of immune responses to dystrophin-derived gene therapies. Here we characterize a naturally occurring deletion in a dystrophin-null canine, the German shorthaired pointer. The deletion spans 5.6 Mb of the X chromosome and encompasses all coding exons of the DMD and TMEM47 genes. The sequences surrounding the deletion breakpoints are virtually identical, suggesting that the deletion occurred through a homologous recombination event. Interestingly, the deletion breakpoints are within loci that are syntenically conserved among mammals, yet the high homology among this subset of ferritin-like loci is unique to the canine genome, suggesting lineage-specific concerted evolution of these atypical sequence elements.
Affiliation(s)
- D. Jake VanBelzen
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Alock S. Malik
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Paula S. Henthorn
- Section of Medical Genetics, University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA 19104, USA
| | - Joe N. Kornegay
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Hansell H. Stedman
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Corporal Michael Crescenz Veterans Administration Medical Center, Philadelphia, PA 19104, USA
|
19
|
Weng L, Li Y, Xie X, Shi Y. Poly(A) code analyses reveal key determinants for tissue-specific mRNA alternative polyadenylation. RNA (NEW YORK, N.Y.) 2016; 22:813-21. [PMID: 27095026 PMCID: PMC4878608 DOI: 10.1261/rna.055681.115] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/17/2015] [Accepted: 02/22/2016] [Indexed: 05/23/2023]
Abstract
mRNA alternative polyadenylation (APA) is a critical mechanism for post-transcriptional gene regulation and is often regulated in a tissue- and/or developmental stage-specific manner. An ultimate goal for the APA field has been to be able to computationally predict APA profiles under different physiological or pathological conditions. As a first step toward this goal, we have assembled a poly(A) code for predicting tissue-specific poly(A) sites (PASs). Based on a compendium of over 600 features that have known or potential roles in PAS selection, we have generated and refined a machine-learning algorithm using multiple high-throughput sequencing-based data sets of tissue-specific and constitutive PASs. This code can predict tissue-specific PASs with >85% accuracy. Importantly, by analyzing the prediction performance based on different RNA features, we found that PAS context, including the distance between alternative PASs and the relative position of a PAS within the gene, is a key feature for determining the susceptibility of a PAS to tissue-specific regulation. Our poly(A) code provides a useful tool for not only predicting tissue-specific APA regulation, but also for studying its underlying molecular mechanisms.
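The two PAS context features singled out in this abstract, the distance between alternative PASs and the relative position of a PAS within the gene, can be sketched as follows. The function name, the coordinate convention (gene start below gene end on the reference, strand given as "+" or "-"), and the toy coordinates are all assumptions for illustration; the abstract does not say how the authors encoded these features.

```python
from typing import Dict, List, Optional


def pas_context_features(gene_start: int, gene_end: int, strand: str,
                         pas_positions: List[int]) -> List[Dict[str, Optional[float]]]:
    """Compute per-PAS relative position in the gene and distance to the nearest other PAS."""
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    length = gene_end - gene_start
    # Order sites in the direction of transcription.
    ordered = sorted(pas_positions, reverse=(strand == "-"))
    features = []
    for i, pos in enumerate(ordered):
        # 0.0 = transcription start side of the gene, 1.0 = far end.
        if strand == "+":
            relative = (pos - gene_start) / length
        else:
            relative = (gene_end - pos) / length
        neighbours = []
        if i > 0:
            neighbours.append(abs(pos - ordered[i - 1]))
        if i < len(ordered) - 1:
            neighbours.append(abs(ordered[i + 1] - pos))
        features.append({
            "position": pos,
            "relative_position": relative,
            "nearest_pas_distance": min(neighbours) if neighbours else None,
        })
    return features


# Hypothetical gene on the plus strand with three annotated PASs.
example = pas_context_features(1000, 3000, "+", [2500, 2900, 1800])
```

A gene with a single PAS gets `None` for the distance feature, which a downstream model would need to handle explicitly.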
Affiliation(s)
- Lingjie Weng
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697, USA Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
| | - Yi Li
- Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697, USA Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
| | - Xiaohui Xie
- Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697, USA Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
| | - Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
|
20
|
Abstract
Nesprins are a family of multi-isomeric scaffolding proteins that were originally identified at the nuclear envelope (NE), where they bind to lamin A/C, emerin, and SUN-domain containing proteins, to form the LInker of Nucleoskeleton-and-Cytoskeleton (LINC) complex that connects the NE to the actin cytoskeleton. However, nesprin genes also give rise to a variety of tissue-specific variants of different sizes with potential roles beyond the NE. These variants are generated through alternative initiation, termination, and splicing, which makes nesprin biology very complex to study due to the difficulty in generating specific antibodies and/or short interfering RNAs (siRNA) to particular isoforms. In order to distinguish genuine nesprin variants and eliminate confusion with degradation products of larger nesprin isoforms, in this chapter we discuss methods including 5' and 3' Rapid Amplification of cDNA Ends (RACE) and RT-PCR in combination with EST database searching, for identifying and validating putative nesprin isoforms. This information is essential to allow a better understanding of nesprin functions in different cell types.
|
21
|
An improved poly(A) motifs recognition method based on decision level fusion. Comput Biol Chem 2014; 54:49-56. [PMID: 25594576 DOI: 10.1016/j.compbiolchem.2014.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/16/2014] [Revised: 11/27/2014] [Accepted: 12/27/2014] [Indexed: 01/07/2023]
Abstract
Polyadenylation is the process of adding a poly(A) tail to mRNA 3' ends. Identification of motifs controlling polyadenylation plays an essential role in improving genome annotation accuracy and in better understanding the mechanisms governing gene regulation. The bioinformatics methods used for poly(A) motif recognition have demonstrated that information extracted from sequences surrounding the candidate motifs can greatly help differentiate true motifs from false ones. However, these methods depend on either domain features or string kernels. To date, methods combining information from different sources have not been reported. Here, we propose an improved poly(A) motif recognition method that combines different sources based on decision level fusion. First, two novel prediction methods were proposed based on support vector machines (SVM): one uses domain-specific features and principal component analysis (PCA) to eliminate redundancy (PCA-SVM); the other is based on the Oligo string kernel (Oligo-SVM). Then we proposed a novel machine-learning method for poly(A) motif prediction by combining four poly(A) motif recognition methods, including two state-of-the-art methods (Random Forest (RF) and HMM-SVM) and the two newly proposed methods (PCA-SVM and Oligo-SVM). A decision level information fusion method was employed to combine the decision values of different classifiers by applying the DS evidence theory. We evaluated our method on a comprehensive poly(A) dataset that consists of 14,740 samples on 12 variants of poly(A) motifs and 2750 samples containing none of these motifs. Our method achieved accuracy of up to 86.13%. Compared with the four classifiers, our evidence theory based method reduces the average error rate by about 30%, 27%, 26% and 16%, respectively. The experimental results suggest that the proposed method is more effective for poly(A) motif recognition.
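Decision level fusion with DS (Dempster-Shafer) evidence theory can be illustrated with Dempster's rule of combination for two classifiers over the frame {true, false}. This is a minimal sketch of the general rule, not the authors' implementation: the abstract does not describe how classifier decision values were converted to mass functions, so the masses below are made up, and `dempster_combine` is a hypothetical name.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

TRUE = frozenset({"true"})
FALSE = frozenset({"false"})
EITHER = TRUE | FALSE  # mass left on the whole frame expresses uncertainty


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule of combination."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # The two sources support incompatible hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical masses for two classifiers judging one candidate motif.
clf_a: Mass = {TRUE: 0.6, FALSE: 0.1, EITHER: 0.3}
clf_b: Mass = {TRUE: 0.5, FALSE: 0.2, EITHER: 0.3}
fused = dempster_combine(clf_a, clf_b)
```

Because the rule is associative, more than two classifiers can be fused by applying `dempster_combine` repeatedly, which is how a four-classifier combination such as the one described could be assembled.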
|
22
|
Saravanaperumal SA, Pediconi D, Renieri C, La Terza A. Alternative splicing of the sheep MITF gene: novel transcripts detectable in skin. Gene 2014; 552:165-75. [PMID: 25239663 DOI: 10.1016/j.gene.2014.09.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/01/2014] [Revised: 09/12/2014] [Accepted: 09/15/2014] [Indexed: 01/05/2023]
Abstract
Microphthalmia-associated transcription factor (MITF) is a basic helix-loop-helix leucine zipper (bHLH-LZ) transcription factor, which regulates the differentiation and development of melanocytes and pigment cell-specific transcription of the melanogenesis enzyme genes. Though multiple splice variants of MITF have been reported in humans, mice and other vertebrate species, in merino sheep (Ovis aries), MITF gene splicing had not been investigated until now. To investigate the sheep MITF isoforms, the full length mRNA/cDNAs from the skin of merino sheep were cloned, sequenced and characterized. Reverse transcriptase (RT)-PCR analysis and molecular prediction revealed two basic splice variants with (+) and without (-) an 18 bp insertion viz. CGTGTATTTTCCCCACAG, in the coding region (CDS) for the amino acids 'ACIFPT'. It was further confirmed by the complete nucleotide sequencing of splice junction covering intron-6 (2463 bp), wherein an 18 bp intronic sequence is retained in the CDS of MITF (+) isoform. Further, full-length cDNA libraries were enriched by the method of 5' and 3' rapid amplification of cDNA ends (RACE-PCR). A total of seven sheep MITF splice variants, with distinct N-terminus sequences such as MITF-A, B, E, H, and M, the counterparts of human and mouse MITF, were identified by 5' RACE. The other two 5' RACE products were found to be novel splice variants of MITF and represented as 'MITF truncated form (Trn)-1, 2'. These alternative splice (AS) variants were illustrated using comparative genome analysis. By means of 3' RACE three different MITF 3' UTRs (625, 1083, 3167 bp) were identified and characterized. We also demonstrated that the MITF gene expression determined at transcript level is mediated via an intron-6 splicing event. Here we summarize for the first time, the expression of seven MITF splice variants with three distinct 3' UTRs in the skin of merino sheep.
Our data refine the structure of the MITF gene in sheep beyond what was previously known in humans, mice, dogs and other mammals.
Affiliation(s)
- Siva Arumugam Saravanaperumal
- Animal and Molecular Ecology Lab, School of Biosciences and Veterinary Medicine, University of Camerino, via Gentile III da Varano, Camerino, Macerata 62032, Italy.
| | - Dario Pediconi
- Animal and Molecular Ecology Lab, School of Biosciences and Veterinary Medicine, University of Camerino, via Gentile III da Varano, Camerino, Macerata 62032, Italy.
| | - Carlo Renieri
- Animal and Molecular Ecology Lab, School of Biosciences and Veterinary Medicine, University of Camerino, via Gentile III da Varano, Camerino, Macerata 62032, Italy.
| | - Antonietta La Terza
- Animal and Molecular Ecology Lab, School of Biosciences and Veterinary Medicine, University of Camerino, via Gentile III da Varano, Camerino, Macerata 62032, Italy.
|
23
|
Alsemgeest J, Old JM, Young LJ. The macropod type 2 interferon gene shares important regulatory and functionally relevant regions with eutherian IFN-γ. Mol Immunol 2014; 63:297-304. [PMID: 25124143 DOI: 10.1016/j.molimm.2014.07.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/27/2014] [Revised: 07/17/2014] [Accepted: 07/22/2014] [Indexed: 11/30/2022]
Abstract
Interferon-γ (IFN-γ) is an important immune regulatory molecule that plays a significant role in internal and external modulation of the mammalian immune response to intracellular pathogens. Herein, we report the 492 nt expressed sequence for the coding domain of IFN-γ from the immune tissues of two Australian macropod marsupial species: the tammar wallaby (Macropus eugenii) and the vulnerable rufous hare-wallaby (Lagorchestes hirsutus). Both 5' and 3' untranslated regions and the coding domain of M. eugenii IFN-γ revealed the presence of motifs responsible for transcriptional regulation, mRNA regulation, post-translational modifications, and receptor binding in other mammals. Since diagnostic kits for mycobacterial disease commonly rely on the assessment of interferon levels, we can now use this information to develop reagents that can be applied in clinical and laboratory settings to further our understanding of marsupial responses to disease.
Affiliation(s)
- Jenifer Alsemgeest
- Central Queensland University, School of Medical and Applied Sciences, Rockhampton, Queensland 4702, Australia
| | - Julie M Old
- University of Western Sydney, School of Science and Health, Penrith NSW 2751, Australia
| | - Lauren J Young
- Central Queensland University, School of Medical and Applied Sciences, Rockhampton, Queensland 4702, Australia; University of Western Sydney, School of Science and Health, Penrith NSW 2751, Australia.
|
24
|
Li XQ, Du D. Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals. BMC Evol Biol 2014; 14:162. [PMID: 25052519 PMCID: PMC4360255 DOI: 10.1186/s12862-014-0162-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/09/2014] [Accepted: 07/14/2014] [Indexed: 12/22/2022] Open
Abstract
Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution.
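The kind of motif-location analysis summarized in this abstract can be sketched by tallying where a motif starts relative to each annotated poly(A) site. The function name `motif_offsets`, the input format (a sequence plus the 0-based index of the poly(A) site in it), and the toy records are assumptions. The offset convention used here (motif start minus site index, so negative means upstream) is also an assumption and may differ from how the paper measures its "approximately 16 bases upstream".

```python
from collections import Counter
from typing import Iterable, Tuple


def motif_offsets(records: Iterable[Tuple[str, int]], motif: str = "AAUAAA") -> Counter:
    """Tally motif start offsets relative to the poly(A) site (negative = upstream)."""
    motif = motif.upper().replace("T", "U")
    offsets: Counter = Counter()
    for seq, site in records:
        # Accept DNA or RNA input by normalising to the RNA alphabet.
        rna = seq.upper().replace("T", "U")
        start = rna.find(motif)
        while start != -1:
            offsets[start - site] += 1
            start = rna.find(motif, start + 1)
    return offsets


# Hypothetical toy records: (sequence, index of the poly(A) site in that sequence).
records = [
    ("GG" + "AAUAAA" + "C" * 10 + "A" + "UUU", 18),
    ("AATAAA" + "G" * 10 + "A" + "TT", 16),
]
offsets = motif_offsets(records)
```

With real data, a sharp peak in the tally at one offset is what indicates a position-specific signal of the type the study reports for each group of species.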
Affiliation(s)
- Xiu-Qing Li
- Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, 850 Lincoln Road, Fredericton, New Brunswick, E3B 4Z7, Canada.
| | - Donglei Du
- Quantitative Methods Research Group, Faculty of Business Administration, University of New Brunswick, 7 Macaulay Lane, Fredericton, NB, E3B 5A3, Canada.
|
25
|
Genomic organization and molecular characterization of porcine cytomegalovirus. Virology 2014; 460-461:165-72. [DOI: 10.1016/j.virol.2014.05.014] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 10/07/2013] [Revised: 10/17/2013] [Accepted: 05/07/2014] [Indexed: 11/22/2022]
|
26
|
Ji G, Guan J, Zeng Y, Li QQ, Wu X. Genome-wide identification and predictive modeling of polyadenylation sites in eukaryotes. Brief Bioinform 2014; 16:304-13. [DOI: 10.1093/bib/bbu011] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022] Open
|
27
|
Hafez D, Ni T, Mukherjee S, Zhu J, Ohler U. Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics 2013; 29:i108-16. [PMID: 23812974 PMCID: PMC3694680 DOI: 10.1093/bioinformatics/btt233] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022] Open
Abstract
Motivation: Pre-mRNA cleavage and polyadenylation are essential steps for 3′-end maturation and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage/polyadenylation sites (polyA sites), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3′-untranslated regions, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites, as well as of appropriate tests for determining APAs with significant differences across multiple libraries. Results: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three human adult tissue types. We specified a linear-effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive models successfully classified constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical polyadenylation signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
Availability: Raw data are deposited on SRA, accession numbers: brain SRX208132, kidney SRX208087 and liver SRX208134. Processed datasets as well as model code are published on our website: http://www.genome.duke.edu/labs/ohler/research/UTR/ Contact: uwe.ohler@duke.edu
Affiliation(s)
- Dina Hafez
- Department of Computer Science, Duke University, Durham, NC 27708, USA
|
28
|
Xie B, Jankovic BR, Bajic VB, Song L, Gao X. Poly(A) motif prediction using spectral latent features from human DNA sequences. Bioinformatics 2013; 29:i316-25. [PMID: 23813000 PMCID: PMC3694652 DOI: 10.1093/bioinformatics/btt218] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 12/25/2022]
Abstract
MOTIVATION Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. RESULTS We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other string kernels. Furthermore, our method can be used to visualize the importance of oligomers and positions in predicting poly(A) motifs, from which we can observe a number of characteristics in the surrounding regions of true and false motifs that have not been reported before. AVAILABILITY http://sfb.kaust.edu.sa/Pages/Software.aspx. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bo Xie
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
29
|
Li XQ, Du D. RNA polyadenylation sites on the genomes of microorganisms, animals, and plants. PLoS One 2013; 8:e79511. [PMID: 24260238 PMCID: PMC3832601 DOI: 10.1371/journal.pone.0079511] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/05/2013] [Accepted: 09/29/2013] [Indexed: 01/15/2023]
Abstract
Pre–messenger RNA (mRNA) 3′-end cleavage and subsequent polyadenylation strongly regulate gene expression. In comparison with the upstream or downstream motifs, relatively little is known about the feature differences of polyadenylation [poly(A)] sites among major kingdoms. We suspect that the precise poly(A) sites are very selective, and we therefore mapped mRNA poly(A) sites on complete and nearly complete genomes using mRNA sequences available in the National Center for Biotechnology Information (NCBI) Nucleotide database. In this paper, we describe the mRNA nucleotide [i.e., the poly(A) tail attachment position] that is directly in attachment with the poly(A) tail and the pre-mRNA nucleotide [i.e., the poly(A) tail starting position] that corresponds to the first adenosine of the poly(A) tail in the 29 most-mapped species (2 fungi, 2 protists, 18 animals, and 7 plants). The most representative pre-mRNA dinucleotides covering these two positions were UA, CA, and GA in 17, 10, and 2 of the species, respectively. The pre-mRNA nucleotide at the poly(A) tail starting position was typically an adenosine [i.e., A-type poly(A) sites], sometimes a uridine, and occasionally a cytidine or guanosine. The order was U>C>G at the attachment position but A>>U>C≥G at the starting position. However, in comparison with the mRNA nucleotide composition (base composition), the poly(A) tail attachment position selected C over U in plants and both C and G over U in animals, in both A-type and non-A-type poly(A) sites. Animals, dicot plants, and monocot plants had clear differences in C/G ratios at the poly(A) tail attachment position of the non-A-type poly(A) sites. This study of poly(A) site evolution indicated that the two positions within poly(A) sites had distinct nucleotide compositions and were different among kingdoms.
Affiliation(s)
- Xiu-Qing Li
- Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, Canada
- Donglei Du
- Quantitative Methods Research Group, Faculty of Business Administration, University of New Brunswick, Fredericton, New Brunswick, Canada
|
30
|
Han J, Liu Z, Zhong D, Wang T. A hybrid model for the prediction of mRNA polyadenylation signals. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2013; 2013:3511-4. [PMID: 24110486 DOI: 10.1109/embc.2013.6610299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
mRNA polyadenylation is the cellular process that adds adenosine tails to mature mRNAs. Malfunction of polyadenylation has been implicated in several human diseases. In this paper, we propose a novel feature extraction approach that employs the K-gram nucleotide pattern, the position weight matrix (PWM) and the increment of diversity (ID) to represent the original features. Principal Component Analysis (PCA) was then applied to transform the original features into a new feature space, where the low-dimensional features were used to train a real-coded genetic neural network model. In the experiments, our proposed algorithm (GA-BP) achieved an accuracy of about 82.98%, a specificity of 82.95% and a sensitivity of 83.01% on the dataset constructed by Kalkatawi. The results demonstrate that GA-BP is a promising algorithm for the prediction of mRNA polyadenylation signals.
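As an illustration of the first feature type named in this abstract, a K-gram representation can be sketched as the relative frequency of every possible length-k substring. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `kgram_features`, the default `k=2`, and the choice to normalise counts to frequencies are all hypothetical, and the PWM, increment-of-diversity, PCA and neural-network steps are not shown.

```python
from itertools import product


def kgram_features(seq, k=2, alphabet="ACGT"):
    """Relative frequency of every possible k-gram, in lexicographic order.

    Windows containing characters outside `alphabet` (e.g. N) are skipped.
    Returns a list of length len(alphabet) ** k.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical input: a short window around a candidate signal.
features = kgram_features("GGAATAAACC", k=2)
print(len(features))  # 16 dinucleotide frequencies
```

A vector like this grows as 4^k, which is one reason a dimensionality-reduction step such as PCA is a natural follow-up.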
|
31
|
Wright CB, Chrenek MA, Foster SL, Duncan T, Redmond TM, Pardue MT, Boatright JH, Nickerson JM. Complementation test of Rpe65 knockout and tvrm148. Invest Ophthalmol Vis Sci 2013; 54:5111-22. [PMID: 23778877 DOI: 10.1167/iovs.13-12336] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
PURPOSE A mouse mutation, tvrm148, was previously reported as resulting in retinal degeneration. Tvrm148 and Rpe65 map between markers D3Mit147 and D3Mit19 on a genetic map, but the physical map places RPE65 outside the markers. We asked if Rpe65 or perhaps another nearby gene is mutated and if the mutant reduced 11-cis-retinal levels. We studied the impact of the tvrm148 mutation on visual function, morphology, and retinoid levels. METHODS Normal phase HPLC was used to measure retinoid levels. Visual function in Rpe65(+/+), tvrm148/+ (T(+/-)), tvrm148/tvrm148 (T(-/-)), RPE65(KO/KO) (Rpe65(-/-)), and Rpe65(T/-) mice was measured by optokinetic tracking (OKT) and electroretinography (ERG). Morphology was assessed by light microscopy and transmission electron microscopy (TEM). qRT-PCR was used to measure Rpe65 mRNA levels. Immunoblotting measured the size and amount of RPE65 protein. RESULTS The knockout and tvrm148 alleles did not complement. No 11-cis-retinal was detected in T(-/-) or Rpe65(-/-) mice. Visual acuity in Rpe65(+/+) and T(+/-) mice was approximately 0.382 c/d, but 0.037 c/d in T(-/-) mice at postnatal day 210 (P210). ERG response in T(-/-) mice was undetectable except at bright flash intensities. Outer nuclear layer (ONL) thickness in T(-/-) mice was approximately 70% of Rpe65(+/+) by P210. Rpe65 mRNA levels in T(-/-) mice were unchanged, yet 14.5% of Rpe65(+/+) protein levels was detected. Protein size was unchanged. CONCLUSIONS A complementation test revealed the RPE65 knockout and tvrm148 alleles do not complement, proving that the tvrm148 mutation is in Rpe65. Behavioral, physiological, molecular, biochemical, and histological approaches indicate that tvrm148 is a null allele of Rpe65.
Affiliation(s)
- Charles B Wright
- Department of Ophthalmology, School of Medicine, Emory University, Atlanta, Georgia 30322, USA
|
32
|
Schabath MB, Giuliano AR, Thompson ZJ, Amankwah EK, Gray JE, Fenstermacher DA, Jonathan KA, Beg AA, Haura EB. TNFRSF10B polymorphisms and haplotypes associated with increased risk of death in non-small cell lung cancer. Carcinogenesis 2013; 34:2525-30. [PMID: 23839018 DOI: 10.1093/carcin/bgt244] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/22/2022]
Abstract
Presently, there are few validated biomarkers that can predict survival or treatment response for non-small cell lung cancer (NSCLC) and most are based on tumor markers. Biomarkers based on germ line DNA variations represent a valuable complementary strategy, which could have translational implications by subclassifying patients to tailored, patient-specific treatment. We analyzed single nucleotide polymorphisms (SNPs) in 53 inflammation-related genes among 651 NSCLC patients. Multivariable Cox proportional hazard models, adjusted for lung cancer prognostic factors, were used to assess the association of genotypes and haplotypes with overall survival. Four of the top 15 SNPs associated with survival were located in the TNF-receptor superfamily member 10b (TNFRSF10B) gene. The T-allele of the top ranked SNP (rs11785599) was associated with a 41% increased risk of death (95% confidence interval [CI] = 1.16-1.70) and the other three TNFRSF10B SNPs (rs1047275, rs4460370 and rs883429) exhibited a 35% (95% CI = 1.11-1.65), 29% (95% CI = 1.06-1.57) and 24% (95% CI = 0.99-1.54) increased risk of death, respectively. Haplotype analyses revealed that the most common risk haplotype (TCTT) was associated with a 78% (95% CI = 1.25-2.54) increased risk of death compared with the low-risk haplotype (CGCC). When the data were stratified by treatment, the risk haplotypes exhibited statistically significantly increased risk of death among patients who had surgery only and no statistically significant effects among patients who had surgery and adjuvant chemotherapy. These data suggest that possessing one or more risk alleles in TNFRSF10B is associated with an increased risk of death. Validated germ line biomarkers may have potential important clinical implications by optimizing patient-specific treatment.
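The percentages and confidence intervals in this abstract are related by simple arithmetic, which the sketch below makes explicit. It assumes, as is conventional for Cox models but not stated in the abstract, that "41% increased risk" corresponds to a hazard ratio of 1.41 and that the bracketed intervals are on the hazard-ratio scale; both helper names are hypothetical.

```python
def percent_increase(hazard_ratio):
    """Convert a hazard ratio to a percent change in risk (1.41 -> 41)."""
    return round((hazard_ratio - 1.0) * 100)


def ci_excludes_one(lower, upper):
    """True when a hazard-ratio confidence interval does not contain 1,
    i.e. the association is significant at the interval's level."""
    return lower > 1.0 or upper < 1.0


# Hazard ratio 1.41 is inferred from the stated 41% increase (assumption).
print(percent_increase(1.41))
# Intervals quoted in the abstract for rs11785599 and rs883429.
print(ci_excludes_one(1.16, 1.70), ci_excludes_one(0.99, 1.54))
```

Read this way, the interval quoted for rs883429 (0.99-1.54) includes 1, so its 24% estimate is borderline rather than clearly significant.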
|
33
|
Patnala R, Clements J, Batra J. Candidate gene association studies: a comprehensive guide to useful in silico tools. BMC Genet 2013; 14:39. [PMID: 23656885 PMCID: PMC3655892 DOI: 10.1186/1471-2156-14-39] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 11/07/2012] [Accepted: 04/15/2013] [Indexed: 01/01/2023]
Abstract
The candidate gene approach has been a pioneer in the field of genetic epidemiology, identifying risk alleles and their association with clinical traits. With the advent of rapidly changing technology, there has been an explosion of in silico tools available to researchers, giving them fast, efficient resources and reliable strategies important to find causal gene variants for candidate or genome wide association studies (GWAS). In this review, following a description of candidate gene prioritisation, we summarise the approaches to single nucleotide polymorphism (SNP) prioritisation and discuss the tools available to assess functional relevance of the risk variant with consideration to its genomic location. The strategy and the tools discussed are applicable to any study investigating genetic risk factors associated with a particular disease. Some of the tools are also applicable for the functional validation of variants relevant to the era of GWAS and next generation sequencing (NGS).
Affiliation(s)
- Radhika Patnala
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD 4059, Australia
|
34
|
Molecular cloning, expression profiles and subcellular localization of cyclin B in ovary of the mud crab, Scylla paramamosain. Genes Genomics 2013. [DOI: 10.1007/s13258-013-0077-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|
35
|
Rehfeld A, Plass M, Krogh A, Friis-Hansen L. Alterations in polyadenylation and its implications for endocrine disease. Front Endocrinol (Lausanne) 2013; 4:53. [PMID: 23658553 PMCID: PMC3647115 DOI: 10.3389/fendo.2013.00053] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 01/10/2013] [Accepted: 04/22/2013] [Indexed: 12/17/2022]
Abstract
INTRODUCTION Polyadenylation is the process in which the pre-mRNA is cleaved at the poly(A) site and a poly(A) tail is added - a process necessary for normal mRNA formation. Genes with multiple poly(A) sites can undergo alternative polyadenylation (APA), producing distinct mRNA isoforms with different 3' untranslated regions (3' UTRs) and in some cases different coding regions. Two thirds of all human genes undergo APA. The efficiency of the polyadenylation process regulates gene expression and APA plays an important part in post-transcriptional regulation, as the 3' UTR contains various cis-elements associated with post-transcriptional regulation, such as target sites for micro-RNAs and RNA-binding proteins. Implications of alterations in polyadenylation for endocrine disease: Alterations in polyadenylation have been found to be causative of neonatal diabetes and IPEX (immune dysfunction, polyendocrinopathy, enteropathy, X-linked) and to be associated with type I and II diabetes, pre-eclampsia, fragile X-associated premature ovarian insufficiency, ectopic Cushing syndrome, and many cancer diseases, including several types of endocrine tumor diseases. PERSPECTIVES Recent developments in high-throughput sequencing have made it possible to characterize polyadenylation genome-wide. Antisense elements inhibiting or enhancing specific poly(A) site usage can induce desired alterations in polyadenylation, and thus hold the promise of new therapeutic approaches. SUMMARY This review gives a detailed description of alterations in polyadenylation in endocrine disease, an overview of the current literature on polyadenylation and summarizes the clinical implications of the current state of research in this field.
Affiliation(s)
- Anders Rehfeld
- Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Mireya Plass
- Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Anders Krogh
- Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Lennart Friis-Hansen
- Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- *Correspondence: Lennart Friis-Hansen, Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, 4113, Blegdamsvej 9, DK2100 Copenhagen, Denmark. e-mail:
|
36
|
Bajic VB, Charn TH, Xu JX, Panda SK, T Krishnan SP. Prediction Models for DNA Transcription Termination Based on SOM Networks. Conference Proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2012; 2005:4791-4. [PMID: 17281313 DOI: 10.1109/iembs.2005.1615543] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Abstract
This paper presents two efficient models for predicting transcription termination (TT) in human DNA. A neural network, self-organizing map, was used for finding features from a human polyadenylation (polyA) sites dataset. We derived prediction models related to different polyA signals. A program, "Dragon PolyAtt", for predicting TT regions was designed for the two most frequent polyA sites "AAUAAA" and "AUUAAA". In our tests, Dragon PolyAtt predicts TT regions with a sensitivity of 48.4% (13.6%) and specificity of 74% (79.1%) when searching for polyA signal "AAUAAA" ("AUUAAA"). Both tests were done on human chromosome 21. Results of Dragon PolyAtt system are substantially better than those obtained by the well-known "polyadq" program.
|
37
|
Neuronal classification and marker gene identification via single-cell expression profiling of brainstem vestibular neurons subserving cerebellar learning. J Neurosci 2012; 32:7819-31. [PMID: 22674258 DOI: 10.1523/jneurosci.0543-12.2012] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 12/20/2022]
Abstract
Identification of marker genes expressed in specific cell types is essential for the genetic dissection of neural circuits. Here we report a new strategy for classifying heterogeneous populations of neurons into functionally distinct types and for identifying associated marker genes. Quantitative single-cell expression profiling of genes related to neurotransmitters and ion channels enables functional classification of neurons; transcript profiles for marker gene candidates identify molecular handles for manipulating each cell type. We apply this strategy to the mouse medial vestibular nucleus (MVN), which comprises several types of neurons subserving cerebellar-dependent learning in the vestibulo-ocular reflex. Ion channel gene expression differed both qualitatively and quantitatively across cell types and could distinguish subtle differences in intrinsic electrophysiology. Single-cell transcript profiling of MVN neurons established six functionally distinct cell types and associated marker genes. This strategy is applicable throughout the nervous system and could facilitate the use of molecular genetic tools to examine the behavioral roles of distinct neuronal populations.
|
38
|
Saravanaperumal SA, Pediconi D, Renieri C, La Terza A. Skipping of exons by premature termination of transcription and alternative splicing within intron-5 of the sheep SCF gene: a novel splice variant. PLoS One 2012; 7:e38657. [PMID: 22719917 PMCID: PMC3376141 DOI: 10.1371/journal.pone.0038657] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/18/2011] [Accepted: 05/08/2012] [Indexed: 11/23/2022]
Abstract
Stem cell factor (SCF) is a growth factor, essential for haemopoiesis, mast cell development and melanogenesis. In the hematopoietic microenvironment (HM), SCF is produced in either a membrane-bound (-) or a soluble (+) form. Skin expression of SCF stimulates melanocyte migration, proliferation, differentiation, and survival. We report, for the first time, a novel mRNA splice variant of SCF from the skin of white merino sheep via cloning and sequencing. Reverse transcriptase (RT)-PCR and molecular prediction revealed two different cDNA products of SCF. Full-length cDNA libraries were enriched by the method of rapid amplification of cDNA ends (RACE-PCR). Nucleotide sequencing and molecular prediction revealed that the primary 1519 base pair (bp) cDNA encodes a precursor protein of 274 amino acids (aa), commonly known as the 'soluble' isoform. In contrast, the shorter (835 and/or 725 bp) cDNA was found to be a 'novel' mRNA splice variant. It contains an open reading frame (ORF) corresponding to a truncated protein of 181 aa (vs 245 aa) with a unique C-terminus lacking the primary proteolytic segment (28 aa) right after the D(175)G site, which is necessary to produce the 'soluble' form of SCF. This alternative splice (AS) variant was explained by the complete nucleotide sequencing of the splice junction covering exon 5-intron (5)-exon 6 (948 bp) with a premature termination codon (PTC) whereby exons 6 to 9/10 are skipped (Cassette Exon, CE 6-9/10). We also demonstrated that the Northern blot analysis at transcript level is mediated via an intron-5 splicing event. Our data refine the structure of the SCF gene and clarify the presence (+) and/or absence (-) of primary proteolytic-cleavage site specific SCF splice variants. This work provides a basis for understanding the functional role and regulation of SCF in hair follicle melanogenesis in sheep beyond what was known in mice, humans and other mammals.
Affiliation(s)
- Dario Pediconi
- School of Environmental Sciences, University of Camerino, via Gentile III da Varano, Camerino (MC), Italy
- Carlo Renieri
- School of Environmental Sciences, University of Camerino, via Gentile III da Varano, Camerino (MC), Italy
- Antonietta La Terza
- School of Environmental Sciences, University of Camerino, via Gentile III da Varano, Camerino (MC), Italy
|
39
|
3D profile-based approach to proteome-wide discovery of novel human chemokines. PLoS One 2012; 7:e36151. [PMID: 22586462 PMCID: PMC3346806 DOI: 10.1371/journal.pone.0036151] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/01/2012] [Accepted: 03/27/2012] [Indexed: 12/29/2022]
Abstract
Chemokines are small secreted proteins with important roles in immune responses. They consist of a conserved three-dimensional (3D) structure, so-called IL8-like chemokine fold, which is supported by disulfide bridges characteristic of this protein family. Sequence- and profile-based computational methods have been proficient in discovering novel chemokines by making use of their sequence-conserved cysteine patterns. However, it has been recently shown that some chemokines escaped annotation by these methods due to low sequence similarity to known chemokines and to different arrangement of cysteines in sequence and in 3D. Innovative methods overcoming the limitations of current techniques may allow the discovery of new remote homologs in the still functionally uncharacterized fraction of the human genome. We report a novel computational approach for proteome-wide identification of remote homologs of the chemokine family that uses fold recognition techniques in combination with a scaffold-based automatic mapping of disulfide bonds to define a 3D profile of the chemokine protein family. By applying our methodology to all currently uncharacterized human protein sequences, we have discovered two novel proteins that, without having significant sequence similarity to known chemokines or characteristic cysteine patterns, show strong structural resemblance to known anti-HIV chemokines. Detailed computational analysis and experimental structural investigations based on mass spectrometry and circular dichroism support our structural predictions and highlight several other chemokine-like features. The results obtained support their functional annotation as putative novel chemokines and encourage further experimental characterization. The identification of remote homologs of human chemokines may provide new insights into the molecular mechanisms causing pathologies such as cancer or AIDS, and may contribute to the development of novel treatments. 
Besides, the genome-wide applicability of our methodology based on 3D protein family profiles may open up new possibilities for improving and accelerating protein function annotation processes.
|
40
|
Martins R, Proença D, Silva B, Barbosa C, Silva AL, Faustino P, Romão L. Alternative polyadenylation and nonsense-mediated decay coordinately regulate the human HFE mRNA levels. PLoS One 2012; 7:e35461. [PMID: 22530027 PMCID: PMC3329446 DOI: 10.1371/journal.pone.0035461] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 05/25/2011] [Accepted: 03/18/2012] [Indexed: 01/06/2023]
Abstract
Nonsense-mediated decay (NMD) is an mRNA surveillance pathway that selectively recognizes and degrades defective mRNAs carrying premature translation-termination codons. However, several studies have shown that NMD also targets physiological transcripts that encode full-length proteins, modulating their expression. Indeed, some features of physiological mRNAs can render them NMD-sensitive. Human HFE is a MHC class I protein mainly expressed in the liver that, when mutated, can cause hereditary hemochromatosis, a common genetic disorder of iron metabolism. The HFE gene structure comprises seven exons; although the sixth exon is 1056 base pairs (bp) long, only the first 41 bp encode for amino acids. Thus, the remaining downstream 1015 bp sequence corresponds to the HFE 3′ untranslated region (UTR), along with exon seven. Therefore, this 3′ UTR encompasses an exon/exon junction, a feature that can make the corresponding physiological transcript NMD-sensitive. Here, we demonstrate that in UPF1-depleted or in cycloheximide-treated HeLa and HepG2 cells the HFE transcripts are clearly upregulated, meaning that the physiological HFE mRNA is in fact an NMD-target. This role of NMD in controlling the HFE expression levels was further confirmed in HeLa cells transiently expressing the HFE human gene. Besides, we show, by 3′-RACE analysis in several human tissues that HFE mRNA expression results from alternative cleavage and polyadenylation at four different sites – two were previously described and two are novel polyadenylation sites: one located at exon six, which confers NMD-resistance to the corresponding transcripts, and another located at exon seven. In addition, we show that the amount of HFE mRNA isoforms resulting from cleavage and polyadenylation at exon seven, although present in both cell lines, is higher in HepG2 cells. 
These results reveal that NMD and alternative polyadenylation may act coordinately to control HFE mRNA levels, possibly varying its protein expression according to the physiological cellular requirements.
Affiliation(s)
- Rute Martins
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- Daniela Proença
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- Bruno Silva
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- Cristina Barbosa
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- Ana Luísa Silva
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- Paula Faustino
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- Luísa Romão
- Departamento de Genética, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal
- BioFIG - Center for Biodiversity, Functional and Integrative Genomics, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
|
41
|
Liu JL, Liang XH, Su RW, Lei W, Jia B, Feng XH, Li ZX, Yang ZM. Combined analysis of microRNome and 3'-UTRome reveals a species-specific regulation of progesterone receptor expression in the endometrium of rhesus monkey. J Biol Chem 2012; 287:13899-910. [PMID: 22378788 DOI: 10.1074/jbc.m111.301275] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
The establishment of endometrial receptivity is a prerequisite for successful pregnancy, which is controlled by a complex mechanism. MicroRNAs (miRNAs) are small non-coding RNAs that have emerged as important regulators of gene expression. However, the contribution of miRNAs to endometrial receptivity is still unknown. Here we used rhesus monkey as an animal model and compared the endometrial miRNA expression profiles during early-secretory (pre-receptive) phase and mid-secretory (receptive) phase by deep sequencing. A set of differentially expressed miRNAs were identified, 8 of which were selected and validated using quantitative RT-PCR. To facilitate the prediction of their target genes, the 3'-UTRome was also determined using tag sequencing of mRNA 3'-termini. Surprisingly, about 50% of the 10,677 genes expressed in the rhesus monkey endometrium exhibited alternative 3'-UTRs. Of special interest, the progesterone receptor (PGR) gene, which is necessary for endometrial receptivity, possesses an ultra long 3'-UTR (~10 kb) along with a short variant (~2.5 kb). Evolutionary analysis showed that the 3'-UTR sequences of PGR are poorly conserved between primates and rodents, suggesting a species-biased miRNA binding pattern. We further demonstrated that PGR is a valid target of miR-96 in rhesus monkey and human but not in rodents, whereas the regulation of PGR by miR-375 is rhesus monkey-specific. Additionally, we found that miR-219-5p regulates PGR expression through a primate-specific long non-coding RNA immediately downstream of the PGR locus. Our study provides new insights into the molecular mechanisms underlying endometrial receptivity and presents intriguing species-specific regulatory roles of miRNAs.
Affiliation(s)
- Ji-Long Liu
- Department of Biology, Shantou University, Shantou 515063, China
42
Kalkatawi M, Rangkuti F, Schramm M, Jankovic BR, Kamau A, Chowdhary R, Archer JAC, Bajic VB. Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences. Bioinformatics 2011; 28:127-9. [PMID: 22088842 PMCID: PMC3244764 DOI: 10.1093/bioinformatics/btr602] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 12/26/2022]
Abstract
Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present a poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy across the 12 poly(A) motif variants. Contact: vladimir.bajic@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online.
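As a rough illustration of the first step such a predictor needs, the sketch below scans a DNA sequence for candidate poly(A) motif hexamers. It is not the Dragon PolyA Spotter method: the function name `find_polya_motifs` is hypothetical, and the default motif set contains only the two canonical signals (AATAAA and ATTAAA) as an assumption, since the abstract does not list the 12 variants. A real predictor would then score each candidate using features of the surrounding sequence.

```python
def find_polya_motifs(seq, motifs=("AATAAA", "ATTAAA")):
    """Return (position, hexamer) pairs for every candidate motif in seq.

    Illustrative only: the default motif set is an assumption (the two
    canonical signals), not the 12 variants used by the published tool.
    Positions are 0-based offsets into the sequence.
    """
    seq = seq.upper()
    k = 6  # poly(A) signal motifs are hexamers
    hits = []
    for i in range(len(seq) - k + 1):
        hexamer = seq[i:i + k]
        if hexamer in motifs:
            hits.append((i, hexamer))
    return hits


if __name__ == "__main__":
    # Candidate sites only; each would still need to be classified
    # as a true or false poly(A) signal by a trained model.
    print(find_polya_motifs("GGAATAAACCATTAAAGG"))
```

Every hit is only a candidate: most occurrences of these hexamers in genomic DNA are not functional signals, which is why the abstract describes training classifiers on the surrounding sequence properties.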
Affiliation(s)
- Manal Kalkatawi
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
43
Kaer K, Branovets J, Hallikma A, Nigumann P, Speek M. Intronic L1 retrotransposons and nested genes cause transcriptional interference by inducing intron retention, exonization and cryptic polyadenylation. PLoS One 2011; 6:e26099. [PMID: 22022525 PMCID: PMC3192792 DOI: 10.1371/journal.pone.0026099] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 07/07/2011] [Accepted: 09/19/2011] [Indexed: 12/30/2022]
Abstract
Background: Transcriptional interference has been recently recognized as an unexpectedly complex and mostly negative regulation of genes. Despite relatively few studies that emerged in recent years, it has been demonstrated that readthrough transcription derived from one gene can influence the transcription of another overlapping or nested gene. However, the molecular effects resulting from this interaction are largely unknown. Methodology/Principal Findings: Using in silico chromosome walking, we searched for prematurely terminated transcripts bearing signatures of intron retention or exonization of intronic sequence at their 3′ ends upstream of human L1 retrotransposons, protein-coding and noncoding nested genes. We demonstrate that transcriptional interference induced by intronic L1s (or other repeated DNAs) and nested genes could be characterized by intron retention, forced exonization and cryptic polyadenylation. These molecular effects were revealed from the analysis of endogenous transcripts derived from different cell lines and tissues and confirmed by the expression of three minigenes in cell culture. While intron retention and exonization were comparably observed in introns upstream of L1s, forced exonization was preferentially detected in nested genes. Transcriptional interference induced by L1 or nested genes was dependent on the presence or absence of cryptic splice sites, affected the inclusion or exclusion of the upstream exon and the use of cryptic polyadenylation signals. Conclusions/Significance: Our results suggest that transcriptional interference induced by intronic L1s and nested genes could influence the transcription of a large number of genes in normal as well as in tumor tissues. Therefore, this type of interference could have a major impact on the regulation of the host gene expression.
Affiliation(s)
- Kristel Kaer
- Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia
- Jelena Branovets
- Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia
- Anni Hallikma
- Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia
- Pilvi Nigumann
- Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia
- Mart Speek
- Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia
44
Why does the giant panda eat bamboo? A comparative analysis of appetite-reward-related genes among mammals. PLoS One 2011; 6:e22602. [PMID: 21818345 PMCID: PMC3144909 DOI: 10.1371/journal.pone.0022602] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 02/12/2011] [Accepted: 06/25/2011] [Indexed: 01/08/2023]
Abstract
Background: The giant panda has an interesting bamboo diet, unlike the other species in the order Carnivora. The umami taste receptor gene T1R1 was identified as a pseudogene during its genome sequencing project and confirmed using a different giant panda sample. The estimated mutation time for this gene is about 4.2 Myr. This mutation coincided with the giant panda's dietary change and also reinforced its herbivorous lifestyle. However, as this gene is preserved in herbivores such as cow and horse, we need to look for other reasons behind the giant panda's diet switch. Methodology/Principal Findings: Since taste is part of the reward properties of food related to its energy and nutrition contents, we performed a systematic analysis of the genes involved in the appetite-reward system for the giant panda. We extracted the giant panda sequence information for those genes and compared it first with the human sequence and then with seven other species including chimpanzee, mouse, rat, dog, cat, horse, and cow. Orthologs in panda were further analyzed based on the coding region, Kozak consensus sequence, and potential microRNA binding of those genes. Conclusions/Significance: Our results revealed an interesting dopamine metabolic involvement in the panda's food choice. This finding suggests a new direction for molecular evolution studies behind the panda's dietary switch.
45
Belancio VP. Importance of RNA analysis in interpretation of reporter gene expression data. Anal Biochem 2011; 417:159-61. [PMID: 21693100 DOI: 10.1016/j.ab.2011.05.035] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 04/27/2011] [Revised: 05/20/2011] [Accepted: 05/23/2011] [Indexed: 11/26/2022]
Abstract
Reporter gene assays have proven to be an important tool in analyzing cis and trans factors that influence gene expression. However, they have sometimes been adapted for studies in which they are not totally reliable. Modifications that change the RNA expressed from the reporter gene may result in regulation of reporter gene expression at multiple levels simultaneously. The data provided here illustrate the difficulties that may arise from posttranscriptional regulation in various reporter gene formats. This serves as a warning that further RNA studies may be necessary if comparisons are to be made between reporter constructs whose RNA is not identical.
Affiliation(s)
- Victoria P Belancio
- Department of Structural and Cellular Biology, Tulane School of Medicine, Tulane Cancer Center, New Orleans, LA 70112, USA.
46
Ying SH, Feng MG. A conidial protein (CP15) of Beauveria bassiana contributes to the conidial tolerance of the entomopathogenic fungus to thermal and oxidative stresses. Appl Microbiol Biotechnol 2011; 90:1711-20. [PMID: 21455593 DOI: 10.1007/s00253-011-3205-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 01/02/2011] [Revised: 02/07/2011] [Accepted: 02/07/2011] [Indexed: 11/25/2022]
Abstract
Aerial conidia are central dispersing structures for most fungi and represent the infectious propagule for the entomopathogenic fungus Beauveria bassiana, and thus the active ingredients of commercial mycoinsecticides. Although a number of formic-acid-extractable (FAE) cell wall proteins from conidia have been characterized, the functions of many such proteins remain obscure. We report that a conidial FAE protein, termed CP15, isolated from B. bassiana is related to fungal tolerance to thermal and oxidative stresses. The full-length genomic sequence of CP15 was shown to lack introns, encoding a 131-amino-acid protein (15.0 kDa) with no sequence identity to any known proteins in the NCBI database. The function of this new gene with two genomic copies was examined using the antisense-RNA method. Five transgenic strains displayed various degrees of silenced CP15 expression, resulting in significantly reduced conidial FAE protein profiles. The FAE protein contents of the strains were linearly correlated to the survival indices of their conidia when exposed to 30-min wet stress at 48°C (r² = 0.93). Under prolonged 75-min heat stress, the median lethal times (LT50s) of their conidia were significantly reduced by 13.6-29.5%. The CP15 silenced strains were also 20-50% less resistant to oxidative stress but were not affected with respect to UV-B or hyperosmotic stress. Our data indicate that discrete conidial proteins may mediate resistance to some abiotic stresses, and that manipulation of such proteins may be a viable approach to enhancing the environmental fitness of B. bassiana for more persistent control of insect pests in warmer climates.
Affiliation(s)
- Sheng-Hua Ying
- Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
47
Insights into Polyomaviridae microRNA function derived from study of the bandicoot papillomatosis carcinomatosis viruses. J Virol 2011; 85:4487-500. [PMID: 21345962 DOI: 10.1128/jvi.02557-10] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/20/2022]
Abstract
Several different members of the Polyomaviridae, including some human pathogens, encode microRNAs (miRNAs) that lie antisense with respect to the early gene products, the tumor (T) antigens. These miRNAs negatively regulate T antigen expression by directing small interfering RNA (siRNA)-like cleavage of the early transcripts. miRNA mutant viruses of some members of the Polyomaviridae express increased levels of early proteins during lytic infection. However, the importance of miRNA-mediated negative regulation of the T antigens remains uncertain. Bandicoot papillomatosis carcinomatosis virus type 1 (BPCV1) is associated with papillomas and carcinomas in the endangered marsupial the western barred bandicoot (Perameles bougainville). BPCV1 is the founding member of a new group of viruses that remarkably share distinct properties in common with both the polyomavirus and papillomavirus families. Here, we show that BPCV1 encodes, in the same orientation as the papillomavirus-like transcripts, a miRNA located within a long noncoding region (NCR) of the genome. Furthermore, this NCR serves the function of both promoter and template for the primary transcript that gives rise to the miRNA. Unlike the polyomavirus miRNAs, the BPCV1 miRNA is not encoded antisense to the T antigen transcripts but rather lies in a separate, proximal region of the genome. We have mapped the 3' untranslated region (UTR) of the BPCV1 large T antigen early transcript and identified a functional miRNA target site that is imperfectly complementary to the BPCV1 miRNA. Chimeric reporters containing the entire BPCV1 T antigen 3' UTR undergo negative regulation when coexpressed with the BPCV1 miRNA. Notably, the degree of negative regulation observed is equivalent to that of an identical reporter that is engineered to bind to the BPCV1 miRNA with perfect complementarity. We also show that this miRNA and this novel mode of early gene regulation are conserved with the related BPCV2. Finally, papillomatous lesions from a western barred bandicoot express readily detectable levels of this miRNA, stressing its likely importance in vivo. Combined, the alternative mechanisms of negative regulation of T antigen expression between the BPCVs and the polyomaviruses support the importance of miRNA-mediated autoregulation in the life cycles of some divergent polyomaviruses and polyomavirus-like viruses.
48
Characterization and prediction of mRNA polyadenylation sites in human genes. Med Biol Eng Comput 2011; 49:463-72. [PMID: 21286831 DOI: 10.1007/s11517-011-0732-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/28/2009] [Accepted: 01/02/2011] [Indexed: 12/31/2022]
Abstract
The accurate identification of potential poly(A) sites has contributed to many studies of alternative polyadenylation. The aim of this study was the development of a machine-learning methodology that helps to discriminate real polyadenylation signals from randomly occurring signals in genomic sequence. Since previous studies have revealed that RNA secondary structure in certain genes has a significant impact, the authors tried to computationally pinpoint common structural patterns around the poly(A) sites and to investigate how RNA secondary structure may influence polyadenylation. This involved an initial study on the impact of RNA structure, in which motif search tools indicated that hairpin structures might be important. Thus, it was proposed that, in addition to the sequence pattern around poly(A) sites, there exists a widespread structural pattern that is also employed during human mRNA polyadenylation. In this study, the authors present a computational model that uses support vector machines to predict human poly(A) sites. The results show that this predictive model has a comparable performance to the current prediction tool. In addition, common structural patterns associated with polyadenylation were identified using several motif-finding programs, which provides new insight into the role that RNA secondary structure plays in polyadenylation.
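To make the idea of a "sequence pattern around poly(A) sites" concrete, the sketch below turns a sequence window into a fixed-length vector of k-mer counts, a common way to feed sequence context to a classifier such as a support vector machine. This is an assumption for illustration only: the abstract does not state which features the authors used, and the function name `kmer_counts` and the choice of k are hypothetical.

```python
from itertools import product


def kmer_counts(seq, k=2):
    """Count every overlapping k-mer in seq over the alphabet ACGT.

    Returns a list of 4**k counts in lexicographic k-mer order
    (AA, AC, AG, AT, CA, ...). K-mers containing other characters,
    such as N, are skipped. Illustrative featurization only.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[km] for km in kmers]


if __name__ == "__main__":
    # A window around a candidate site becomes one feature vector.
    features = kmer_counts("AATAAA", k=2)
    print(len(features), features)
```

Vectors like these, computed for windows around true and randomly occurring signals, could serve as the training input to a classifier; structural features such as predicted hairpins would need a separate RNA folding tool and are not covered here.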
49
Wiedemann SM, Mildner SN, Bönisch C, Israel L, Maiser A, Matheisl S, Straub T, Merkl R, Leonhardt H, Kremmer E, Schermelleh L, Hake SB. Identification and characterization of two novel primate-specific histone H3 variants, H3.X and H3.Y. J Cell Biol 2010; 190:777-91. [PMID: 20819935 PMCID: PMC2935562 DOI: 10.1083/jcb.201002043] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.1] [Indexed: 01/10/2023]
Abstract
The expression of a new histone variant H3.Y increases during cellular stress to regulate cell cycle progression and gene expression. Nucleosomal incorporation of specialized histone variants is an important mechanism to generate different functional chromatin states. Here, we describe the identification and characterization of two novel primate-specific histone H3 variants, H3.X and H3.Y. Their messenger RNAs are found in certain human cell lines, in addition to several normal and malignant human tissues. In keeping with their primate specificity, H3.X and H3.Y are detected in different brain regions. Transgenic H3.X and H3.Y proteins are stably incorporated into chromatin in a similar fashion to the known H3 variants. Importantly, we demonstrate biochemically and by mass spectrometry that endogenous H3.Y protein exists in vivo, and that stress stimuli, such as starvation and cellular density, increase the abundance of H3.Y-expressing cells. Global transcriptome analysis revealed that knockdown of H3.Y affects cell growth and leads to changes in the expression of many genes involved in cell cycle control. Thus, H3.Y is a novel histone variant involved in the regulation of cellular responses to outside stimuli.
Affiliation(s)
- Sonja M Wiedemann
- Adolf-Butenandt-Institute, Department of Molecular Biology, Ludwig Maximilians University of Munich, 80336 Munich, Germany
50
Poly(A) signals located near the 5' end of genes are silenced by a general mechanism that prevents premature 3'-end processing. Mol Cell Biol 2010; 31:639-51. [PMID: 21135120 DOI: 10.1128/mcb.00919-10] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/20/2022]
Abstract
Poly(A) signals located at the 3' end of eukaryotic genes drive cleavage and polyadenylation at the same end of pre-mRNA. Although these sequences are expected only at the 3' end of genes, we found that strong poly(A) signals are also predicted within the 5' untranslated regions (UTRs) of many Drosophila melanogaster mRNAs. Most of these 5' poly(A) signals have little influence on the processing of the endogenous transcripts, but they are very active when placed at the 3' end of reporter genes. In investigating these unexpected observations, we discovered that both these novel poly(A) signals and standard poly(A) signals become functionally silent when they are positioned close to transcription start sites in either Drosophila or human cells. This indicates that the stage when the poly(A) signal emerges from the polymerase II (Pol II) transcription complex determines whether a putative poly(A) signal is recognized as functional. The data suggest that this mechanism, which probably prevents cryptic poly(A) signals from causing premature transcription termination, depends on low Ser2 phosphorylation of the C-terminal domain of Pol II and inefficient recruitment of processing factors.