1
Monteil A, Guérineau NC, Gil-Nagel A, Parra-Diaz P, Lory P, Senatore A. New insights into the physiology and pathophysiology of the atypical sodium leak channel NALCN. Physiol Rev 2024; 104:399-472. [PMID: 37615954 DOI: 10.1152/physrev.00014.2022] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/26/2022] [Revised: 07/13/2023] [Accepted: 08/15/2023] [Indexed: 08/25/2023] Open
Abstract
Cell excitability and its modulation by hormones and neurotransmitters involve the concerted action of a large repertoire of membrane proteins, especially ion channels. Unique complements of coexpressed ion channels are exquisitely balanced against each other in different excitable cell types, establishing distinct electrical properties that are tailored for diverse physiological contributions, and dysfunction of any component may induce a disease state. A crucial parameter controlling cell excitability is the resting membrane potential (RMP) set by extra- and intracellular concentrations of ions, mainly Na+, K+, and Cl-, and their passive permeation across the cell membrane through leak ion channels. Indeed, dysregulation of RMP causes significant effects on cellular excitability. This review describes the molecular and physiological properties of the Na+ leak channel NALCN, which associates with its accessory subunits UNC-79, UNC-80, and NLF-1/FAM155 to conduct depolarizing background Na+ currents in various excitable cell types, especially neurons. Studies of animal models clearly demonstrate that NALCN contributes to fundamental physiological processes in the nervous system including the control of respiratory rhythm, circadian rhythm, sleep, and locomotor behavior. Furthermore, dysfunction of NALCN and its subunits is associated with severe pathological states in humans. The critical involvement of NALCN in physiology is now well established, but its study has been hampered by the lack of specific drugs that can block or agonize NALCN currents in vitro and in vivo. Molecular tools and animal models are now available to accelerate our understanding of how NALCN contributes to key physiological functions and the development of novel therapies for NALCN channelopathies.
Affiliation(s)
- Arnaud Monteil
  - Institut de Génomique Fonctionnelle, Université de Montpellier, CNRS, INSERM, Montpellier, France
  - LabEx "Ion Channel Science and Therapeutics," Montpellier, France
  - Department of Physiology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Nathalie C Guérineau
  - Institut de Génomique Fonctionnelle, Université de Montpellier, CNRS, INSERM, Montpellier, France
  - LabEx "Ion Channel Science and Therapeutics," Montpellier, France
- Antonio Gil-Nagel
  - Department of Neurology, Epilepsy Program, Hospital Ruber Internacional, Madrid, Spain
- Paloma Parra-Diaz
  - Department of Neurology, Epilepsy Program, Hospital Ruber Internacional, Madrid, Spain
- Philippe Lory
  - Institut de Génomique Fonctionnelle, Université de Montpellier, CNRS, INSERM, Montpellier, France
  - LabEx "Ion Channel Science and Therapeutics," Montpellier, France
- Adriano Senatore
  - Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
2
Yang R, Wang H, Zhu L, Zhu L, Liu T, Zhang D. Identification and Functional Analysis of Acyl-Acyl Carrier Protein Δ 9 Desaturase from Nannochloropsis oceanica. J Microbiol 2023; 61:95-107. [PMID: 36719619 DOI: 10.1007/s12275-022-00001-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/03/2022] [Accepted: 11/08/2022] [Indexed: 02/01/2023]
Abstract
The oleaginous marine microalga Nannochloropsis oceanica strain IMET1 has attracted increasing attention as a promising photosynthetic cell factory owing to its exceptional capacity to accumulate large amounts of triacylglycerols and eicosapentaenoic acid. To complete the genomic annotation of genes in the fatty acid biosynthesis pathway of N. oceanica, we conducted the present study to identify a novel candidate gene encoding the archetypical chloroplast stromal acyl-acyl carrier protein Δ9 desaturase. The full-length cDNA was generated using rapid amplification of cDNA ends, and the structure of the coding region, which is interrupted by four introns, was determined. RT-qPCR results demonstrated upregulated transcript abundance of this gene under nitrogen starvation. Fluorescence localization studies using an EGFP-fused protein revealed that the translated protein was localized in the chloroplast stroma. The catalytic activity of the translated protein was characterized by inducible expression in Escherichia coli and in the mutant yeast strain BY4389, indicating its potential desaturase activity toward palmitoyl-ACP (C16:0-ACP) and stearoyl-ACP (C18:0-ACP). A further functional complementation assay using BY4389 on plates demonstrated that the expressed enzyme restored the biosynthesis of oleic acid. These results support the desaturase activity of the expressed protein in the chloroplast stroma, which contributes to the biosynthesis and accumulation of monounsaturated fatty acids in N. oceanica strain IMET1.
Affiliation(s)
- Ruigang Yang
  - Department of Biology and Chemistry, College of Sciences, National University of Defense Technology, Changsha, 410073, People's Republic of China
  - Key Laboratory of Biofuels, Key Laboratory of Shandong Energy Biological Genetic Resources, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, 266101, People's Republic of China
- Hui Wang
  - Functional Laboratory of Solar Energy, Shandong Energy Institute, Qingdao, 266101, People's Republic of China
- Lingyun Zhu
  - Department of Biology and Chemistry, College of Sciences, National University of Defense Technology, Changsha, 410073, People's Republic of China
- Lvyun Zhu
  - Department of Biology and Chemistry, College of Sciences, National University of Defense Technology, Changsha, 410073, People's Republic of China
- Tianzhong Liu
  - Key Laboratory of Biofuels, Key Laboratory of Shandong Energy Biological Genetic Resources, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, 266101, People's Republic of China
- Dongyi Zhang
  - Hunan Key Laboratory of Economic Crops, Genetic Improvement, and Integrated Utilization, School of Life Sciences, Hunan University of Science and Technology, Xiangtan, 411201, People's Republic of China
3
Genome-Wide cis-Regulatory Element Based Discovery of Auxin-Responsive Genes in Higher Plant. Genes (Basel) 2021; 13:genes13010024. [PMID: 35052364 PMCID: PMC8775021 DOI: 10.3390/genes13010024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Revised: 12/20/2021] [Accepted: 12/21/2021] [Indexed: 11/17/2022] Open
Abstract
Auxin has a profound impact on plant physiology and participates in almost all aspects of plant development. It exerts pleiotropic effects on plant growth and differentiation by regulating the expression of auxin-responsive genes. The classical auxin response is usually mediated by auxin response factors (ARFs), which bind to the auxin response element (AuxRE) in the promoter region of the target gene. Experiments have characterized the functions of only a limited number of plant genes, and it is still unknown how many genes respond to exogenous auxin treatment. An economical and effective method is proposed for the genome-wide discovery of auxin-responsive genes in the model plant Arabidopsis thaliana (A. thaliana). Our method relies on cis-regulatory-element-based targeted gene finding across the promoters of a genome. We first exploit and analyze auxin-specific cis-regulatory elements for the transcription of the target genes, and then identify putative auxin-responsive genes whose promoters contain these elements in a collection of over 25,800 promoters in the A. thaliana genome. Evaluating our results against a published database and the literature, we found that this method has an accuracy rate of 65.2% (309/474) for predicting candidate auxin-responsive genes. The chromosome distribution and annotation of the putative auxin-responsive genes predicted here were also mined. The results can markedly narrow down the pool of merely potential auxin target genes and also provide useful clues for improving the annotation of genes that lack functional information.
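As a rough illustration of the cis-regulatory-element-based screening idea described in this abstract, the sketch below scans promoter sequences for a motif on both strands and reports the genes whose promoters contain it. The motif `TGTCTC` (a commonly cited core AuxRE), the function names, and the toy promoters are assumptions made for illustration; the abstract does not specify the authors' element set or scoring, so this is not their pipeline. The last helper only reproduces the reported accuracy arithmetic (309/474).

```python
# Minimal sketch: motif-based promoter screening (illustrative, not the authors' method).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_motif(promoter, motif):
    """Count positions where the motif occurs on either strand (overlaps allowed)."""
    promoter = promoter.upper()
    targets = {motif.upper(), reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(promoter) - k + 1) if promoter[i:i + k] in targets)


def candidate_genes(promoters, motif="TGTCTC", min_hits=1):
    """Return gene IDs whose promoter holds at least min_hits motif occurrences."""
    return sorted(
        gene for gene, seq in promoters.items() if count_motif(seq, motif) >= min_hits
    )


def accuracy(confirmed, predicted):
    """Fraction of predicted genes that are supported by external evidence."""
    return confirmed / predicted


if __name__ == "__main__":
    # Hypothetical toy promoters; geneC carries the motif on the reverse strand.
    toy = {"geneA": "AATGTCTCGG", "geneB": "AAAAAAAA", "geneC": "CCGAGACATT"}
    print(candidate_genes(toy))
    print(f"{accuracy(309, 474):.1%}")
```

Checking both strands matters because a promoter element can act in either orientation; a real screen would also need to define promoter boundaries and control for motifs that occur by chance.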
4
SAVMD: An adaptive signal processing method for identifying protein coding regions. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
5
Zheng Q, Chen T, Zhou W, Xie L, Su H. Gene prediction by the noise-assisted MEMD and wavelet transform for identifying the protein coding regions. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2020.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
6
Xing Y, Yang W, Liu G, Cui X, Meng H, Zhao H, Zhao X, Li J, Liu Z, Zhang MQ, Cai L. Dynamic Alternative Splicing During Mouse Preimplantation Embryo Development. Front Bioeng Biotechnol 2020; 8:35. [PMID: 32117919 PMCID: PMC7019016 DOI: 10.3389/fbioe.2020.00035] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/27/2019] [Accepted: 01/15/2020] [Indexed: 11/13/2022] Open
Abstract
The mechanism of alternative pre-mRNA splicing (AS) during preimplantation development is largely unknown. To capture the dynamic changes in AS that occur during embryogenesis, we carried out a bioinformatics analysis of scRNA-seq data over the time course of preimplantation development in mouse. We detected numerous previously unreported differentially expressed genes at specific developmental stages and investigated the nature of AS at both minor and major zygotic genome activation (ZGA). Atlases of AS and differential AS over preimplantation development were established. The differentially alternatively spliced genes (DASGs) are likely to be key splicing factors (SFs) during preimplantation development. We also demonstrated that there is a regulatory cascade of AS events in which some key SFs are regulated by differential AS of their own gene transcripts. Moreover, 212 isoform switches (ISs) were detected during preimplantation development, which may be critical for decoding the mechanism of early embryogenesis. Importantly, we found that zygotic AS activation (ZASA) is consistent with ZGA and that AS is coupled with transcription during preimplantation development. Our results may provide deeper insight into the regulation of early embryogenesis.
Affiliation(s)
- Yongqiang Xing
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Wuritu Yang
  - The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, Inner Mongolia University, Hohhot, China
- Guoqing Liu
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Xiangjun Cui
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Hu Meng
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Hongyu Zhao
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Xiujuan Zhao
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Jun Li
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Zhe Liu
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Michael Q Zhang
  - Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, United States
- Lu Cai
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
7
Lu S, Zhang J, Lian X, Sun L, Meng K, Chen Y, Sun Z, Yin X, Li Y, Zhao J, Wang T, Zhang G, He QY. A hidden human proteome encoded by 'non-coding' genes. Nucleic Acids Res 2019; 47:8111-8125. [PMID: 31340039 PMCID: PMC6735797 DOI: 10.1093/nar/gkz646] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Received: 05/25/2019] [Revised: 07/07/2019] [Accepted: 07/15/2019] [Indexed: 01/27/2023] Open
Abstract
It has long been debated whether the 98% ‘non-coding’ fraction of the human genome can encode functional proteins besides short peptides. With full-length translating mRNA sequencing and ribosome profiling, we found that up to 3330 long non-coding RNAs (lncRNAs) were bound to ribosomes with active translation elongation. With shotgun proteomics, 308 lncRNA-encoded new proteins were detected. A total of 207 unique peptides of these new proteins were verified by multiple reaction monitoring (MRM) and/or parallel reaction monitoring (PRM), and 10 new proteins were verified by immunoblotting. We found that these new proteins deviated from canonical proteins in various physical and chemical properties and emerged mostly in primates during evolution. We further deduced the protein functions from assays of translation efficiency, RNA folding, and intracellular localization. As the new protein UBAP1-AST6 is localized in the nucleoli and is preferentially expressed by lung cancer cell lines, we biologically verified that it has a function associated with cell proliferation. In sum, we provide experimental evidence for a hidden human functional proteome encoded by purported lncRNAs, suggesting a resource for annotating new human proteins.
Affiliation(s)
- Shaohua Lu
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Jing Zhang
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Xinlei Lian
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
  - Laboratory of Veterinary Pharmacology, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
- Li Sun
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Kun Meng
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Yang Chen
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Zhenghua Sun
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Xingfeng Yin
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Yaxing Li
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Jing Zhao
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Tong Wang
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Gong Zhang
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Qing-Yu He
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
8
Wilbrandt J, Misof B, Panfilio KA, Niehuis O. Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models. BMC Genomics 2019; 20:753. [PMID: 31623555 PMCID: PMC6798390 DOI: 10.1186/s12864-019-6064-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Accepted: 08/27/2019] [Indexed: 02/06/2023] Open
Abstract
Background: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.
Results: Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from the automatically predicted and the manually annotated gene models alike. Conversely, some previously reported trends did not appear in either the automatically or the manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.
Conclusions: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.
Affiliation(s)
- Jeanne Wilbrandt
  - Center for molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany
  - Present address: Hoffmann Research Group, Leibniz Institute on Aging - Fritz Lipmann Institute, Beutenbergstraße 11, 07745, Jena, Germany
- Bernhard Misof
  - Center for molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany
- Kristen A Panfilio
  - School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL, UK
- Oliver Niehuis
  - Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University, Hauptstr. 1, 79104, Freiburg, Germany
9
Abstract
Every microarray experiment is based on a common format. First, a large number of nucleotide "spots" are arrayed onto a substrate, typically a glass slide, a silicon chip, or microbeads. Second, a complex population of nucleic acids (isolated from cells, selected from in vitro-synthesized libraries, or obtained from another source) is labeled, typically with fluorescent dyes. Third, the labeled nucleic acids are allowed to hybridize to their complementary spot(s) on the microarray. Fourth, the hybridized microarray is washed, allowing the amount of hybridized label to then be quantified. Analysis of the raw data generates a readout of the levels of each species of RNA in the original complex population. This introduction includes several examples of microarray applications and provides a discussion of the basic steps of most microarray experiments.
10
Abstract
This unit describes the usage of geneid, an efficient gene-finding program that allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. geneid produces output in a variety of standard formats. The results, thus, can be processed by a variety of software tools, including visualization programs. geneid software is in the public domain, and is undergoing constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene-finding tools. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Tyler Alioto
  - Centre Nacional d'Anàlisi Genòmica (CNAG-CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Enrique Blanco
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Genís Parra
  - Centre Nacional d'Anàlisi Genòmica (CNAG-CRG), Barcelona, Spain
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Roderic Guigó
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
11
Jha SK, Malik S, Sharma M, Pandey A, Pandey GK. Recent Advances in Substrate Identification of Protein Kinases in Plants and Their Role in Stress Management. Curr Genomics 2017; 18:523-541. [PMID: 29204081 PMCID: PMC5684648 DOI: 10.2174/1389202918666170228142703] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/06/2016] [Revised: 10/13/2016] [Accepted: 11/11/2016] [Indexed: 12/20/2022] Open
Abstract
Protein phosphorylation-dephosphorylation is a well-known regulatory mechanism in biological systems and one of the most significant means of regulating protein function, modulating most biological processes. Protein kinases play a vital role in numerous cellular processes. Kinases transduce external signals into responses such as growth, immunity, and stress tolerance through phosphorylation of their target proteins. To understand these cellular processes at the molecular level, one needs to know the different substrates targeted by protein kinases. Advances in tools and techniques have made available multiple approaches that enable the identification of kinase targets. However, none of these methodologies has so far proved to be a panacea for substrate identification. This review discusses the recent advances made in identifying putative substrates and the implications of these kinases and their substrates for stress management.
Affiliation(s)
- Saroj K Jha
  - Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, Dhaula Kuan, New Delhi-110021, India
- Shikha Malik
  - Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
- Manisha Sharma
  - Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, Dhaula Kuan, New Delhi-110021, India
- Amita Pandey
  - Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, Dhaula Kuan, New Delhi-110021, India
- Girdhar K Pandey
  - Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, Dhaula Kuan, New Delhi-110021, India
12
Wilbrandt J, Misof B, Niehuis O. COGNATE: comparative gene annotation characterizer. BMC Genomics 2017; 18:535. [PMID: 28716078 PMCID: PMC5513398 DOI: 10.1186/s12864-017-3870-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/31/2017] [Accepted: 06/19/2017] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy-to-use standard tool.
RESULTS: We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow downstream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage ( https://www.zfmk.de/en/COGNATE ) and on GitHub ( https://github.com/ZFMK/COGNATE ).
CONCLUSION: The tool COGNATE allows comparing genome assemblies and structural elements on multiple levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide the important standardization of both genome and gene structure parameter disclosure as well as data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible.
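To make the kind of gene structure parameters discussed in this abstract concrete, the following minimal sketch derives exon count, exonic length, intron lengths, and gene length from exon coordinates, then summarizes exon counts over a gene set. The coordinate convention (1-based, inclusive), the function names, and the example genes are hypothetical; COGNATE itself is a Perl tool with its own published definitions, which this sketch does not reproduce.

```python
# Minimal sketch: per-gene structure parameters from exon coordinates
# (illustrative definitions only, not those used by COGNATE).
from statistics import mean, median


def gene_structure(exons):
    """Describe one gene model given (start, end) exon pairs, 1-based inclusive."""
    exons = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in exons]
    # An intron spans the bases strictly between two consecutive exons.
    intron_lengths = [nxt[0] - cur[1] - 1 for cur, nxt in zip(exons, exons[1:])]
    return {
        "exon_count": len(exons),
        "exonic_length": sum(exon_lengths),
        "intron_lengths": intron_lengths,
        "gene_length": exons[-1][1] - exons[0][0] + 1,
    }


def summarize(genes):
    """Summary statistics of exon counts over a dict of gene ID -> exon list."""
    counts = [gene_structure(exons)["exon_count"] for exons in genes.values()]
    return {
        "genes": len(counts),
        "mean_exon_count": mean(counts),
        "median_exon_count": median(counts),
    }


if __name__ == "__main__":
    # Hypothetical gene models for demonstration.
    genes = {"g1": [(1, 100), (201, 300)], "g2": [(10, 50)]}
    print(gene_structure(genes["g1"]))
    print(summarize(genes))
```

Fixing the coordinate convention and the definition of each parameter up front is the point of the standardization argument: two tools that disagree on either will report different values for the same annotation.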
Affiliation(s)
- Jeanne Wilbrandt
  - Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Zentrum für Molekulare Biodiversitätsforschung (zmb), Bonn, Germany
- Bernhard Misof
  - Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Zentrum für Molekulare Biodiversitätsforschung (zmb), Bonn, Germany
- Oliver Niehuis
  - Abteilung Evolutionsbiologie und Ökologie, Albert-Ludwigs-Universität Freiburg, Institut für Biologie I (Zoologie), Freiburg, Germany
13
Schenk H, Müller-Deile J, Kinast M, Schiffer M. Disease modeling in genetic kidney diseases: zebrafish. Cell Tissue Res 2017; 369:127-141. [PMID: 28331970 DOI: 10.1007/s00441-017-2593-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/21/2016] [Accepted: 02/22/2017] [Indexed: 01/07/2023]
Abstract
A growing number of translational genomics studies are based on the highly efficient and versatile zebrafish (Danio rerio) vertebrate model. The growing variety of zebrafish models has improved our understanding of inherited kidney diseases, since they not only display pathophysiological changes but also give us the opportunity to develop and test novel treatment options in a high-throughput manner. New paradigms in inherited kidney diseases have been developed on the basis of the genome conservation of approximately 70% between zebrafish and humans in terms of existing gene orthologs. Several options are available to determine the functional role of a specific gene or gene set. Permanent genome editing can be induced via complete gene knockout by using the CRISPR/Cas system, among others, whereas transient modification can be achieved by using various morpholino techniques. Cross-species rescues following knockdown are employed to determine the functional significance of a target gene or a specific mutation. This article summarizes the current techniques and discusses their perspectives.
Affiliation(s)
- Heiko Schenk
  - Department of Medicine/Nephrology, Hannover Medical School, Hannover, Germany
  - Mount Desert Island Biological Laboratory, Salisbury Cove, Bar Harbor, Me., USA
- Janina Müller-Deile
  - Department of Medicine/Nephrology, Hannover Medical School, Hannover, Germany
  - Mount Desert Island Biological Laboratory, Salisbury Cove, Bar Harbor, Me., USA
- Mark Kinast
  - Department of Medicine/Nephrology, Hannover Medical School, Hannover, Germany
  - Mount Desert Island Biological Laboratory, Salisbury Cove, Bar Harbor, Me., USA
- Mario Schiffer
  - Department of Medicine/Nephrology, Hannover Medical School, Hannover, Germany
  - Mount Desert Island Biological Laboratory, Salisbury Cove, Bar Harbor, Me., USA
14
Klasberg S, Bitard-Feildel T, Mallet L. Computational Identification of Novel Genes: Current and Future Perspectives. Bioinform Biol Insights 2016; 10:121-31. [PMID: 27493475 PMCID: PMC4970615 DOI: 10.4137/bbi.s39950] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/15/2016] [Revised: 05/31/2016] [Accepted: 06/05/2016] [Indexed: 12/31/2022] Open
Abstract
While it has long been thought that all genomic novelties are derived from the existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, ie, out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leveraging biological evidences from experiments. Identification of novel genes remains however a challenging task. With the constant software and technologies updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.
Affiliation(s)
- Steffen Klasberg
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
- Tristan Bitard-Feildel
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
- Ludovic Mallet
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
15
Singh S, Kaur S, Goel N. A Review of Computational Intelligence Methods for Eukaryotic Promoter Prediction. Nucleosides Nucleotides & Nucleic Acids 2016; 34:449-62. [PMID: 26158565 DOI: 10.1080/15257770.2015.1013126] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
Abstract
In past decades, prediction of genes in DNA sequences has attracted the attention of many researchers but due to its complex structure it is extremely intricate to correctly locate its position. A large number of regulatory regions are present in DNA that helps in transcription of a gene. Promoter is one such region and to find its location is a challenging problem. Various computational methods for promoter prediction have been developed over the past few years. This paper reviews these promoter prediction methods. Several difficulties and pitfalls encountered by these methods are also detailed, along with future research directions.
Affiliation(s)
- Shailendra Singh
  - Department of Computer Science and Engineering, PEC University of Technology, Chandigarh, India
16
Bond C, Tang Y, Li L. Saccharomyces cerevisiae as a tool for mining, studying and engineering fungal polyketide synthases. Fungal Genet Biol 2016; 89:52-61. [PMID: 26850128 PMCID: PMC4789138 DOI: 10.1016/j.fgb.2016.01.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 10/30/2015] [Revised: 01/01/2016] [Accepted: 01/09/2016] [Indexed: 12/17/2022]
Abstract
Small molecule secondary metabolites produced by organisms such as plants, bacteria, and fungi form a fascinating and important group of natural products, many of which have shown promise as medicines. Fungi in particular have been important sources of natural product polyketide pharmaceuticals. While the structural complexity of these polyketides makes them interesting and useful bioactive compounds, these same features also make them difficult and expensive to prepare and scale-up using synthetic methods. Currently, nearly all commercial polyketides are prepared through fermentation or semi-synthesis. However, elucidation and engineering of polyketide pathways in the native filamentous fungi hosts are often hampered due to a lack of established genetic tools and of understanding of the regulation of fungal secondary metabolisms. Saccharomyces cerevisiae has many advantages beneficial to the study and development of polyketide pathways from filamentous fungi due to its extensive genetic toolbox and well-studied metabolism. This review highlights the benefits S. cerevisiae provides as a tool for mining, studying, and engineering fungal polyketide synthases (PKSs), as well as notable insights this versatile tool has given us into the mechanisms and products of fungal PKSs.
Affiliation(s)
- Carly Bond
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095, United States
- Yi Tang
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095, United States
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, United States
- Li Li
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095, United States
  - Engineering Research Center of Industrial Microbiology (Ministry of Education), College of Life Sciences, Fujian Normal University, Fuzhou, Fujian 350108, China
  - State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University, Shanghai 200030, China
17
Mouilleron H, Delcourt V, Roucou X. Death of a dogma: eukaryotic mRNAs can code for more than one protein. Nucleic Acids Res 2016; 44:14-23. [PMID: 26578573 PMCID: PMC4705651 DOI: 10.1093/nar/gkv1218] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 07/08/2015] [Revised: 10/26/2015] [Accepted: 10/28/2015] [Indexed: 12/13/2022] Open
Abstract
mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5' UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3' UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma.
Affiliation(s)
- Hélène Mouilleron
  - Department of biochemistry, Université de Sherbrooke, Sherbrooke, Quebec J1E 4K8, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
- Vivian Delcourt
  - Department of biochemistry, Université de Sherbrooke, Sherbrooke, Quebec J1E 4K8, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
  - Inserm U-1192, Laboratoire de Protéomique, Réponse Inflammatoire, Spectrométrie de Masse (PRISM), Université de Lille 1, Cité Scientifique, 59655 Villeneuve D'Ascq, France
- Xavier Roucou
  - Department of biochemistry, Université de Sherbrooke, Sherbrooke, Quebec J1E 4K8, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
18
A Comprehensive Review of Emerging Computational Methods for Gene Identification. Journal of Information Processing Systems 2016. [DOI: 10.3745/jips.04.0023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022]
19
Jimenez J, Duncan CDS, Gallardo M, Mata J, Perez-Pulido AJ. AnABlast: a new in silico strategy for the genome-wide search of novel genes and fossil regions. DNA Res 2015; 22:439-49. [PMID: 26494834 PMCID: PMC4675712 DOI: 10.1093/dnares/dsv025] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/23/2015] [Accepted: 09/25/2015] [Indexed: 12/15/2022] Open
Abstract
Genome annotation, assisted by computer programs, is one of the great advances in modern biology. Nevertheless, the in silico identification of small and complex coding sequences is still challenging. We observed that amino acid sequences inferred from coding-but rarely from non-coding-DNA sequences accumulated alignments in low-stringency BLAST searches, suggesting that this alignments accumulation could be used to highlight coding regions in sequenced DNA. To investigate this possibility, we developed a computer program (AnABlast) that generates profiles of accumulated alignments in query amino acid sequences using a low-stringency BLAST strategy. To validate this approach, all six-frame translations of DNA sequences between every two annotated exons of the fission yeast genome were analysed with AnABlast. AnABlast-generated profiles identified three new copies of known genes, and four new genes supported by experimental evidence. New pseudogenes, ancestral carboxyl- and amino-terminal subtractions, complex gene rearrangements, and ancient fragments of mitDNA and of bacterial origin, were also inferred. Thus, this novel in silico approach provides a powerful tool to uncover new genes, as well as fossil-coding sequences, thus providing insight into the evolutionary history of annotated genomes.
Affiliation(s)
- Juan Jimenez
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide de Sevilla/CSIC, Sevilla, Spain
- Caia D S Duncan
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- María Gallardo
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide de Sevilla/CSIC, Sevilla, Spain
- Juan Mata
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Antonio J Perez-Pulido
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide de Sevilla/CSIC, Sevilla, Spain
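The six-frame translation step mentioned in the AnABlast abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, only the standard genetic code is assumed, and the low-stringency BLAST profiling that AnABlast performs on these translations is not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to their translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six strings would then serve as a separate query in a similarity search; `*` marks a stop codon.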
20
Pauciullo A, Erhardt G. Molecular Characterization of the Llamas (Lama glama) Casein Cluster Genes Transcripts (CSN1S1, CSN2, CSN1S2, CSN3) and Regulatory Regions. PLoS One 2015; 10:e0124963. [PMID: 25923814 PMCID: PMC4414411 DOI: 10.1371/journal.pone.0124963] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/06/2015] [Accepted: 03/19/2015] [Indexed: 11/19/2022] Open
Abstract
In the present paper, we report for the first time the characterization of llama (Lama glama) caseins at transcriptomic and genetic level. A total of 288 casein clones transcripts were analysed from two lactating llamas. The most represented mRNA populations were those correctly assembled (85.07%) and they encoded for mature proteins of 215, 217, 187 and 162 amino acids respectively for the CSN1S1, CSN2, CSN1S2 and CSN3 genes. The exonic subdivision evidenced a structure made of 21, 9, 17 and 6 exons for the αs1-, β-, αs2- and κ-casein genes respectively. Exon skipping and duplication events were evidenced. Two variants A and B were identified in the αs1-casein gene as result of the alternative out-splicing of the exon 18. An additional exon coding for a novel esapeptide was found to be cryptic in the κ-casein gene, whereas one extra exon was found in the αs2-casein gene by the comparison with the Camelus dromedaries sequence. A total of 28 putative phosphorylated motifs highlighted a complex heterogeneity and a potential variable degree of post-translational modifications. Ninety-six polymorphic sites were found through the comparison of the lama casein cDNAs with the homologous camel sequences, whereas the first description and characterization of the 5'- and 3'-regulatory regions allowed to identify the main putative consensus sequences involved in the casein genes expression, thus opening the way to new investigations -so far- never achieved in this species.
Affiliation(s)
- Alfredo Pauciullo
  - Department of Agricultural, Forest and Food Sciences, University of Torino, Grugliasco, Italy
  - Institute for Animal Breeding and Genetics, Justus Liebig University, Gießen, Germany
- Georg Erhardt
  - Institute for Animal Breeding and Genetics, Justus Liebig University, Gießen, Germany
21
Chu Q, Ma J, Saghatelian A. Identification and characterization of sORF-encoded polypeptides. Crit Rev Biochem Mol Biol 2015; 50:134-41. [PMID: 25857697 DOI: 10.3109/10409238.2015.1016215] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 01/10/2023]
Abstract
Molecular biology, genomics and proteomics methods have been utilized to reveal a non-annotated class of endogenous polypeptides (small proteins and peptides) encoded by short open reading frames (sORFs), or small open reading frames (smORFs). We refer to these polypeptides as s(m)ORF-encoded polypeptides or SEPs. The early SEPs were identified via genetic screens, and many of the RNAs that contain s(m)ORFs were originally considered to be non-coding; however, elegant work in bacteria and flies demonstrated that these s(m)ORFs code for functional polypeptides as small as 11-amino acids in length. The discovery of these initial SEPs led to search for these molecules using methods such as ribosome profiling and proteomics, which have revealed the existence of many SEPs, including novel human SEPs. Unlike screens, omics methods do not necessarily link a SEP to a cellular or biological function, but functional genomic and proteomic strategies have demonstrated that at least some of these newly discovered SEPs have biochemical and cellular functions. Here, we provide an overview of these results and discuss the future directions in this emerging field.
Affiliation(s)
- Qian Chu
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, Helmsley Center for Genomic Medicine, La Jolla, CA, USA and
22
Caminsky NG, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2015. [DOI: 10.12688/f1000research.5654.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
23
Caminsky N, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2014; 3:282. [PMID: 25717368 PMCID: PMC4329672 DOI: 10.12688/f1000research.5654.1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Accepted: 11/10/2014] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
Affiliation(s)
- Natasha Caminsky
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Eliseos J Mucaki
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Peter K Rogan
  - Departments of Biochemistry and Computer Science, Western University, London, ON, N6A 2C1, Canada
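The idea of scoring a splice-site substitution by its change in information content, as described in the abstract above, can be sketched roughly as follows. This is a simplified illustration, not the Splicing Mutation Calculator: the aligned donor-like sites are made-up toy data, the pseudocount is an assumption, and the small-sample correction used in formal information-theory analyses is omitted.

```python
import math


def build_weight_matrix(sites, pseudocount=1.0):
    """Per-position weights 2 + log2(f(b, l)), in bits, from aligned sites.

    A pseudocount (an assumption of this sketch) avoids log2(0) for
    bases that never occur at a position.
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {b: pseudocount for b in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in "ACGT"})
    return matrix


def individual_information(matrix, site):
    """Sum the position weights for one candidate site (bits)."""
    return sum(column[base] for column, base in zip(matrix, site))


# Hypothetical toy alignment of donor-like sites, for illustration only.
TOY_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA",
             "AAGGTAAGT", "CAGGTGAGG", "TAGGTAAGT"]

if __name__ == "__main__":
    matrix = build_weight_matrix(TOY_SITES)
    natural, mutant = "CAGGTAAGT", "CAGATAAGT"  # G>A at the invariant G
    delta = (individual_information(matrix, mutant)
             - individual_information(matrix, natural))
    print(f"change in information: {delta:.2f} bits")
```

A negative change suggests a weakened site; how large a drop corresponds to leaky versus abolished splicing is a matter for the guidelines in the review itself.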
24
WISCOD: a statistical web-enabled tool for the identification of significant protein coding regions. BioMed Research International 2014; 2014:282343. [PMID: 25313355 PMCID: PMC4181902 DOI: 10.1155/2014/282343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2013] [Revised: 12/18/2013] [Accepted: 02/11/2014] [Indexed: 11/17/2022]
Abstract
Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. Although nowadays it is possible to improve their performance with additional information from related species or/and cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD, a web-enabled tool for the identification of significant protein coding regions, a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD has the capacity to detect real exons from large lists of potential exons, and it provides an easy way to use global P value called expected probability of being a false exon (EPFE) that is useful for ranking potential exons in a probabilistic framework, without additional computational costs. The advantage of our approach is that it significantly increases the specificity and sensitivity (both between 80% and 90%) in comparison to other ab initio methods (where they are in the range of 70–75%). WISCOD is written in JAVA and R and is available to download and to run in a local mode on Linux and Windows platforms.
25
Hua W, Wang J, Zhao J. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions. Mol Cell Probes 2014; 28:228-36. [DOI: 10.1016/j.mcp.2014.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 08/22/2013] [Revised: 03/31/2014] [Accepted: 04/17/2014] [Indexed: 11/25/2022]
26
ToPS: a framework to manipulate probabilistic models of sequence data. PLoS Comput Biol 2013; 9:e1003234. [PMID: 24098098 PMCID: PMC3789777 DOI: 10.1371/journal.pcbi.1003234] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/21/2012] [Accepted: 08/05/2013] [Indexed: 11/19/2022] Open
Abstract
Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model; and (viii) similarity based sequence weighting. The framework includes functionality for training, simulation and decoding of the models. Additionally, it provides two methods to help parameter setting: Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex, multi-model, probabilistic architectures using GHMMs. In particular the framework provides a novel, flexible, implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.
27
Shakya DK, Saxena R, Sharma SN. An adaptive window length strategy for eukaryotic CDS prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1241-1252. [PMID: 24384711 DOI: 10.1109/tcbb.2013.76] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/03/2023]
Abstract
Signal processing-based algorithms for identification of coding sequences (CDS) in eukaryotes are non-data driven and exploit the presence of three-base periodicity in these regions for their detection. Three-base periodicity is commonly detected using short time Fourier transform (STFT) that uses a window function of fixed length. As the length of the protein coding and noncoding regions varies widely, the identification accuracy of STFT-based algorithms is poor. In this paper, a novel signal processing-based algorithm is developed by enabling the window length adaptation in STFT of DNA sequences for improving the identification of three-base periodicity. The length of the window function has been made adaptive in coding regions to maximize the magnitude of period-3 measure, whereas in the noncoding regions, the window length is tailored to minimize this measure. Simulation results on bench mark data sets demonstrate the advantage of this algorithm when compared with other non-data-driven methods for CDS prediction.
Affiliation(s)
- Rajiv Saxena
  - Jaypee University of Engineering and Technology, Raghogarh, Guna
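The three-base periodicity measure that the abstract above refers to can be sketched with a fixed-length sliding window, which is the baseline the paper sets out to improve. The adaptive window-length rule itself is not reproduced; the function names and the default window length and step are assumptions chosen for illustration.

```python
import cmath


def period3_power(window):
    """Spectral power at frequency 1/3, summed over the four base indicator sequences."""
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at k = N/3 of the 0/1 indicator sequence for this base.
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, b in enumerate(window) if b == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window_len=351, step=3):
    """Return (start, power) pairs for a fixed-length window moved along seq.

    window_len and step are illustrative defaults, not values from the paper.
    """
    seq = seq.upper()
    return [(start, period3_power(seq[start:start + window_len]))
            for start in range(0, len(seq) - window_len + 1, step)]


if __name__ == "__main__":
    periodic = "ATG" * 30      # perfectly period-3 toy sequence
    non_periodic = "AC" * 45   # period-2 toy sequence of the same length
    print(period3_power(periodic), period3_power(non_periodic))
```

With a fixed window, short exons are averaged together with flanking noncoding sequence, which is the limitation that motivates adapting the window length.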
28
Abstract
Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the 'uncertainty' of a solution, that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimations and (iii) several strategies to design a pipeline involving several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.
29
Abstract
By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE.
Affiliation(s)
- Kevin Y Yip
  - Program in Computational Biology and Bioinformatics, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
  - Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
  - CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Chao Cheng
  - Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
  - Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
30
Gupta A, Singh TR. SHIFT: server for hidden stops analysis in frame-shifted translation. BMC Res Notes 2013; 6:68. [PMID: 23432998 PMCID: PMC3598200 DOI: 10.1186/1756-0500-6-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/04/2012] [Accepted: 02/21/2013] [Indexed: 02/07/2023] Open
Abstract
Background: Frameshift is one of the three classes of recoding. Frame-shifts lead to waste of energy, resources and activity of the biosynthetic machinery. In addition, some peptides synthesized after frame-shifts are probably cytotoxic which serve as plausible cause for innumerable number of diseases and disorders such as muscular dystrophies, lysosomal storage disorders, and cancer. Hidden stop codons occur naturally in coding sequences among all organisms. These codons are associated with the early termination of translation for incorrect reading frame selection and help to reduce the metabolic cost related to the frameshift events. Researchers have identified several consequences of hidden stop codons and their association with myriad disorders. However the wealth of information available is speckled and not effortlessly acquiescent to data-mining. To reduce this gap, this work describes an algorithmic web based tool to study hidden stops in frameshifted translation for all the lineages through respective genetic code systems. Findings: This paper describes SHIFT, an algorithmic web application tool that provides a user-friendly interface for identifying and analyzing hidden stops in frameshifted translation of genomic sequences for all available genetic code systems. We have calculated the correlation between codon usage frequencies and the plausible contribution of codons towards hidden stops in an off-frame context. Markovian chains of various order have been used to model hidden stops in frameshifted peptides and their evolutionary association with naturally occurring hidden stops. In order to obtain reliable and persuasive estimates for the naturally occurring and predicted hidden stops statistical measures have been implemented. Conclusions: This paper presented SHIFT, an algorithmic tool that allows user-friendly exploration, analysis, and visualization of hidden stop codons in frameshifted translations. It is expected that this web based tool would serve as a useful complement for analyzing hidden stop codons in all available genetic code systems. SHIFT is freely available for academic and research purpose at http://www.nuccore.org/shift/.
Affiliation(s)
- Arun Gupta
- School of Computer Science and IT, DAVV, Indore, M.P., India
|
31
|
Maiolica A, Jünger MA, Ezkurdia I, Aebersold R. Targeted proteome investigation via selected reaction monitoring mass spectrometry. J Proteomics 2012; 75:3495-513. [PMID: 22579752 DOI: 10.1016/j.jprot.2012.04.048] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 11/21/2011] [Revised: 04/27/2012] [Accepted: 04/29/2012] [Indexed: 12/20/2022]
Abstract
Due to the enormous complexity of proteomes, which constitute the entirety of protein species expressed by a certain cell or tissue, proteome-wide studies performed in discovery mode are still limited in their ability to reproducibly identify and quantify all proteins present in complex biological samples. Therefore, the targeted analysis of informative subsets of the proteome has been beneficial for generating reproducible data sets across multiple samples. Here we review the repertoire of antibody- and mass spectrometry (MS)-based analytical tools currently available for the directed analysis of predefined sets of proteins. The topics of emphasis for this review are Selected Reaction Monitoring (SRM) mass spectrometry, emerging tools to control error rates in targeted proteomic experiments, and some representative examples of applications. The ability to cost- and time-efficiently generate specific and quantitative assays for large numbers of proteins and posttranslational modifications has the potential to greatly expand the range of targeted proteomic coverage in biological studies. This article is part of a Special Section entitled: Understanding genome regulation and genetic diversity by mass spectrometry.
Affiliation(s)
- Alessio Maiolica
- Department of Biology, Institute of Molecular Systems Biology, Zurich, Switzerland
|
32
|
Aittokallio T, Kurki M, Nevalainen O, Nikula T, West A, Lahesmaa R. Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments. J Bioinform Comput Biol 2012; 1:541-86. [PMID: 15290769 DOI: 10.1142/s0219720003000319] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/13/2003] [Revised: 07/02/2003] [Indexed: 11/18/2022]
Abstract
Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches is available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element that depends both on the data and on the goals of the experiment. A basic understanding of bioinformatics is therefore required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are reviewed, and potential areas for additional research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the data obtained through microarray experiments.
Affiliation(s)
- Tero Aittokallio
- Department of Computational Biology, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-Shi, Chiba 277-8562, Japan.
|
33
|
Hawkins T, Kihara D. FUNCTION PREDICTION OF UNCHARACTERIZED PROTEINS. J Bioinform Comput Biol 2011; 5:1-30. [PMID: 17477489 DOI: 10.1142/s0219720007002503] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Received: 06/20/2006] [Revised: 09/23/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022]
Abstract
Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that is not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, first we review the definition of protein function. Then the recent developments of these methods are introduced with special focus on the type of predictions that can be made. The need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities is emphasized. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.
Affiliation(s)
- Troy Hawkins
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
|
34
|
Bi C. SEAM: A STOCHASTIC EM-TYPE ALGORITHM FOR MOTIF-FINDING IN BIOPOLYMER SEQUENCES. J Bioinform Comput Biol 2011; 5:47-77. [PMID: 17477491 DOI: 10.1142/s0219720007002527] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 06/09/2006] [Revised: 08/22/2006] [Accepted: 10/14/2006] [Indexed: 12/21/2022]
Abstract
Position weight matrix-based statistical modeling for the identification and characterization of motif sites in a set of unaligned biopolymer sequences is presented. This paper describes and implements a new algorithm, the Stochastic EM-type Algorithm for Motif-finding (SEAM), and redesigns and implements the EM-based motif-finding algorithm called deterministic EM (DEM) for comparison with SEAM, its stochastic counterpart. The gold standard example, cyclic adenosine monophosphate receptor protein (CRP) binding sequences, together with other biological sequences, is used to illustrate the performance of the new algorithm and compare it with other popular motif-finding programs. The convergence of the new algorithm is shown by simulation. The in silico experiments using simulated and biological examples illustrate the power and robustness of the new algorithm SEAM in de novo motif discovery.
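The position weight matrix (PWM) modeling mentioned above can be illustrated with a minimal sketch. It is not SEAM or DEM: it only shows how a log-odds PWM might be built from sites that are already aligned and then used to scan a sequence. The function names, the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, alphabet="ACGT"):
    """Build a log-odds PWM from equal-length aligned sites.

    Assumes a uniform background (1/len(alphabet)) and adds a pseudocount
    so that unseen letters do not produce log(0).
    """
    background = 1.0 / len(alphabet)
    pwm = []
    for j in range(len(sites[0])):
        counts = {a: pseudocount for a in alphabet}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({a: math.log2(counts[a] / total / background) for a in alphabet})
    return pwm


def best_site(seq, pwm):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    )


if __name__ == "__main__":
    toy_sites = ["TGTGA", "TGTGA", "TGTGA", "TGAGA"]  # hypothetical sites
    score, start = best_site("CCCTGTGACC", build_pwm(toy_sites))
    print(start, round(score, 2))
```

In an EM-type motif finder the site positions are unknown and are re-estimated together with the matrix; the sketch covers only the scoring step that such algorithms repeat.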
Affiliation(s)
- Chengpeng Bi
- Children's Mercy Hospitals and Clinics, 2401 Gillham Road, Pediatrics Research Building, Third Floor, Kansas City, Missouri 64108, USA.
|
35
|
Haas BJ, Zeng Q, Pearson MD, Cuomo CA, Wortman JR. Approaches to Fungal Genome Annotation. Mycology 2011; 2:118-141. [PMID: 22059117 PMCID: PMC3207268 DOI: 10.1080/21501203.2011.606851] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 01/08/2023] Open
Abstract
Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center's production genome annotation environment.
Affiliation(s)
- Brian J Haas
- Genome Sequencing and Analysis Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, U.S.A
|
36
|
Jung S, Swart EC, Minx PJ, Magrini V, Mardis ER, Landweber LF, Eddy SR. Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res 2011; 39:7529-47. [PMID: 21715380 PMCID: PMC3177221 DOI: 10.1093/nar/gkr501] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023] Open
Abstract
We took advantage of the unusual genomic organization of the ciliate Oxytricha trifallax to screen for eukaryotic non-coding RNA (ncRNA) genes. Ciliates have two types of nuclei: a germ line micronucleus that is usually transcriptionally inactive, and a somatic macronucleus that contains a reduced, fragmented and rearranged genome that expresses all genes required for growth and asexual reproduction. In some ciliates including Oxytricha, the macronuclear genome is particularly extreme, consisting of thousands of tiny 'nanochromosomes', each of which usually contains only a single gene. Because the organism itself identifies and isolates most of its genes on single-gene nanochromosomes, nanochromosome structure could facilitate the discovery of unusual genes or gene classes, such as ncRNA genes. Using a draft Oxytricha genome assembly and a custom-written protein-coding genefinding program, we identified a subset of nanochromosomes that lack any detectable protein-coding gene, thereby strongly enriching for nanochromosomes that carry ncRNA genes. We found only a small proportion of non-coding nanochromosomes, suggesting that Oxytricha has few independent ncRNA genes besides homologs of already known RNAs. Other than new members of known ncRNA classes including C/D and H/ACA snoRNAs, our screen identified one new family of small RNA genes, named the Arisong RNAs, which share some of the features of small nuclear RNAs.
Affiliation(s)
- Seolkyoung Jung
- Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn VA 20147, USA
|
37
|
ZHU LF, HE X, YUAN DJ, XU L, XU L, TU LL, SHEN GX, ZHANG H, ZHANG XL. Genome-Wide Identification of Genes Responsive to ABA and Cold/Salt Stresses in Gossypium hirsutum by Data-Mining and Expression Pattern Analysis. ACTA ACUST UNITED AC 2011. [DOI: 10.1016/s1671-2927(11)60030-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
|
38
|
Goh MY, Pan MZ, Blake DP, Wan KL, Song BK. Eimeria maxima phosphatidylinositol 4-phosphate 5-kinase: locus sequencing, characterization, and cross-phylum comparison. Parasitol Res 2011; 108:611-20. [PMID: 20938684 DOI: 10.1007/s00436-010-2104-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2010] [Accepted: 09/23/2010] [Indexed: 10/19/2022]
Abstract
Phosphatidylinositol 4-phosphate 5-kinase (PIP5K) may play an important role in host-cell invasion by Eimeria species, protozoan parasites that can cause severe intestinal disease in livestock. Here, we report the structural organization of the PIP5K gene in Eimeria maxima (Weybridge strain). Two E. maxima BAC clones carrying the E. maxima PIP5K (EmPIP5K) coding sequences were selected for shotgun sequencing, yielding a 9.1-kb genomic segment. The EmPIP5K coding region was initially identified using in silico gene-prediction approaches and subsequently confirmed by mapping rapid amplification of cDNA ends and RT-PCR-generated cDNA sequences to its genomic segment. The putative EmPIP5K gene was located at position 710-8036 nt on the complementary strand and comprised 23 exons. Alignment of the 1147-amino-acid sequence with previously annotated PIP5K proteins from other Apicomplexa species detected three conserved motifs encompassing the kinase core domain, which previous protein deletion studies have shown to be necessary for PIP5K protein function. Phylogenetic analysis provided further evidence that the putative EmPIP5K protein is orthologous to that of other Apicomplexa. Subsequent comparative gene structure characterization revealed events of intron loss/gain throughout the evolution of the apicomplexan PIP5K gene. Further scrutiny of the genomic structure revealed a possible trend towards "intron gain" between two of the motif regions. Our findings offer preliminary insights into the structural variations that have occurred during the evolution of the PIP5K locus and may aid in understanding the functional role of this gene in the cellular biology of apicomplexan parasites.
Affiliation(s)
- Mei-Yen Goh
- School of Science, Monash University Sunway Campus, Jalan Lagoon Selatan, 46150 Bandar Sunway, Selangor, DE, Malaysia
|
39
|
Renuse S, Chaerkady R, Pandey A. Proteogenomics. Proteomics 2011; 11:620-30. [DOI: 10.1002/pmic.201000615] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Received: 09/29/2010] [Revised: 11/14/2010] [Accepted: 11/16/2010] [Indexed: 12/13/2022]
|
40
|
Buckley KM, Florea LD, Smith LC. A method for identifying alternative or cryptic donor splice sites within gene and mRNA sequences. Comparisons among sequences from vertebrates, echinoderms and other groups. BMC Genomics 2009; 10:318. [PMID: 19607703 PMCID: PMC2721852 DOI: 10.1186/1471-2164-10-318] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/05/2008] [Accepted: 07/16/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND As the amount of genome sequencing data grows, so does the problem of computational gene identification, and in particular, the splicing signals that flank exon borders. Traditional methods for identifying splicing signals have been created and optimized using sequences from model organisms, mostly vertebrate and yeast species. However, as genome sequencing extends across the animal kingdom and includes various invertebrate species, the need for mechanisms to recognize splice signals in these organisms increases as well. With that aim in mind, we generated a model for identifying donor and acceptor splice sites that was optimized using sequences from the purple sea urchin, Strongylocentrotus purpuratus. This model was then used to assess the possibility of alternative or cryptic splicing within the highly variable immune response gene family known as 185/333. RESULTS A donor splice site model was generated from S. purpuratus sequences that incorporates non-adjacent dependences among positions within the 9 nt splice signal and uses position weight matrices to determine the probability that the site is used for splicing. The Purpuratus model was shown to predict splice signals better than a similar model created from vertebrate sequences. Although the Purpuratus model was able to correctly predict the true splice sites within the 185/333 genes, no evidence for alternative or trans-gene splicing was observed. CONCLUSION The data presented herein describe the first published analyses of echinoderm splice sites and suggest that the previous methods of identifying splice signals that are based largely on vertebrate sequences may be insufficient. Furthermore, alternative or trans-gene splicing does not appear to be acting as a diversification mechanism in the 185/333 gene family.
Affiliation(s)
- Katherine M Buckley
- The Department of Biological Sciences, Washington University, Washington, DC 20052, USA.
|
41
|
Bill BR, Petzold AM, Clark KJ, Schimmenti LA, Ekker SC. A primer for morpholino use in zebrafish. Zebrafish 2009; 6:69-77. [PMID: 19374550 DOI: 10.1089/zeb.2008.0555] [Citation(s) in RCA: 314] [Impact Index Per Article: 20.9] [Indexed: 01/22/2023] Open
Abstract
Morpholino oligonucleotides are the most common anti-sense "knockdown" technique used in zebrafish (Danio rerio). This review discusses common practices for the design, preparation, and deployment of morpholinos in this vertebrate model system. Off-targeting effects of morpholinos are discussed, as well as a method to minimize this potentially confounding variable via co-injection of a tP53-targeting morpholino. Finally, new uses of morpholinos are summarized and contextualized with respect to the complementary, DNA-based knockout technologies recently developed for zebrafish.
Affiliation(s)
- Brent R Bill
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, USA
|
42
|
Zhou L, Pertea M, Delcher AL, Florea L. Sim4cc: a cross-species spliced alignment program. Nucleic Acids Res 2009; 37:e80. [PMID: 19429899 PMCID: PMC2699533 DOI: 10.1093/nar/gkp319] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023] Open
Abstract
Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64 000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
Affiliation(s)
- Leming Zhou
- Department of Computer Science, George Washington University, Washington, DC 20052, USA
|
43
|
Jiang Y, Cukic B, Adjeroh DA, Skinner HD, Lin J, Shen QJ, Jiang BH. An algorithm for identifying novel targets of transcription factor families: application to hypoxia-inducible factor 1 targets. Cancer Inform 2009; 7:75-89. [PMID: 19352460 PMCID: PMC2664698 DOI: 10.4137/cin.s1054] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022] Open
Abstract
Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we extracted 105 common patterns that all occur in the 15 training genes using suffix trees. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potentially novel HIF-1 targets including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 putative genes were confirmed to be upregulated by HIF-1 or hypoxia inside these 258 genes. We further studied one of the potential targets, COX-2, in the biological lab; and showed that it was a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
Affiliation(s)
- Yue Jiang
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA.
|
44
|
Abstract
Proteolytic enzymes play an essential role in many biological and pathological processes. Taking advantage of the recent availability of several mammalian genome sequences and by using a set of computational approaches, we have annotated and compared the degradome or complete repertoire of proteases of different mammalian species including human, mouse, rat, and chimpanzee. These studies have allowed us to expand our knowledge about the complexity, evolution, and diversity of proteolytic systems, which represent about 2% of the studied genomes. In this chapter, we review the genomic and computational methodologies used in this degradomic analysis and summarize the main findings derived from comparison of mammalian degradomes.
Affiliation(s)
- Gonzalo R Ordóñez
- Departamento de Bioquímica y Biología Molecular, Facultad de Medicina, Instituto Universitario de Oncología, Universidad de Oviedo, Oviedo, Spain
|
45
|
Abstract
The sequences of many eukaryotic genomes are nowadays available from a personal computer to any researcher in the worldwide scientific community. However, the sequences are worthless without adequate annotation of the biologically meaningful elements. The annotation of genes, in particular, is a challenging task that cannot be tackled without the aid of specific bioinformatics tools. In this chapter we present a simple protocol, based mainly on the combination of the program GeneID with other computational tools, to annotate the location of a gene previously annotated in D. melanogaster in the recently assembled genome of D. yakuba.
Affiliation(s)
- Enrique Blanco
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Spain
|
46
|
Wan L, Li D, Zhang D, Liu X, Fu WJ, Zhu L, Deng M, Sun F, Qian M. Conservation and implications of eukaryote transcriptional regulatory regions across multiple species. BMC Genomics 2008; 9:623. [PMID: 19099599 PMCID: PMC2640395 DOI: 10.1186/1471-2164-9-623] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/19/2008] [Accepted: 12/20/2008] [Indexed: 01/14/2023] Open
Abstract
Background Increasing evidence shows that whole genomes of eukaryotes are almost entirely transcribed into both protein coding genes and an enormous number of non-protein-coding RNAs (ncRNAs). Therefore, revealing the underlying regulatory mechanisms of transcripts becomes imperative. However, for a complete understanding of transcriptional regulatory mechanisms, we need to identify the regions in which they are found. We will call these transcriptional regulation regions, or TRRs, which can be considered functional regions containing a cluster of regulatory elements that cooperatively recruit transcriptional factors for binding and then regulating the expression of transcripts. Results We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained based on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as microRNAs. In addition, the ENCODE regions were examined by HSL, and TRRs were found to pervasively locate in the genomes. Conclusion Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) identified core TRRs by HSL are proper candidates for the further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity taking place in the abundant numbers of ncRNAs might account for the ubiquitous presence of TRRs across the genome. In addition, we also found that the TRRs of protein coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.
Affiliation(s)
- Lin Wan
- School of Mathematical Sciences, Peking University, Beijing, PR China.
|
47
|
Paar V, Pavin N, Basar I, Rosandić M, Gluncić M, Paar N. Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats. BMC Bioinformatics 2008; 9:466. [PMID: 18980673 PMCID: PMC2661002 DOI: 10.1186/1471-2105-9-466] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/12/2008] [Accepted: 11/03/2008] [Indexed: 11/28/2022] Open
Abstract
Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats. Results: We used a DFT-based method, with mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs, the power spectrum shows an equidistant frequency pattern with a characteristic two-level hierarchical organization that serves as the signature of a HOR. Our case study was the 16mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an nmer) and higher harmonics. In general, an nmer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, used as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence, only primary periodicity peaks are present. The 1/f^β noise and the periodicity-three pattern are missing from power spectra in alphoid regions, in accordance with expectations. Conclusion: DFT provides a robust detection method for higher order periodicity. The easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower-frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the nmer HOR, i.e., the number n of monomers contained in the consensus HOR.
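The mapping of a symbolic sequence into numerical sequences followed by a DFT can be sketched as below. This is an illustration under stated assumptions, not the authors' method: it uses one binary indicator sequence per nucleotide (one common mapping, chosen here as an assumption), a naive O(N²) transform, and hypothetical function names. It shows only how a power spectrum exposes a repeat period, not the two-level HOR hierarchy.

```python
import cmath


def power_spectrum(seq, alphabet="ACGT"):
    """Return [(k, power)] for k = 1 .. N//2.

    Each nucleotide gets a 0/1 indicator sequence; the total power at
    frequency index k is the sum of the squared DFT magnitudes of the
    indicator sequences.
    """
    n = len(seq)
    spectrum = []
    for k in range(1, n // 2 + 1):
        power = 0.0
        for base in alphabet:
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * t / n)
                for t, symbol in enumerate(seq)
                if symbol == base
            )
            power += abs(coeff) ** 2
        spectrum.append((k, power))
    return spectrum


def dominant_period(seq):
    """Period (in bases) of the strongest peak, N / k."""
    k, _ = max(power_spectrum(seq), key=lambda item: item[1])
    return len(seq) / k


if __name__ == "__main__":
    toy = "ACGTT" * 20  # hypothetical perfect tandem repeat of a 5-base unit
    print(dominant_period(toy))
```

For a perfect repeat of a unit of length p in a sequence of length N (with p dividing N), power appears only at multiples of N/p, which gives the equidistant peak pattern the abstract refers to.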
Affiliation(s)
- Vladimir Paar
- Faculty of Science, University of Zagreb, Bijenicka 32, Zagreb, Croatia.
|
48
|
Li Y, Zhu Y, Liu Y, Shu Y, Meng F, Lu Y, Bai X, Liu B, Guo D. Genome-wide identification of osmotic stress response gene in Arabidopsis thaliana. Genomics 2008; 92:488-93. [PMID: 18804526 DOI: 10.1016/j.ygeno.2008.08.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 05/05/2008] [Revised: 08/14/2008] [Accepted: 08/18/2008] [Indexed: 11/18/2022]
Abstract
In this paper, we present a cis-regulatory element-based computational approach to genome-wide identification of genes putatively responding to various osmotic stresses in Arabidopsis thaliana. The rationale of our method is that gene expression is largely controlled at the transcriptional level through interactions between transcription factors and cis-regulatory elements. Using cis-regulatory motifs known to regulate the osmotic stress response, we therefore built an artificial neural network model to identify other functionally relevant genes involved in the same process. We performed Gene Ontology enrichment analysis on the 500 top-scoring predictions and found that, except for un-annotated ORFs (approximately 40%), 91.3% of the enriched GO classifications were related to stress response and ABA response. Publicly available gene expression profiling data of Arabidopsis under various stresses were used for cross-validation. We also conducted RT-PCR analysis to experimentally verify selected predictions. According to our results, transcript levels of 27 out of 41 top-ranked genes (65.8%) were altered under various osmotic stress treatments. We believe that a similar approach could be widely adopted to infer gene function in various cellular processes in different species.
Affiliation(s)
- Yong Li
- Plant Bioengineering Laboratory, Northeast Agricultural University, Harbin, China
|
49
|
Knapp K, Chonka A, Chen YPP. POEM, A 3-dimensional exon taxonomy and patterns in untranslated exons. BMC Genomics 2008; 9:428. [PMID: 18803852 PMCID: PMC2561055 DOI: 10.1186/1471-2164-9-428] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/07/2008] [Accepted: 09/20/2008] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: The existence of exons and introns has been known for thirty years. Despite this knowledge, there is a lack of formal research into the categorization of exons. Exon taxonomies used by researchers tend to be selected ad hoc or based on an information-poor de facto standard. Exons have been shown to have specific properties and functions based on, among other things, their location and order. These factors should play a role in the naming to increase specificity about which exon type(s) are in question. RESULTS: POEM (Protein Oriented Exon Monikers) is a new taxonomy focused on protein-proximal exons. It integrates three dimensions of information (Global Position, Regional Position and Region), so its exon categories are based on known statistical exon features. POEM is applied to two congruent untranslated exon datasets, resulting in the following statistical properties. Using the POEM taxonomy, previous wide-ranging estimates of initial 5' untranslated region exons are resolved. According to our datasets, 29-36% of genes have wholly untranslated first exons. Sequences containing untranslated exons are shown to have consistently up to 6 times more 5' untranslated exons than 3' untranslated exons. Finally, three exon patterns are determined that account for 70% of untranslated exon genes. CONCLUSION: We describe a thorough three-dimensional exon taxonomy called POEM, which is biologically and statistically relevant. No previous taxonomy provides such fine-grained information and yet still includes all valid information dimensions. The use of POEM will improve the accuracy of genefinder comparisons and analysis by means of a common taxonomy. It will also facilitate unambiguous communication due to its fine granularity.
Affiliation(s)
- Keith Knapp
- Faculty of Science and Technology, Deakin University, Victoria, Australia.
|
50
|
Morello L, Breviario D. Plant spliceosomal introns: not only cut and paste. Curr Genomics 2008; 9:227-38. [PMID: 19452040 PMCID: PMC2682935 DOI: 10.2174/138920208784533629] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 04/07/2008] [Revised: 04/21/2008] [Accepted: 04/29/2008] [Indexed: 01/13/2023] Open
Abstract
Spliceosomal introns in higher eukaryotes are present in a high percentage of protein-coding genes and represent a high proportion of transcribed nuclear DNA. In the last fifteen years, a growing mass of data concerning the functional roles carried out by such intervening sequences has elevated them from a selfish burden carried by the nucleus to important active regulatory elements. Introns mediate complex gene regulation via alternative splicing; they may act in cis as expression enhancers through IME (intron-mediated enhancement of gene expression) and in trans as negative regulators through the generation of intronic microRNA. Furthermore, some introns also contain promoter sequences for alternative transcripts. Nevertheless, such regulatory roles do not require long conserved sequences, so introns are relatively free to evolve faster than exons: this feature makes them important tools for evolutionary studies and provides the basis for the development of DNA molecular markers for polymorphism detection. A survey of intron functions in the plant kingdom is presented.
Affiliation(s)
- D Breviario
- Istituto Biologia e Biotecnologia Agraria, Via Bassini 15, 20133 Milano, Italy
|