1
Deviatiiarov RM, Gams A, Kulakovskiy IV, Buyan A, Meshcheryakov G, Syunyaev R, Singh R, Shah P, Tatarinova TV, Gusev O, Efimov IR. An atlas of transcribed human cardiac promoters and enhancers reveals an important role of regulatory elements in heart failure. Nature Cardiovascular Research 2023; 2:58-75. [PMID: 39196209 DOI: 10.1038/s44161-022-00182-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/23/2021] [Accepted: 11/02/2022] [Indexed: 08/29/2024]
Abstract
A deeper knowledge of the dynamic transcriptional activity of promoters and enhancers is needed to improve mechanistic understanding of the pathogenesis of heart failure and heart diseases. In this study, we used cap analysis of gene expression (CAGE) to identify and quantify the activity of transcribed regulatory elements (TREs) in the four cardiac chambers of 21 healthy and ten failing adult human hearts. We identified 17,668 promoters and 14,920 enhancers associated with the expression of 14,519 genes. We showed how these regulatory elements are alternatively transcribed in different heart regions, in healthy versus failing hearts and in ischemic versus non-ischemic heart failure samples. Cardiac-disease-related single-nucleotide polymorphisms (SNPs) appeared to be enriched in TREs, potentially affecting the allele-specific transcription factor binding. To conclude, our open-source heart CAGE atlas will serve the cardiovascular community in improving the understanding of the role of the cardiac gene regulatory networks in cardiovascular disease and therapy.
Affiliation(s)
- Ruslan M Deviatiiarov
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
| | - Anna Gams
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
| | - Ivan V Kulakovskiy
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Andrey Buyan
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
| | | | - Roman Syunyaev
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Ramesh Singh
- Inova Heart and Vascular Institute, Falls Church, VA, USA
| | - Palak Shah
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
- Inova Heart and Vascular Institute, Falls Church, VA, USA
| | - Tatiana V Tatarinova
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.
- Department of Biology, University of La Verne, La Verne, CA, USA.
| | - Oleg Gusev
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia.
- Graduate School of Medicine, Juntendo University, Tokyo, Japan.
- RIKEN Center for Integrative Medical Sciences, RIKEN, Yokohama, Japan.
- Endocrinology Research Center, Moscow, Russia.
| | - Igor R Efimov
- Department of Biomedical Engineering, The George Washington University, Washington, DC, USA.
- Department of Biomedical Engineering, Northwestern University, Chicago, IL, USA.
- Department of Medicine, Northwestern University, Chicago, IL, USA.
2
Mishra A, Siwach P, Misra P, Dhiman S, Pandey AK, Srivastava P, Jayaram B. Intron exon boundary junctions in human genome have in-built unique structural and energetic signals. Nucleic Acids Res 2021; 49:2674-2683. [PMID: 33621338 PMCID: PMC7969029 DOI: 10.1093/nar/gkab098] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/06/2019] [Revised: 01/21/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
Precise identification of correct exon–intron boundaries is a prerequisite to analyze the location and structure of genes. The existing framework for genomic signals, delineating exons and introns in a genomic segment, seems insufficient, predominantly due to poor sequence consensus as well as limitations of training on available experimental data sets. We present here a novel concept for characterizing exon–intron boundaries in genomic segments on the basis of structural and energetic properties. We analyzed boundary junctions on both sides of all the exons (328,368) of protein coding genes from the human genome (GENCODE database) using 28 structural and three energy parameters. Study of sequence conservation at these sites shows very poor consensus. It is observed that DNA adopts a unique structural and energy state at the boundary junctions. Also, signals are somewhat different for housekeeping and tissue specific genes. Clustering of 31 parameters into four derived vectors gives some additional insights into the physical mechanisms involved in this biological process. Sites of structural and energy signals correlate well to the positions playing important roles in pre-mRNA splicing.
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India.,Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
| | - Priyanka Siwach
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India.,Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
| | - Pallavi Misra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
| | - Simran Dhiman
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
| | | | - Parul Srivastava
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
| | - B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India.,Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India.,Department of Chemistry, Indian Institute of Technology, Delhi, India
3
Zhao N, Guo H, Jia L, Guo B, Zheng D, Liu S, Zhang B. Genome assembly and annotation at the chromosomal level of first Pleuronectidae: Verasper variegatus provides a basis for phylogenetic study of Pleuronectiformes. Genomics 2021; 113:717-726. [PMID: 33535123 DOI: 10.1016/j.ygeno.2021.01.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2020] [Revised: 01/26/2021] [Accepted: 01/28/2021] [Indexed: 01/09/2023]
Abstract
A high-quality genome is of great significance for mining the biological information resources of a species. Up to now, the genomic information of several economically important flatfishes has been well explained. All of these fishes are of the left-eyed type, and no high-quality genome of a right-eyed species has been reported. In this study, we applied a combined strategy involving stLFR and Hi-C technologies to generate sequencing data for constructing the chromosomal genome of Verasper variegatus, which belongs to Pleuronectidae and is characterized by eyes on the right side. The genome size of V. variegatus is 556 Mb. More than 97.2% of BUSCO genes were detected, and N50 lengths of the contigs and scaffolds reached 79.8 Kb and 23.8 Mb, respectively, demonstrating the outstanding completeness and sequence continuity of the genome. A total of 22,199 protein-coding genes were predicted in the assembled genome, and more than 95% of those genes could be functionally annotated. Meanwhile, the genomic collinearity, gene family and phylogenetic analyses of similar species in Pleuronectiformes were also investigated and portrayed for metamorphosis and benthic adaptation. Sex-related genes were also mapped at the chromosome level. This study presents the first chromosome-level genome of a Pleuronectidae fish (V. variegatus). The chromosomal genome assembly constructed in this work will not only be valuable for conservation and aquaculture studies of V. variegatus but will also be of general interest in the phylogenetic and taxonomic studies of Pleuronectiformes.
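The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length L such that sequences of length L or longer together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are hypothetical illustrations, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together account for at least half of the total length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not from the V. variegatus assembly).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

With the toy lengths the total is 20, and the two longest sequences (6 and 5) already cover more than half of it, so the N50 is 5.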
Affiliation(s)
- Na Zhao
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Shanghai Ocean University), Ministry of Education, International Research Center for Marine Biosciences at Shanghai Ocean University, Shanghai Ocean University, Shanghai 201306, China
| | - Haobing Guo
- BGI-Qingdao, BGI-Shenzhen, Qingdao 266555, China
| | - Lei Jia
- Tianjin Fisheries Research Institute, Tianjin 300200, China
| | - Biao Guo
- Tianjin Fisheries Research Institute, Tianjin 300200, China
| | - Debin Zheng
- Tianjin Fisheries Research Institute, Tianjin 300200, China
| | - Shanshan Liu
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Shanghai Ocean University), Ministry of Education, International Research Center for Marine Biosciences at Shanghai Ocean University, Shanghai Ocean University, Shanghai 201306, China
| | - Bo Zhang
- Tianjin Fisheries Research Institute, Tianjin 300200, China.
4
Recognition of alternatively spliced cassette exons based on a hybrid model. Biochem Biophys Res Commun 2016; 471:368-72. [PMID: 26869516 DOI: 10.1016/j.bbrc.2016.02.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/28/2016] [Accepted: 02/06/2016] [Indexed: 12/22/2022]
Abstract
Alternative splicing (AS) is an important mechanism of gene regulation that contributes to protein diversity. It is of great significance to recognize different kinds of AS accurately so as to understand the mechanism of gene regulation. Many in silico methods have been applied to detecting AS with vast features, but the result is far from satisfactory. In this paper, we used the features proven to be useful in recognizing AS in previous literature and proposed a hybrid method combining Gene Expression Programming (GEP) and Random Forests (RF) to classify the constitutive exons and cassette exons which is the most common AS phenomenon. GEP will firstly make prediction to the samples of strong signal, and the other samples of weak signal will be distinguished with a more complex classifier based on RF. The experiment result indicates that this method can highly improve the recognition level in this issue.
5
Horvath A, Faucz F, Finkielstain GP, Nikita ME, Rothenbuhler A, Almeida M, Mericq V, Stratakis CA. Haplotype analysis of the promoter region of phosphodiesterase type 8B (PDE8B) in correlation with inactivating PDE8B mutation and the serum thyroid-stimulating hormone levels. Thyroid 2010; 20:363-7. [PMID: 20373981 PMCID: PMC2867554 DOI: 10.1089/thy.2009.0260] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
BACKGROUND Human phosphodiesterase (PDE) type 8B (PDE8B) is located at 5q14.1 and is known as the PDE with the highest affinity to cAMP. We recently described a family with bilateral micronodular adrenocortical disease that was apparently caused by an inactivating PDE8B mutation (H305P). As a result of a genome-wide study, a strong association between six polymorphic variants in the PDE8B promoter and serum levels of the thyroid-stimulating hormone (TSH) has been recently reported. Despite an extended analysis of the regions surrounding 5q14.1, no other potential genetic variants that could be responsible for the associated TSH levels were found. METHODS In this study, we genotyped by polymerase chain reaction the described six polymorphic variants in the PDE8B promoter in the family with micronodular adrenocortical disease and inactivating PDE8B mutation and analyzed their correlation with individual TSH values in the family members. RESULTS We observed complete segregation between the reported association and individual TSH values in the family we studied. Haplotype analysis showed that the haplotype associated with the high TSH levels is different from the one that segregated with H305P, suggesting that the mutation most probably has arisen on an allele independent of the high TSH-associated allele. CONCLUSIONS The proposed mechanism by which PDE8B may influence TSH levels is through control of cAMP signaling. Our analysis revealed separate segregation of an inactivating PDE8B allele from the high-TSH-allele and showed low TSH levels in persons who carry an inactivating PDE8B allele. These data suggest that, indeed, PDE8B may be involved in regulation of TSH levels.
Affiliation(s)
- Anelia Horvath
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
| | - Fabio Faucz
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Laboratory of Molecular Genetics, Center for Healthy and Biological Science, Pontificia Universidade Catolica do Parana, Curitiba, Brazil
| | - Gabriela P. Finkielstain
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
| | - Maria Eleni Nikita
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
| | - Anya Rothenbuhler
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
| | - Madson Almeida
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
| | - Veronica Mericq
- Faculty of Medicine, Institute of Maternal and Child Research, University of Chile, Santiago, Chile
| | - Constantine A. Stratakis
- Program on Developmental Endocrinology and Genetics, Section of Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
6
Recognition of atypical 5' splice sites by shifted base-pairing to U1 snRNA. Nat Struct Mol Biol 2009; 16:176-82. [PMID: 19169258 PMCID: PMC2719486 DOI: 10.1038/nsmb.1546] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 09/11/2008] [Accepted: 12/19/2008] [Indexed: 11/11/2022]
Abstract
Accurate pre-mRNA splicing is critical for gene expression. The 5' splice site (5' ss) — the highly diverse element at the 5' end of introns — is initially recognized via base-pairing to the 5' end of U1 small nuclear RNA (snRNA). However, many natural 5' ss have a very poor match to the consensus sequence, and are predicted to be very weak. Using genetic suppression experiments in human cells, we demonstrate that some atypical 5' ss are actually efficiently recognized by U1, in an alternative base-pairing register that is shifted by one nucleotide. These atypical 5' ss are phylogenetically widespread, and many of them are conserved. Moreover, shifted base-pairing provides an explanation for the effect of a 5' ss mutation associated with pontocerebellar hypoplasia. The unexpected flexibility in 5' ss/U1 base-pairing challenges an established paradigm, and has broad implications for splice-site prediction algorithms and gene-annotation efforts in genome projects.
7
Baten AKMA, Halgamuge SK, Chang BCH. Fast splice site detection using information content and feature reduction. BMC Bioinformatics 2008; 9 Suppl 12:S8. [PMID: 19091031 PMCID: PMC2638148 DOI: 10.1186/1471-2105-9-s12-s8] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate identification of splice sites in DNA sequences plays a key role in the prediction of gene structure in eukaryotes. Already many computational methods have been proposed for the detection of splice sites and some of them showed high prediction accuracy. However, most of these methods are limited in terms of their long computation time when applied to whole genome sequence data. RESULTS In this paper we propose a hybrid algorithm which combines several effective and informative input features with the state of the art support vector machine (SVM). To obtain the input features we employ information content method based on Shannon's information theory, Shapiro's score scheme, and Markovian probabilities. We also use a feature elimination scheme to reduce the less informative features from the input data. CONCLUSION In this study we propose a new feature based splice site detection method that shows improved acceptor and donor splice site detection in DNA sequences when the performance is compared with various state of the art and well known methods.
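The information content feature mentioned in this abstract builds on Shannon entropy. The sketch below shows one common per-position formulation for aligned DNA sites, assuming a uniform background and no small-sample correction; the abstract does not give the authors' exact formula, so the function name `information_content` and the toy sites are hypothetical.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-column information content, in bits, of equal-length DNA sites.

    Uses IC = 2 - H, where H is the Shannon entropy of the column and
    2 bits is the maximum entropy of a four-letter alphabet. This assumes
    a uniform background and applies no small-sample correction.
    """
    if not sites:
        raise ValueError("no sequences given")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sequences must have the same length")
    result = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        result.append(2.0 - entropy)
    return result


# Toy aligned donor-site fragments (hypothetical, for illustration only).
toy_sites = ["GTAA", "GTGA", "GTAC", "GTGG"]
print(information_content(toy_sites))
```

Fully conserved columns, such as the invariant GT here, reach the 2-bit maximum, while variable columns score lower. Scores of this kind can serve as numeric input features for a classifier.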
Affiliation(s)
- AKMA Baten
- Biomechanical Engineering Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, Victoria 3010, Australia
| | - SK Halgamuge
- Biomechanical Engineering Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, Victoria 3010, Australia
| | - BCH Chang
- Institute of Plant and Microbial Biology, Academia Sinica, Taiwan
8
|
|
9
Dreszer TR, Wall GD, Haussler D, Pollard KS. Biased clustered substitutions in the human genome: the footprints of male-driven biased gene conversion. Genome Res 2007; 17:1420-30. [PMID: 17785536 PMCID: PMC1987345 DOI: 10.1101/gr.6395807] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022]
Abstract
We examined fixed substitutions in the human lineage since divergence from the common ancestor with the chimpanzee, and determined what fraction are AT to GC (weak-to-strong). Substitutions that are densely clustered on the chromosomes show a remarkable excess of weak-to-strong "biased" substitutions. These unexpected biased clustered substitutions (UBCS) are common near the telomeres of all autosomes but not the sex chromosomes. Regions of extreme bias are enriched for genes. Human and chimp orthologous regions show a striking similarity in the shape and magnitude of their respective UBCS maps, suggesting a relatively stable force leads to clustered bias. The strong and stable signal near telomeres may have participated in the evolution of isochores. One exception to the UBCS pattern found in all autosomes is chromosome 2, which shows a UBCS peak midchromosome, mapping to the fusion site of two ancestral chromosomes. This provides evidence that the fusion occurred as recently as 740,000 years ago and no more than approximately 3 million years ago. No biased clustering was found in SNPs, suggesting that clusters of biased substitutions are selected from mutations. UBCS is strongly correlated with male (and not female) recombination rates, which explains the lack of UBCS signal on chromosome X. These observations support the hypothesis that biased gene conversion (BGC), specifically in the male germline, played a significant role in the evolution of the human genome.
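The central quantity in this abstract is the fraction of substitutions that change a weak base (A or T) into a strong one (G or C). A minimal sketch of that tally follows, assuming substitutions arrive as (ancestral, derived) base pairs; the function name and the toy input are hypothetical, and the study's own counting and clustering procedure is not described in the abstract.

```python
WEAK = {"A", "T"}    # bases forming two hydrogen bonds
STRONG = {"G", "C"}  # bases forming three hydrogen bonds


def weak_to_strong_fraction(substitutions):
    """Fraction of substitutions that go from a weak to a strong base.

    `substitutions` is an iterable of (ancestral, derived) base pairs.
    Pairs whose two bases are identical are not substitutions and are skipped.
    """
    weak_to_strong = 0
    total = 0
    for ancestral, derived in substitutions:
        ancestral, derived = ancestral.upper(), derived.upper()
        if ancestral == derived:
            continue
        total += 1
        if ancestral in WEAK and derived in STRONG:
            weak_to_strong += 1
    if total == 0:
        raise ValueError("no substitutions to classify")
    return weak_to_strong / total


# Toy substitutions (hypothetical): two weak-to-strong, one strong-to-weak,
# and one strong-to-strong change.
toy = [("A", "G"), ("T", "C"), ("G", "A"), ("C", "G")]
print(weak_to_strong_fraction(toy))
```

Computing this fraction separately for clustered and isolated substitutions would be one way to look for the excess described above.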
MESH Headings
- Animals
- Chromosomes, Human, Pair 2/genetics
- Chromosomes, Human, X/genetics
- Chromosomes, Human, Y/genetics
- Evolution, Molecular
- Female
- Gene Conversion
- Gene Fusion
- Genome, Human
- Humans
- Male
- Models, Genetic
- Pan troglodytes/genetics
- Polymorphism, Single Nucleotide
- Recombination, Genetic
- Sex Characteristics
- Species Specificity
- Telomere/genetics
- Time Factors
Affiliation(s)
- Timothy R. Dreszer
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
| | - Gregory D. Wall
- Department of Statistics, University of California, Davis, California 95616, USA
| | - David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658.
| | - Katherine S. Pollard
- Department of Statistics, University of California, Davis, California 95616, USA
- UC Davis Genome Center, University of California, Davis, California 95616, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658.
10
Buratti E, Chivers M, Královičová J, Romano M, Baralle M, Krainer AR, Vořechovský I. Aberrant 5' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res 2007; 35:4250-63. [PMID: 17576681 PMCID: PMC1934990 DOI: 10.1093/nar/gkm402] [Citation(s) in RCA: 151] [Impact Index Per Article: 8.9] [Indexed: 01/08/2023] Open
Abstract
Despite a growing number of splicing mutations found in hereditary diseases, utilization of aberrant splice sites and their effects on gene expression remain challenging to predict. We compiled sequences of 346 aberrant 5′splice sites (5′ss) that were activated by mutations in 166 human disease genes. Mutations within the 5′ss consensus accounted for 254 cryptic 5′ss and mutations elsewhere activated 92 de novo 5′ss. Point mutations leading to cryptic 5′ss activation were most common in the first intron nucleotide, followed by the fifth nucleotide. Substitutions at position +5 were exclusively G>A transitions, which was largely attributable to high mutability rates of C/G>T/A. However, the frequency of point mutations at position +5 was significantly higher than that observed in the Human Gene Mutation Database, suggesting that alterations of this position are particularly prone to aberrant splicing, possibly due to a requirement for sequential interactions with U1 and U6 snRNAs. Cryptic 5′ss were best predicted by computational algorithms that accommodate nucleotide dependencies and not by weight-matrix models. Discrimination of intronic 5′ss from their authentic counterparts was less effective than for exonic sites, as the former were intrinsically stronger than the latter. Computational prediction of exonic de novo 5′ss was poor, suggesting that their activation critically depends on exonic splicing enhancers or silencers. The authentic counterparts of aberrant 5′ss were significantly weaker than the average human 5′ss. The development of an online database of aberrant 5′ss will be useful for studying basic mechanisms of splice-site selection, identifying splicing mutations and optimizing splice-site prediction algorithms.
Affiliation(s)
- Emanuele Buratti
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Martin Chivers
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Jana Královičová
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Maurizio Romano
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Marco Baralle
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Adrian R. Krainer
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Igor Vořechovský
- International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, University of Southampton School of Medicine, Division of Human Genetics, Southampton SO16 6YD, UK and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- *To whom correspondence should be addressed. +44 2380 796425; +44 2380 794264
11
Wimmer K, Roca X, Beiglböck H, Callens T, Etzler J, Rao AR, Krainer AR, Fonatsch C, Messiaen L. Extensive in silico analysis of NF1 splicing defects uncovers determinants for splicing outcome upon 5' splice-site disruption. Hum Mutat 2007; 28:599-612. [PMID: 17311297 DOI: 10.1002/humu.20493] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.8] [Indexed: 11/11/2022]
Abstract
We describe 94 pathogenic NF1 gene alterations in a cohort of 97 Austrian neurofibromatosis type 1 patients meeting the NIH criteria. All mutations were fully characterized at the genomic and mRNA levels. Over half of the patients carried novel mutations, and only a quarter carried recurrent minor-lesion mutations at 16 mutational warm spots. The remaining patients carried NF1 microdeletions (7%) and rare recurring mutations. Thirty-six of the mutations (38%) altered pre-mRNA splicing, and fall into five groups: exon skipping resulting from mutations at authentic splice sites (type I), cryptic exon inclusion caused by deep intronic mutations (type II), creation of de novo splice sites causing loss of exonic sequences (type III), activation of cryptic splice sites upon authentic splice-site disruption (type IV), and exonic sequence alterations causing exon skipping (type V). Extensive in silico analyses of 37 NF1 exons and surrounding intronic sequences suggested that the availability of a cryptic splice site combined with a strong natural upstream 3' splice site (3'ss) is the main determinant of cryptic splice-site activation upon 5' splice-site disruption. Furthermore, the exonic sequences downstream of exonic cryptic 5' splice sites (5'ss) resemble intronic more than exonic sequences with respect to exonic splicing enhancer and silencer density, helping to distinguish between exonic cryptic and pseudo 5'ss. This study provides valuable predictors for the splicing pathway used upon 5'ss mutation, and underscores the importance of using RNA-based techniques, together with methods to identify microdeletions and intragenic copy-number changes, for effective and reliable NF1 mutation detection.
Affiliation(s)
- K Wimmer
- Department of Medical Genetics, Medical University of Vienna, Vienna, Austria.
12
Pertea M, Mount SM, Salzberg SL. A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. BMC Bioinformatics 2007; 8:159. [PMID: 17517127 PMCID: PMC1892810 DOI: 10.1186/1471-2105-8-159] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Received: 10/09/2006] [Accepted: 05/21/2007] [Indexed: 02/05/2023] Open
Abstract
Background Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However exonic splicing enhancers have been shown to enhance the utilization of nearby splice sites. Results We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer hexamers are identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integration of our regulatory motifs into two different splice site recognition programs significantly improved the ability of the software to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software. Conclusion Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.
Affiliation(s)
- Mihaela Pertea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
13
Leparc GG, Mitra RD. Non-EST-based prediction of novel alternatively spliced cassette exons with cell signaling function in Caenorhabditis elegans and human. Nucleic Acids Res 2007; 35:3192-202. [PMID: 17452356 PMCID: PMC1904267 DOI: 10.1093/nar/gkm187] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022] Open
Abstract
To better understand the complex role that alternative splicing plays in intracellular signaling, it is important to catalog the numerous splice variants involved in signal transduction. Therefore, we developed PASE (Prediction of Alternative Signaling Exons), a computational tool to identify novel alternative cassette exons that code for kinase phosphorylation or signaling protein-binding sites. We first applied PASE to the Caenorhabditis elegans genome. In this organism, our algorithm had an overall specificity of ≥76.4%, including 33 novel cassette exons that we experimentally verified. We then used PASE to analyze the human genome and made 804 predictions, of which 308 were found as alternative exons in the transcript database. We experimentally tested 384 of the remaining unobserved predictions and discovered 26 novel human exons for a total specificity of ≥41.5% in human. By using a test set of known alternatively spliced signaling exons, we determined that the sensitivity of PASE is ∼70%. GO term analysis revealed that our exon predictions were found in the introns of known signal transduction genes more often than expected by chance, indicating PASE enriches for splice variants that function in signaling pathways. Overall, PASE was able to uncover 59 novel alternative cassette exons in C. elegans and humans through a genome-wide ab initio prediction method that enriches for exons involved in signaling.
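The human specificity figure in this abstract can be reproduced from the counts it reports. The short sketch below redoes that arithmetic; the variable and function names are hypothetical. Note that "specificity" is used here in the sense of the fraction of predictions that were confirmed, and that the value is a lower bound because some predictions were never tested.

```python
def confirmed_fraction(total_predictions, in_database, newly_validated):
    """Lower bound on the fraction of predictions confirmed as real exons."""
    return (in_database + newly_validated) / total_predictions


# Counts reported in the abstract for the human genome.
human_predictions = 804   # total PASE predictions
human_in_database = 308   # already present as alternative exons in the transcript database
human_tested = 384        # remaining predictions that were tested experimentally
human_validated = 26      # novel exons confirmed among those tested

lower_bound = confirmed_fraction(human_predictions, human_in_database, human_validated)
untested = human_predictions - human_in_database - human_tested

print(f"confirmed fraction >= {lower_bound:.1%}")
print(f"predictions never tested: {untested}")

# Novel cassette exons across both species: 33 in C. elegans plus 26 in human.
total_novel = 33 + human_validated
print(f"novel exons in total: {total_novel}")
```

The 112 untested predictions are why the abstract states the figure with a "≥" sign: any of them that turned out to be real would raise the fraction.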
Affiliation(s)
- Robi David Mitra
- *To whom correspondence should be addressed. Tel: +1-314-362-2751; Fax: +1-314-362-2156;
|
14
|
Vorechovský I. Aberrant 3' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res 2006; 34:4630-41. [PMID: 16963498 PMCID: PMC1636351 DOI: 10.1093/nar/gkl535] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022]
Abstract
The frequency distribution of mutation-induced aberrant 3' splice sites (3'ss) in exons and introns is more complex than for 5' splice sites, largely owing to sequence constraints upstream of intron/exon boundaries. As a result, prediction of their localization remains a challenging task. Here, nucleotide sequences of previously reported 218 aberrant 3'ss activated by disease-causing mutations in 131 human genes were compared with their authentic counterparts using currently available splice site prediction tools. Each tested algorithm distinguished authentic 3'ss from cryptic sites more effectively than from de novo sites. The best discrimination between aberrant and authentic 3'ss was achieved by the maximum entropy model. Almost one half of aberrant 3'ss was activated by AG-creating mutations and approximately 95% of the newly created AGs were selected in vivo. The overall nucleotide structure upstream of aberrant 3'ss was characterized by higher purine content than for authentic sites, particularly in position -3, that may be compensated by more stringent requirements for positive and negative nucleotide signatures centred around position -11. A newly developed online database of aberrant 3'ss will facilitate identification of splicing mutations in a gene or phenotype of interest and future optimization of splice site prediction tools.
Affiliation(s)
- Igor Vorechovský
- University of Southampton School of Medicine, Division of Human Genetics, Mailpoint 808, Southampton SO16 6YD, UK
|
15
|
Gundersen-Rindal DE, Pedroni MJ. Characterization and transcriptional analysis of protein tyrosine phosphatase genes and an ankyrin repeat gene of the parasitoid Glyptapanteles indiensis polydnavirus in the parasitized host. J Gen Virol 2006; 87:311-322. [PMID: 16432017 DOI: 10.1099/vir.0.81326-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Glyptapanteles indiensis (Braconidae, Hymenoptera) is an endoparasitoid of Lymantria dispar, the gypsy moth. Expression of G. indiensis polydnavirus (GiBV)-encoded genes within the pest host results in inhibition of immune response and development and alteration of physiology, enabling successful development of the parasitoid. Here, GiBV genome segment F (segF), an 18·6 kb segment shown to encode nine protein tyrosine phosphatase (PTP) genes and a single ankyrin repeat gene (ank), is analysed. PTPs have presumed function as regulators of signal transduction, while ankyrin repeat genes are hypothesized to function in inhibition of NF-κB signalling in the parasitized host. In this study, transcription of each gene was mapped by 5′- and 3′-RACE (rapid amplification of cDNA ends) and temporal and tissue-specific expression was examined in the parasitized host. For polydnavirus gene prediction in the parasitized host, no available gene prediction parameters were entirely precise. The mRNAs for each GiBV segF gene initiated between 30 and 112 bp upstream of the translation initiation codon. All were encoded in single open reading frames (ORFs), with the exception of PTP9, which was transcribed as a bicistronic message with the adjacent ank gene. RT-PCR indicated that all GiBV segF PTPs were expressed early in parasitization and, for most, expression was sustained over the course of at least 7 days after parasitization, suggesting importance in both early and sustained virus-induced immunosuppression and alteration of physiology. Tissue-specific patterns of PTP expression of GiBV segF genes were variable, suggesting differing roles in facilitating parasitism.
Affiliation(s)
- D E Gundersen-Rindal
- US Department of Agriculture, Agricultural Research Service, Insect Biocontrol Laboratory, Bldg 011A, Room 214, BARC West, Beltsville, MD 20705, USA
- M J Pedroni
- US Department of Agriculture, Agricultural Research Service, Insect Biocontrol Laboratory, Bldg 011A, Room 214, BARC West, Beltsville, MD 20705, USA
|
16
|
Roca X, Sachidanandam R, Krainer AR. Determinants of the inherent strength of human 5' splice sites. RNA (NEW YORK, N.Y.) 2005; 11:683-98. [PMID: 15840817 PMCID: PMC1370755 DOI: 10.1261/rna.2040605] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.5] [Received: 01/07/2005] [Accepted: 02/09/2005] [Indexed: 05/24/2023]
Abstract
We previously showed that the authentic 5' splice site (5'ss) of the first exon in the human beta-globin gene is intrinsically stronger than a cryptic 5'ss located 16 nucleotides upstream. Here we examined by mutational analysis the contribution of individual 5'ss nucleotides to discrimination between these two 5'ss. Based on the in vitro splicing efficiencies of a panel of 26 wild-type and mutant substrates in two separate 5'ss competition assays, we established a hierarchy of 5'ss and grouped them into three functional subclasses: strong, intermediate, and weak. Competition between two 5'ss from different subclasses always resulted in selection of the 5'ss that belongs to the stronger subclass. Moreover, each subclass has different characteristic features. Strong and intermediate 5'ss can be distinguished by their predicted free energy of base-pairing to the U1 snRNA 5' terminus (ΔG). Whereas the extent of splicing via the strong 5'ss correlates well with the ΔG, this is not the case for competition between intermediate 5'ss. Weak 5'ss were used only when the competing authentic 5'ss was inactivated by mutation. These results indicate that extensive complementarity to U1 snRNA exerts a dominant effect for 5'ss selection, but in the case of competing 5'ss with similarly modest complementarity to U1, the role of other 5'ss features is more prominent. This study reveals the importance of additional submotifs present in certain 5'ss sequences, whose characterization will be critical for understanding 5'ss selection in human genes.
Affiliation(s)
- Xavier Roca
- Cold Spring Harbor Laboratory, PO Box 100, Cold Spring Harbor, NY 11724, USA
|
17
|
Oshiumi H, Tsujita T, Shida K, Matsumoto M, Ikeo K, Seya T. Prediction of the prototype of the human Toll-like receptor gene family from the pufferfish, Fugu rubripes, genome. Immunogenetics 2003; 54:791-800. [PMID: 12618912 DOI: 10.1007/s00251-002-0519-8] [Citation(s) in RCA: 253] [Impact Index Per Article: 12.0] [Received: 07/10/2002] [Revised: 10/23/2002] [Indexed: 11/29/2022]
Abstract
The insect Toll family of proteins and their mammalian counterparts seemingly shared one common ancestor and evolved independently. Here we demonstrated that the prototype of the mammalian-type (M-type) Toll family is shared by the fish and humans. According to the draft of the pufferfish Fugu genome project, the signature Toll-IL-1 receptor homology domain (TIR domain) has been conserved during evolution. FuguTLR2, 3, 5, 7, 8 and 9 members correspond structurally to respective mammalian TLRs. One Fugu TLR showed equally high amino acid identity to human TLR1, 6 and 10, and we named it FuguTLR1. Fugu rubripes has genes for TLR21 and 22, which are unique to fish. One possible interpretation of these findings is that TLR1, 2, 3, 4, 5, 7, 8, 9, 21 and 22 existed in the ancestral genome common to fish and mammals, and that TLR4 was lost in the fish lineage, while TLR21 and 22 were lost in the mammalian lineage. Strikingly, a solitary ascidian, Halocynthia roretzi, has only a few Toll-like proteins, which, like Caenorhabditis elegans Toll, represent primitive ones before the expansion of the Toll family. Therefore, the expansion of TLR genes should have occurred earlier than fish, but not C. intestinalis, separated evolutionarily from mammals. These results infer that the appearance of the M-type innate system was completed before or concomitant with the appearance of acquired immunity. We interpret the present data to mean that the differences of TLRs identified in this study between fishes and humans may be rather peripheral, partially due to selection pressure exerted by pathogens in distinct environments.
Affiliation(s)
- Hiroyuki Oshiumi
- Department of Immunology, Osaka Medical Center for Cancer and Cardiovascular Diseases, Higashinari-ku, Japan
|
18
|
Golden TA, Schauer SE, Lang JD, Pien S, Mushegian AR, Grossniklaus U, Meinke DW, Ray A. SHORT INTEGUMENTS1/SUSPENSOR1/CARPEL FACTORY, a Dicer homolog, is a maternal effect gene required for embryo development in Arabidopsis. PLANT PHYSIOLOGY 2002; 130:808-22. [PMID: 12376646 PMCID: PMC166608 DOI: 10.1104/pp.003491] [Citation(s) in RCA: 103] [Impact Index Per Article: 4.7] [Received: 01/31/2002] [Revised: 04/03/2002] [Accepted: 06/24/2002] [Indexed: 05/18/2023]
Abstract
The importance of maternal cells in controlling early embryogenesis is well understood in animal development, yet in plants the precise role of maternal cells in embryogenesis is unclear. We demonstrated previously that maternal activity of the SIN1 (SHORT INTEGUMENTS1) gene of Arabidopsis is essential for embryo pattern formation and viability, and that its postembryonic activity is required for several processes in reproductive development, including flowering time control and ovule morphogenesis. Here, we report the cloning of SIN1, and demonstrate its identity to the CAF (CARPEL FACTORY) gene important for normal flower morphogenesis and to the SUS1 (SUSPENSOR1) gene essential for embryogenesis. SIN1/SUS1/CAF has sequence similarity to the Drosophila melanogaster gene Dicer, which encodes a multidomain ribonuclease specific for double-stranded RNA, first identified by its role in RNA silencing. The Dicer protein is essential for temporal control of development in animals, through the processing of small RNA hairpins that in turn inhibit the translation of target mRNAs. Structural modeling of the wild-type and sin1 mutant proteins indicates that the RNA helicase domain of SIN1/SUS1/CAF is important for function. The mRNA was detected in floral meristems, ovules, and early embryos, consistent with the mutant phenotypes. A 3.3-kb region 5' of the SIN1/SUS1/CAF gene shows asymmetric parent-of-origin activity in the embryo: It confers transcriptional activation of a reporter gene in early embryos only when transmitted through the maternal gamete. These results suggest that maternal SIN1/SUS1/CAF functions early in Arabidopsis development, presumably through posttranscriptional regulation of specific mRNA molecules.
Affiliation(s)
- Teresa A Golden
- Department of Biology, University of Rochester, Rochester, NY 14627, USA
|
19
|
Lim LP, Burge CB. A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci U S A 2001; 98:11193-8. [PMID: 11572975 PMCID: PMC58706 DOI: 10.1073/pnas.201407298] [Citation(s) in RCA: 258] [Impact Index Per Article: 11.2] [Received: 06/26/2001] [Accepted: 08/02/2001] [Indexed: 11/18/2022]
Abstract
Splicing of short introns by the nuclear pre-mRNA splicing machinery is thought to proceed via an "intron definition" mechanism, in which the 5' and 3' splice sites (5'ss, 3'ss, respectively) are initially recognized and paired across the intron. Here, we describe a computational analysis of sequence features involved in recognition of short introns by using available transcript data from five eukaryotes with complete or nearly complete genomic sequences. The information content of five different transcript features was measured by using methods from information theory, and Monte Carlo simulations were used to determine the amount of information required for accurate recognition of short introns in each organism. We conclude: (i) that short introns in Drosophila melanogaster and Caenorhabditis elegans contain essentially all of the information for their recognition by the splicing machinery, and computer programs that simulate splicing specificity can predict the exact boundaries of approximately 95% of short introns in both organisms; (ii) that in yeast, the 5'ss, branch signal, and 3'ss can accurately identify intron locations but do not precisely determine the location of 3' cleavage in every intron; and (iii) that the 5'ss, branch signal, and 3'ss are not sufficient to accurately identify short introns in plant and human transcripts, but that specific subsets of candidate intronic enhancer motifs can be identified in both human and Arabidopsis that contribute dramatically to the accuracy of splicing simulators.
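The abstract says information content was measured with methods from information theory but gives no formula. As an illustration only, the sketch below computes one standard measure: per-position Shannon information of aligned sites relative to a uniform background. This is not necessarily the exact calculation used in the paper. The function name and the toy sequences are made up for the example, and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet="ACGT"):
    """Per-position information (bits) of equal-length aligned sequences.

    Each position scores log2(len(alphabet)) minus the Shannon entropy of
    the observed base frequencies, i.e. information relative to a uniform
    background. Characters outside the alphabet are ignored.
    """
    if not aligned_sites:
        raise ValueError("need at least one site")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")

    max_bits = math.log2(len(alphabet))
    bits = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        total = sum(counts[base] for base in alphabet)
        if total == 0:
            raise ValueError(f"no alphabet characters at position {i}")
        entropy = -sum(
            (counts[base] / total) * math.log2(counts[base] / total)
            for base in alphabet
            if counts[base] > 0
        )
        bits.append(max_bits - entropy)
    return bits


# Toy, made-up 5'ss-like sequences (not data from the paper).
toy_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAGGT"]
toy_bits = information_content(toy_sites)
print([round(b, 3) for b in toy_bits])
print("total bits:", round(sum(toy_bits), 3))
```

Invariant positions score the full 2 bits and positions with uniformly distributed bases score 0. Summing over positions gives a single figure for a feature, under the assumption that positions are independent.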
Affiliation(s)
- L P Lim
- Department of Biology and Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|