1
Beckel MS, Kaufman B, Yanovsky M, Chernomoretz A. Conserved and divergent signals in 5' splice site sequences across fungi, metazoa and plants. PLoS Comput Biol 2023; 19:e1011540. [PMID: 37831726 PMCID: PMC10599564 DOI: 10.1371/journal.pcbi.1011540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/25/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023] Open
Abstract
In eukaryotic organisms the ensemble of 5' splice site sequences reflects the balance between natural nucleotide variability and minimal molecular constraints necessary to ensure splicing fidelity. This compromise shapes the underlying statistical patterns in the composition of donor splice site sequences. The scope of this study was to mine conserved and divergent signals in the composition of 5' splice site sequences. Because 5' donor sequences are a major cue for proper recognition of splice sites, we reasoned that statistical regularities in their composition could reflect the biological functionality and evolutionary history associated with splicing mechanisms. Results: We considered a regularized maximum entropy modeling framework to mine for non-trivial two-site correlations in donor sequence datasets corresponding to 30 different eukaryotes. For each analyzed species, we identified minimal sets of two-site coupling patterns that were able to replicate, at a given regularization level, the observed one-site and two-site frequencies in donor sequences. By performing a systematic and comparative analysis of 5' splice sites we showed that lineage information could be traced from joint di-nucleotide probabilities. We were able to identify characteristic two-site coupling patterns for plants and animals, and propose that they may echo differences in splicing regulation previously reported between these groups.
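As a rough illustration of the quantities this abstract refers to, the observed one-site and two-site frequencies of a set of aligned donor sequences can be tabulated as sketched below. This is not the authors' code: the function names, the toy sequences, and the use of the connected correlation f_ij(a,b) - f_i(a)·f_j(b) as a simple indicator of two-site dependence are assumptions made for illustration, and the regularized maximum entropy fit itself is not shown.

```python
from collections import Counter
from itertools import combinations


def site_frequencies(seqs):
    """Return one-site and two-site frequency tables for aligned sequences.

    f1[(i, a)]       = fraction of sequences with nucleotide a at position i
    f2[(i, j, a, b)] = fraction with a at position i and b at position j (i < j)
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    n = len(seqs)
    one, two = Counter(), Counter()
    for s in seqs:
        for i, a in enumerate(s):
            one[(i, a)] += 1
        for i, j in combinations(range(length), 2):
            two[(i, j, s[i], s[j])] += 1
    f1 = {key: count / n for key, count in one.items()}
    f2 = {key: count / n for key, count in two.items()}
    return f1, f2


def connected_correlation(f1, f2, i, j, a, b):
    """f_ij(a,b) - f_i(a) * f_j(b); zero when the two sites look independent."""
    return f2.get((i, j, a, b), 0.0) - f1.get((i, a), 0.0) * f1.get((j, b), 0.0)


# Hypothetical toy data, not real donor sites.
toy_donors = ["AGGT", "AGGT", "CTGT", "CTGT"]
f1, f2 = site_frequencies(toy_donors)
coupled = connected_correlation(f1, f2, 0, 1, "A", "G")
invariant = connected_correlation(f1, f2, 2, 3, "G", "T")
```

In the toy data, positions 0 and 1 always vary together, so their connected correlation is non-zero, while the invariant positions 2 and 3 give zero. A maximum entropy model would go one step further and fit coupling parameters that reproduce these frequency tables.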
Affiliation(s)
- Maximiliano S. Beckel
- Fundación Instituto Leloir, Buenos Aires, Argentina
- Instituto de Investigaciones Bioquímicas de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Bruno Kaufman
- Fundación Instituto Leloir, Buenos Aires, Argentina
- Instituto de Investigaciones Bioquímicas de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Marcelo Yanovsky
- Fundación Instituto Leloir, Buenos Aires, Argentina
- Instituto de Investigaciones Bioquímicas de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Ariel Chernomoretz
- Fundación Instituto Leloir, Buenos Aires, Argentina
- Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Instituto de Física Interdisciplinaria y Aplicada (INFINA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
2
Sánchez IE, Galpern EA, Garibaldi MM, Ferreiro DU. Molecular Information Theory Meets Protein Folding. J Phys Chem B 2022; 126:8655-8668. [PMID: 36282961 DOI: 10.1021/acs.jpcb.2c04532] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/11/2023]
Abstract
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average information contained in the sequences of evolved proteins is very close to the average information needed to specify a fold ∼2.2 ± 0.3 bits/(site·operation). The effective alphabet size in evolved proteins equals the effective number of conformations of a residue in the compact unfolded state at around 5. We calculated an energy-to-information conversion efficiency upon folding of around 50%, lower than the theoretical limit of 70%, but much higher than human-built macroscopic machines. We propose a simple mapping between molecular information theory and energy landscape theory and explore the connections between sequence evolution, configurational entropy, and the energetics of protein folding.
Affiliation(s)
- Ignacio E Sánchez
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Ezequiel A Galpern
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Martín M Garibaldi
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Diego U Ferreiro
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
3
Nonsense-associated altered splicing of MAP3K1 in two siblings with 46,XY disorders of sex development. Sci Rep 2020; 10:17375. [PMID: 33060765 PMCID: PMC7567082 DOI: 10.1038/s41598-020-74405-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/10/2020] [Accepted: 09/29/2020] [Indexed: 01/31/2023] Open
Abstract
Although splicing errors due to single nucleotide variants represent a common cause of monogenic disorders, only a few variants have been shown to create new splice sites in exons. Here, we report an MAP3K1 splice variant identified in two siblings with 46,XY disorder of sex development. The patients carried a maternally derived c.2254C>T variant. The variant was initially recognized as a nonsense substitution leading to nonsense-mediated mRNA decay (p.Gln752Ter); however, RT-PCR for lymphoblastoid cell lines showed that this variant created a new splice donor site and caused a 39-amino-acid deletion (p.Gln752_Arg790del). All transcripts from the variant allele appeared to undergo altered splicing. The two patients exhibited undermasculinized genitalia with and without hypergonadotropism. Testosterone enanthate injections and dihydrotestosterone ointment applications yielded only a slight increase in their penile length. Dihydrotestosterone-induced APOD transactivation was less significant in patients’ genital skin fibroblasts compared with that in control samples. This study provides an example of nonsense-associated altered splicing, in which a highly potent exonic splice site was created. Furthermore, our data, in conjunction with the previous data indicating the association between MAP3K1 and androgen receptor signaling, imply that the combination of testicular dysgenesis and androgen insensitivity may be a unique phenotype of MAP3K1 abnormalities.
4
Transposon expression in the Drosophila brain is driven by neighboring genes and diversifies the neural transcriptome. Genome Res 2020; 30:1559-1569. [PMID: 32973040 PMCID: PMC7605248 DOI: 10.1101/gr.259200.119] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/12/2019] [Accepted: 09/22/2020] [Indexed: 12/15/2022]
Abstract
Somatic transposon expression in neural tissue is commonly considered as a measure of mobilization and has therefore been linked to neuropathology and organismal individuality. We combined genome sequencing data with single-cell mRNA sequencing of the same inbred fly strain to map transposon expression in the Drosophila midbrain and found that transposon expression patterns are highly stereotyped. Every detected transposon is resident in at least one cellular gene with a matching expression pattern. Bulk RNA sequencing from fly heads of the same strain revealed that coexpression is a physical link in the form of abundant chimeric transposon-gene mRNAs. We identified 264 genes where transposons introduce cryptic splice sites into the nascent transcript and thereby significantly expand the neural transcript repertoire. Some genes exclusively produce chimeric mRNAs with transposon sequence; on average, 11.6% of the mRNAs produced from a given gene are chimeric. Conversely, most transposon-containing transcripts are chimeric, which suggests that somatic expression of these transposons is largely driven by cellular genes. We propose that chimeric mRNAs produced by alternative splicing into polymorphic transposons, rather than transposon mobilization, may contribute to functional differences between individual cells and animals.
5
Collins RT, Coxam B, Fechner I, Unterweger I, Szymborska A, Meier K, Gerhardt H. Intron with transgenic marker (InTraM) facilitates high-throughput screening of endogenous gene reporter lines. Genesis 2020; 58:e23391. [PMID: 32783355 DOI: 10.1002/dvg.23391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2020] [Revised: 07/15/2020] [Accepted: 07/16/2020] [Indexed: 11/07/2022]
Abstract
The generation and maintenance of genome edited zebrafish lines is typically labor intensive due to the lack of an easy visual read-out for the modification. To facilitate this process, we have developed a novel method that relies on the inclusion of an artificial intron with a transgenic marker (InTraM) within the knock-in sequence of interest, which upon splicing produces a transcript with a precise and seamless modification. We have demonstrated this technology by replacing the stop codon of the zebrafish fli1a gene with a transcriptional activator KALTA4, using an InTraM that enables red fluorescent protein expression in the heart.
Affiliation(s)
- Russell T Collins
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Baptiste Coxam
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Ines Fechner
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Iris Unterweger
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Novo Nordisk Foundation Center for Stem Cell Biology, Copenhagen, Denmark
- Anna Szymborska
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Katja Meier
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Holger Gerhardt
- Integrative Vascular Biology Lab, Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
6
Dehghannasiri R, Szabo L, Salzman J. Ambiguous splice sites distinguish circRNA and linear splicing in the human genome. Bioinformatics 2020; 35:1263-1268. [PMID: 30192918 DOI: 10.1093/bioinformatics/bty785] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/26/2018] [Revised: 08/04/2018] [Accepted: 09/04/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation: Identification of splice sites is critical to gene annotation and to determine which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete annotations of introns and exons in genomes without external ontologies, i.e., ab initio. However, whether it is possible to reconstruct genomic positions where splicing occurs from full-length transcripts, even if sampled in the absence of noise, depends on the genome sequence composition. If it is not, there exist provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome.
Results: We provide a formal definition of splice site ambiguity due to the genomic sequence by introducing equivalent junction, which is the set of local genomic positions resulting in the same RNA sequence when joined through RNA splicing. We show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear and circRNA junctions, respectively. The observed fractions of equivalent junctions and the frequency of many individual motifs are statistically significant when compared against the null distribution computed via simulation or closed-form. The frequency of equivalent junctions establishes a fundamental limit on the possibility of ab initio reconstruction of RNA transcripts without appealing to the ontology of "GT-AG" boundaries defining introns. Said differently, completely ab initio is impossible in the vast majority of splice sites in annotated circRNAs and linear transcripts.
Availability and implementation: Two python scripts generating an equivalent junction sequence per junction are available at: https://github.com/salzmanlab/Equivalent-Junctions.
Supplementary information: Supplementary data are available at Bioinformatics online.
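The definition of an equivalent junction given in this abstract (the set of genomic positions that produce the same RNA sequence when joined) can be illustrated with a small sketch. This is not the published script from the repository above: the function names, the 0-based coordinate convention, the toy genome, and the restriction to linear (forward) splicing are assumptions made for illustration only.

```python
def splice(genome, donor, acceptor):
    """Join genome[:donor] to genome[acceptor:], removing genome[donor:acceptor]."""
    return genome[:donor] + genome[acceptor:]


def equivalent_junctions(genome, donor, acceptor):
    """List all (donor, acceptor) pairs giving the same spliced sequence.

    Coordinates are 0-based; `donor` is the first removed base and `acceptor`
    is the first retained base downstream. Shifting both boundaries by k keeps
    the spliced sequence unchanged exactly when the bases crossed at the donor
    side match the bases crossed at the acceptor side.
    """
    if not 0 <= donor < acceptor <= len(genome):
        raise ValueError("expected 0 <= donor < acceptor <= len(genome)")
    # Count how far both boundaries can slide to the right.
    right = 0
    while (acceptor + right < len(genome)
           and genome[donor + right] == genome[acceptor + right]):
        right += 1
    # Count how far both boundaries can slide to the left.
    left = 0
    while (donor - 1 - left >= 0
           and genome[donor - 1 - left] == genome[acceptor - 1 - left]):
        left += 1
    return [(donor + k, acceptor + k) for k in range(-left, right + 1)]


# Hypothetical toy sequence: exon "ACG", intron "GTAAGTCCAG", exon "GTTC".
toy_genome = "ACGGTAAGTCCAGGTTC"
pairs = equivalent_junctions(toy_genome, 3, 13)
# pairs == [(2, 12), (3, 13), (4, 14), (5, 15)]
spliced = {splice(toy_genome, d, a) for d, a in pairs}
# spliced == {"ACGGTTC"}
```

In the toy example four boundary pairs produce the identical transcript, and only one of them has an intron beginning with GT and ending with AG. That is the kind of ambiguity that, per the abstract, can only be resolved by appealing to the "GT-AG" convention.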
Affiliation(s)
- Linda Szabo
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Julia Salzman
- Department of Biochemistry, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
7
Schneider TD, Jejjala V. Restriction enzymes use a 24 dimensional coding space to recognize 6 base long DNA sequences. PLoS One 2019; 14:e0222419. [PMID: 31671158 PMCID: PMC6822723 DOI: 10.1371/journal.pone.0222419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/12/2019] [Accepted: 08/29/2019] [Indexed: 11/19/2022] Open
Abstract
Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high dimensional spaces. Better sphere packing leads to better communications systems. The densest known packing of hyperspheres occurs on the Leech lattice in 24 dimensions. We suggest that the single protein EcoRI molecule employs a Leech lattice in its operation. Optimizing density of sphere packing explains why 6 base restriction enzymes are so common.
Affiliation(s)
- Thomas D. Schneider
- National Institutes of Health, National Cancer Institute, Center for Cancer Research, RNA Biology Laboratory, Frederick, Maryland, United States of America
- Vishnu Jejjala
- Mandelstam Institute for Theoretical Physics, School of Physics, NITheP, and CoE-MaSS, University of the Witwatersrand, Johannesburg, South Africa
- David Rittenhouse Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
8
Xu Z, Lo WS, Beck DB, Schuch LA, Oláhová M, Kopajtich R, Chong YE, Alston CL, Seidl E, Zhai L, Lau CF, Timchak D, LeDuc CA, Borczuk AC, Teich AF, Juusola J, Sofeso C, Müller C, Pierre G, Hilliard T, Turnpenny PD, Wagner M, Kappler M, Brasch F, Bouffard JP, Nangle LA, Yang XL, Zhang M, Taylor RW, Prokisch H, Griese M, Chung WK, Schimmel P. Bi-allelic Mutations in Phe-tRNA Synthetase Associated with a Multi-system Pulmonary Disease Support Non-translational Function. Am J Hum Genet 2018; 103:100-114. [PMID: 29979980 PMCID: PMC6035289 DOI: 10.1016/j.ajhg.2018.06.006] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 04/11/2018] [Accepted: 06/12/2018] [Indexed: 11/16/2022] Open
Abstract
The tRNA synthetases catalyze the first step of protein synthesis and have increasingly been studied for their nuclear and extra-cellular ex-translational activities. Human genetic conditions such as Charcot-Marie-Tooth have been attributed to dominant gain-of-function mutations in some tRNA synthetases. Unlike dominantly inherited gain-of-function mutations, recessive loss-of-function mutations can potentially elucidate ex-translational activities. We present here five individuals from four families with a multi-system disease associated with bi-allelic mutations in FARSB that encodes the beta chain of the alpha2beta2 phenylalanine-tRNA synthetase (FARS). Collectively, the mutant alleles encompass a 5'-splice junction non-coding variant (SJV) and six missense variants, one of which is shared by unrelated individuals. The clinical condition is characterized by interstitial lung disease, cerebral aneurysms and brain calcifications, and cirrhosis. For the SJV, we confirmed exon skipping leading to a frameshift associated with noncatalytic activity. While the bi-allelic combination of the SJV with a p.Arg305Gln missense mutation in two individuals led to severe disease, cells from neither the asymptomatic heterozygous carriers nor the compound heterozygous affected individual had any defect in protein synthesis. These results support a disease mechanism independent of tRNA synthetase activities in protein translation and suggest that this FARS activity is essential for normal function in multiple organs.
Affiliation(s)
- Zhiwen Xu
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Pangu Biopharma, Edinburgh Tower, The Landmark, 15 Queen's Road Central, Hong Kong, China; aTyr Pharma, 3545 John Hopkins Court, Suite 250, San Diego, CA 92121, USA
- Wing-Sze Lo
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Pangu Biopharma, Edinburgh Tower, The Landmark, 15 Queen's Road Central, Hong Kong, China
- David B Beck
- Department of Medicine, Columbia University, New York, NY 10032, USA; National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Luise A Schuch
- Dr. von Hauner Children's Hospital, Division of Pediatric Pneumology, University Hospital Munich, German Center for Lung Research (DZL), Lindwurmstr. 4, 80337 München, Germany
- Monika Oláhová
- Wellcome Centre for Mitochondrial Research, Institute of Neuroscience, The Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Robert Kopajtich
- Institute of Human Genetics, Technical University Munich, 81675 Munich, Germany; Institute of Human Genetics, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Yeeting E Chong
- aTyr Pharma, 3545 John Hopkins Court, Suite 250, San Diego, CA 92121, USA
- Charlotte L Alston
- Wellcome Centre for Mitochondrial Research, Institute of Neuroscience, The Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Elias Seidl
- Dr. von Hauner Children's Hospital, Division of Pediatric Pneumology, University Hospital Munich, German Center for Lung Research (DZL), Lindwurmstr. 4, 80337 München, Germany
- Liting Zhai
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Ching-Fun Lau
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Pangu Biopharma, Edinburgh Tower, The Landmark, 15 Queen's Road Central, Hong Kong, China
- Donna Timchak
- Department of Pediatrics, Columbia University, New York, NY 10032, USA; Goryeb Children's Hospital, Atlantic Health System, Morristown, NJ 07960, USA
- Charles A LeDuc
- Department of Pediatrics, Columbia University, New York, NY 10032, USA
- Alain C Borczuk
- Department of Pathology, Weill Cornell Medicine, New York, NY 10065, USA
- Andrew F Teich
- Department of Pathology and Cell Biology, Columbia University, New York, NY 10032, USA; Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University, New York, NY 10032, USA
- Christina Sofeso
- Center for Human Genetics and Laboratory Diagnostics (AHC) Dr. Klein, Dr. Rost and Colleagues, Lochhamer Str. 29, 82152 Martinsried, Germany
- Christoph Müller
- Department of Pediatrics and Adolescent Medicine, University Medical Center, Medical Faculty, University of Freiburg, 79085 Freiburg, Germany
- Germaine Pierre
- Bristol Royal Hospital for Children, University Hospitals Bristol NHS Foundation Trust, Bristol BS2 8BJ, UK
- Tom Hilliard
- Bristol Royal Hospital for Children, University Hospitals Bristol NHS Foundation Trust, Bristol BS2 8BJ, UK
- Matias Wagner
- Institute of Human Genetics, Technical University Munich, 81675 Munich, Germany; Institute of Human Genetics, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany; Institut für Neurogenomik, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Matthias Kappler
- Dr. von Hauner Children's Hospital, Division of Pediatric Pneumology, University Hospital Munich, German Center for Lung Research (DZL), Lindwurmstr. 4, 80337 München, Germany
- Frank Brasch
- Klinikum Bielefeld Mitte, Institute for Pathology, Teutoburger Straße 50, 33604 Bielefeld, Germany
- John Paul Bouffard
- Department Pathology, Morristown Memorial Hospital, Morristown, NJ 07960, USA
- Leslie A Nangle
- aTyr Pharma, 3545 John Hopkins Court, Suite 250, San Diego, CA 92121, USA
- Xiang-Lei Yang
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; The Scripps Laboratories for tRNA Synthetase Research, The Scripps Research Institute, 10650 North Torrey Pines Road, La Jolla, CA 92037, USA; Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
- Mingjie Zhang
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Robert W Taylor
- Wellcome Centre for Mitochondrial Research, Institute of Neuroscience, The Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Holger Prokisch
- Institute of Human Genetics, Technical University Munich, 81675 Munich, Germany; Institute of Human Genetics, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Matthias Griese
- Dr. von Hauner Children's Hospital, Division of Pediatric Pneumology, University Hospital Munich, German Center for Lung Research (DZL), Lindwurmstr. 4, 80337 München, Germany
- Wendy K Chung
- Department of Medicine, Columbia University, New York, NY 10032, USA; Department of Pediatrics, Columbia University, New York, NY 10032, USA.
- Paul Schimmel
- IAS HKUST - Scripps R&D Laboratory, Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; The Scripps Laboratories for tRNA Synthetase Research, The Scripps Research Institute, 10650 North Torrey Pines Road, La Jolla, CA 92037, USA; The Scripps Laboratories for tRNA Synthetase Research, Scripps Florida, 130 Scripps Way, Jupiter, FL 33458, USA.
9
Zhang S, Samocha KE, Rivas MA, Karczewski KJ, Daly E, Schmandt B, Neale BM, MacArthur DG, Daly MJ. Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides. Genome Res 2018; 28:968-974. [PMID: 29858273 PMCID: PMC6028136 DOI: 10.1101/gr.231902.117] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 11/01/2017] [Accepted: 05/31/2018] [Indexed: 12/20/2022]
Abstract
Variation in RNA splicing (i.e., alternative splicing) plays an important role in many diseases. Variants near 5' and 3' splice sites often affect splicing, but the effects of these variants on splicing and disease have not been fully characterized beyond the two "essential" splice nucleotides flanking each exon. Here we provide quantitative measurements of tolerance to mutational disruptions by position and reference allele-alternative allele combinations. We show that certain reference alleles are particularly sensitive to mutations, regardless of the alternative alleles into which they are mutated. Using public RNA-seq data, we demonstrate that individuals carrying such variants have significantly lower levels of the correctly spliced transcript, compared to individuals without them, and confirm that these specific substitutions are highly enriched for known Mendelian mutations. Our results propose a more refined definition of the "splice region" and offer a new way to prioritize and provide functional interpretation of variants identified in diagnostic sequencing and association studies.
Affiliation(s)
- Sidi Zhang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA
- Kaitlin E Samocha
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Manuel A Rivas
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Emma Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Ben Schmandt
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Institute for Molecular Medicine Finland (FIMM), 00290 Helsinki, Finland
10
Zuallaert J, Godin F, Kim M, Soete A, Saeys Y, De Neve W. SpliceRover: interpretable convolutional neural networks for improved splice site prediction. Bioinformatics 2018; 34:4180-4188. [DOI: 10.1093/bioinformatics/bty497] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 10/23/2017] [Accepted: 06/19/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jasper Zuallaert
- Center for Biotech Data Science, Department of Environmental Technology, Food Technology and Molecular Biotechnology, Ghent University Global Campus, Songdo, Incheon, South Korea
- IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
- Fréderic Godin
- IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
- Mijung Kim
- Center for Biotech Data Science, Department of Environmental Technology, Food Technology and Molecular Biotechnology, Ghent University Global Campus, Songdo, Incheon, South Korea
- IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
- Arne Soete
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, Ghent, Belgium
- Yvan Saeys
- Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Wesley De Neve
- Center for Biotech Data Science, Department of Environmental Technology, Food Technology and Molecular Biotechnology, Ghent University Global Campus, Songdo, Incheon, South Korea
- IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
| |
|
11
|
A novel pathogenic splice acceptor site germline mutation in intron 14 of the APC gene in a Chinese family with familial adenomatous polyposis. Oncotarget 2017; 8:21327-21335. [PMID: 28423518 PMCID: PMC5400587 DOI: 10.18632/oncotarget.15570] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/06/2016] [Accepted: 01/27/2017] [Indexed: 11/25/2022] Open
Abstract
Familial adenomatous polyposis (FAP) is an autosomal dominant precancerous condition, clinically characterized by the presence of multiple colorectal adenomas or polyps. Patients with FAP have a high risk of developing colorectal cancer (CRC) from these colorectal adenomatous polyps, with a mean age at diagnosis of 40 years. Germline mutations of the APC gene cause FAP. Colectomy is recommended for FAP patients with significant polyposis. Here, we present a clinical molecular study of a four-generation Chinese family with FAP. Clinical diagnosis of FAP was made according to the phenotype, family history and medical records. Patients' blood samples were collected and genomic DNA was extracted. To identify the pathogenic mutation underlying the disease phenotype, targeted next-generation sequencing and confirmatory Sanger sequencing were undertaken. Targeted next-generation sequencing identified a novel heterozygous splice-acceptor site mutation [c.1744-1G>A] in intron 14 of the APC gene, which co-segregated with the FAP phenotype in the proband and all affected family members. This mutation is not present in unaffected family members or in normal healthy controls of the same ethnic origin. According to the LOVD database for Chinese colorectal cancer patients, 60% of the previously reported APC gene mutations causing FAP in the Chinese population are missense mutations. This novel splice-acceptor site mutation causing FAP in this Chinese family expands the germline mutation spectrum of the APC gene in the Chinese population.
|
12
|
Sanz DJ, Hollywood JA, Scallan MF, Harrison PT. Cas9/gRNA targeted excision of cystic fibrosis-causing deep-intronic splicing mutations restores normal splicing of CFTR mRNA. PLoS One 2017; 12:e0184009. [PMID: 28863137 PMCID: PMC5581164 DOI: 10.1371/journal.pone.0184009] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 05/25/2017] [Accepted: 08/16/2017] [Indexed: 12/27/2022] Open
Abstract
Cystic Fibrosis is an autosomal recessive disorder caused by mutations in the CFTR gene. CRISPR-mediated, template-dependent homology-directed gene editing has been used to correct the most common mutation, c.1521_1523delCTT / p.Phe508del (F508del), which affects ~70% of individuals, but the efficiency was relatively low. Here, we describe a high-efficiency strategy for editing three different rare CFTR mutations which together account for about 3% of individuals with Cystic Fibrosis. The mutations cause aberrant splicing of CFTR mRNA due to the creation of cryptic splice signals that result in the formation of pseudoexons containing premature stop codons [c.1679+1634A>G (1811+1.6kbA>G) and c.3718-2477C>T (3849+10kbC>T)], or an out-of-frame 5' extension to an existing exon [c.3140-26A>G (3272-26A>G)]. We designed pairs of Cas9 guide RNAs to create targeted double-stranded breaks in CFTR on either side of each mutation, which resulted in high-efficiency excision of the target genomic regions via non-homologous end-joining repair. When evaluated in a mini-gene splicing assay, targeted excision restored normal splicing for all three mutations. This approach could be used to correct aberrant splicing signals or remove disruptive transcription regulatory motifs caused by deep-intronic mutations in a range of other genetic disorders.
Affiliation(s)
- David J. Sanz
- Department of Physiology, BioSciences Institute, University College Cork, Cork, Ireland
| | - Jennifer A. Hollywood
- Department of Physiology, BioSciences Institute, University College Cork, Cork, Ireland
- School of Microbiology, University College Cork, Cork, Ireland
| | | | - Patrick T. Harrison
- Department of Physiology, BioSciences Institute, University College Cork, Cork, Ireland
| |
|
13
|
The roles of RNA processing in translating genotype to phenotype. NATURE REVIEWS. MOLECULAR CELL BIOLOGY 2016. [PMID: 27847391 DOI: 10.1038/nrm.2016.139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
A goal of human genetics studies is to determine the mechanisms by which genetic variation produces phenotypic differences that affect human health. Efforts in this respect have previously focused on genetic variants that affect mRNA levels by altering epigenetic and transcriptional regulation. Recent studies show that genetic variants that affect RNA processing are at least equally as common as, and are largely independent from, those variants that affect transcription. We highlight the impact of genetic variation on pre-mRNA splicing and polyadenylation, and on the stability, translation and structure of mRNAs as mechanisms that produce phenotypic traits. These results emphasize the importance of including RNA processing signals in analyses to identify functional variants.
|
14
|
Manning KS, Cooper TA. The roles of RNA processing in translating genotype to phenotype. Nat Rev Mol Cell Biol 2016; 18:102-114. [PMID: 27847391 DOI: 10.1038/nrm.2016.139] [Citation(s) in RCA: 139] [Impact Index Per Article: 17.4] [Indexed: 02/07/2023]
Abstract
A goal of human genetics studies is to determine the mechanisms by which genetic variation produces phenotypic differences that affect human health. Efforts in this respect have previously focused on genetic variants that affect mRNA levels by altering epigenetic and transcriptional regulation. Recent studies show that genetic variants that affect RNA processing are at least equally as common as, and are largely independent from, those variants that affect transcription. We highlight the impact of genetic variation on pre-mRNA splicing and polyadenylation, and on the stability, translation and structure of mRNAs as mechanisms that produce phenotypic traits. These results emphasize the importance of including RNA processing signals in analyses to identify functional variants.
Affiliation(s)
- Kassie S Manning
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA
- Integrative Molecular and Biomedical Sciences Program, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Thomas A Cooper
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
- Integrative Molecular and Biomedical Sciences Program, Baylor College of Medicine, Houston, Texas 77030, USA
| |
|
15
|
Leigh S, Futema M, Whittall R, Taylor-Beadling A, Williams M, den Dunnen JT, Humphries SE. The UCL low-density lipoprotein receptor gene variant database: pathogenicity update. J Med Genet 2016; 54:217-223. [PMID: 27821657 PMCID: PMC5502305 DOI: 10.1136/jmedgenet-2016-104054] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 05/25/2016] [Accepted: 10/06/2016] [Indexed: 12/04/2022]
Abstract
Background: Familial hypercholesterolaemia (OMIM 143890) is most frequently caused by variations in the low-density lipoprotein receptor (LDLR) gene. Predicting whether novel variants are pathogenic may not be straightforward, especially for missense and synonymous variants. In 2013, the Association of Clinical Genetic Scientists published guidelines for the classification of variants, with categories 1 and 2 representing clearly not or unlikely pathogenic, respectively, 3 representing variants of unknown significance (VUS), and 4 and 5 representing likely to be or clearly pathogenic, respectively. Here, we update the University College London (UCL) LDLR variant database according to these guidelines. Methods: PubMed searches and alerts were used to identify novel LDLR variants for inclusion in the database. Standard in silico tools were used to predict potential pathogenicity. Variants were designated as class 4/5 only when the predictions from the different programs were concordant and as class 3 when predictions were discordant. Results: The updated database (http://www.lovd.nl/LDLR) now includes 2925 curated variants, representing 1707 independent events. All 129 nonsense variants, 337 small frame-shifting and 117/118 large rearrangements were classified as 4 or 5. Of the 795 missense variants, 115 were in classes 1 and 2, 605 in class 4 and 75 in class 3. 111/181 intronic variants, 4/34 synonymous variants and 14/37 promoter variants were assigned to classes 4 or 5. Overall, 112 (7%) of reported variants were class 3. Conclusions: This study updates the LDLR variant database and identifies a number of reported VUS where additional family and in vitro studies will be required to confirm or refute their pathogenicity.
Affiliation(s)
- Sarah Leigh
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Sciences, University College London, London, UK
| | - Marta Futema
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Sciences, University College London, London, UK
| | - Ros Whittall
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Sciences, University College London, London, UK
| | | | - Maggie Williams
- Bristol Genetics Laboratory, Pathology Sciences, Blood Sciences and Bristol Genetics, Southmead Hospital, Bristol, UK
| | - Johan T den Dunnen
- Clinical Genetics and Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Steve E Humphries
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Sciences, University College London, London, UK
| |
|
16
|
Molecular characterization of novel splice site mutation causing protein C deficiency. Blood Coagul Fibrinolysis 2015; 27:585-8. [PMID: 26656900 DOI: 10.1097/mbc.0000000000000490] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
Abstract
Congenital protein C deficiency is an inherited coagulation disorder associated with an elevated risk of venous thromboembolism. A Saudi Arabian male from a consanguineous family was admitted to the neonatal intensive care unit in his first days of life because of transient tachypnea and hematuria. Laboratory investigations revealed a low platelet count and protein C deficiency. Direct sequencing of the PROC gene and RNA analysis were performed. Analysis of factor V Leiden (G1691A) and factor II (G20210A) mutations was also done. A novel homozygous splice site mutation, c.796+3A>T, was detected in the index case, and segregation was confirmed in the family. RNA analysis showed that the mutation is pathogenic: it causes skipping of exon 8 of the PROC gene and alters the donor splice site of the exon. Detection of the molecular cause of protein C deficiency reduces life threatening and facilitates inductive carrier testing, prenatal and preimplantation genetic diagnosis for families.
|
17
|
Fabian P, Kozmikova I, Kozmik Z, Pantzartzi CN. Pax2/5/8 and Pax6 alternative splicing events in basal chordates and vertebrates: a focus on paired box domain. Front Genet 2015; 6:228. [PMID: 26191073 PMCID: PMC4488758 DOI: 10.3389/fgene.2015.00228] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/28/2015] [Accepted: 06/15/2015] [Indexed: 12/19/2022] Open
Abstract
Paired box transcription factors play an important role in development and tissue morphogenesis. The number of Pax homologs varies among species studied so far, due to genome and gene duplications that have affected the PAX family to a great extent. Based on sequence similarity and functional domains, four Pax classes have been identified in chordates, namely Pax1/9, Pax2/5/8, Pax3/7, and Pax4/6. Numerous splicing events have been reported, mainly for Pax2/5/8 and Pax6 genes. Of significant interest are those events that lead to Pax proteins with presumed novel properties, such as altered DNA-binding or transcriptional activity. In the current study, a thorough analysis of Pax2/5/8 splicing events from cephalochordates and vertebrates was performed. We focused on Pax2/5/8 and Pax6 splicing events in which the paired domain is involved. Three new splicing events were identified in Oryzias latipes, one of which seems to be conserved in Acanthomorphata. Using representatives from deuterostome and protostome phyla, a comparative analysis of the Pax6 exon-intron structure of the paired domain was performed in an attempt to estimate the time of appearance of the Pax6(5a) mRNA isoform. As shown in our analysis, this splicing event is characteristic of Gnathostomata and is absent in the other chordate subphyla. Moreover, the expression pattern of alternatively spliced variants was compared between cephalochordates and fish species. In summary, our data indicate expansion of alternative mRNA variants in the paired box region of Pax2/5/8 and Pax6 genes during the course of vertebrate evolution.
Affiliation(s)
- Peter Fabian
- Department of Transcriptional Regulation, Institute of Molecular Genetics Prague, Czech Republic
| | - Iryna Kozmikova
- Department of Transcriptional Regulation, Institute of Molecular Genetics Prague, Czech Republic
| | - Zbynek Kozmik
- Department of Transcriptional Regulation, Institute of Molecular Genetics Prague, Czech Republic
| | - Chrysoula N Pantzartzi
- Department of Transcriptional Regulation, Institute of Molecular Genetics Prague, Czech Republic
| |
|
18
|
Schneider TD. Twenty Years of Delila and Molecular Information Theory: The Altenberg-Austin Workshop in Theoretical Biology Biological Information, Beyond Metaphor: Causality, Explanation, and Unification Altenberg, Austria, 11-14 July 2002. ACTA ACUST UNITED AC 2015; 1:250-260. [PMID: 18084638 DOI: 10.1162/biot.2006.1.3.250] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
A brief personal history is given about how information theory can be applied to binding sites of genetic control molecules on nucleic acids. The primary example used is ribosome binding sites in Escherichia coli. Once the sites are aligned, the information needed to describe the sites can be computed using Claude Shannon's method. This is displayed by a computer graphic called a sequence logo. The logo represents an average binding site, and the mathematics easily allows one to determine the components of this average. That is, given a set of binding sites, the information for individual binding sites can also be computed. One can go further and predict the information of sites that are not in the original data set. Information theory also allows one to model the flexibility of ribosome binding sites, and this led us to a simple model for ribosome translational initiation in which the molecular components fit together only when the ribosome is at a good ribosome binding site. Since information theory is general, the same mathematics applies to human splice junctions, where we can predict the effect of sequence changes that cause human genetic diseases and cancer. The second example given is the Pribnow 'box' which, when viewed by the information theory method, reveals a mechanism for initiation of both transcription and DNA replication. Replication, transcription, splicing, and translation into protein represent the central dogma, so these examples show how molecular information theory is contributing to our knowledge of basic biology.
Affiliation(s)
- Thomas D Schneider
- National Cancer Institute at Frederick, Laboratory of Experimental and Computational Biology, P. O. Box B, Frederick, MD 21702-1201. (301) 846-5581 (-5532 for messages), fax: (301) 846-5598. http://www.lecb.ncifcrf.gov/~toms/
| |
|
19
|
Caminsky NG, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2015. [DOI: 10.12688/f1000research.5654.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
|
20
|
Pino S, Sponer JE, Costanzo G, Saladino R, Mauro ED. From formamide to RNA, the path is tenuous but continuous. Life (Basel) 2015; 5:372-84. [PMID: 25647486 PMCID: PMC4390857 DOI: 10.3390/life5010372] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 09/30/2014] [Revised: 01/20/2015] [Accepted: 01/22/2015] [Indexed: 01/11/2023] Open
Abstract
Reactions of formamide (NH2COH) in the presence of catalysts of both terrestrial and meteoritic origin yield, in plausible and variegated conditions, a large panel of precursors of (pre)genetic and (pre)metabolic interest. Formamide chemistry potentially satisfies all of the steps from the very initial precursors to RNA. Water chemistry enters the scene in RNA non-enzymatic synthesis and recombination.
Affiliation(s)
- Samanta Pino
- Fondazione "Istituto Pasteur-Fondazione Cenci-Bolognetti" c/o Dipartimento di Biologia e Biotecnologie "Charles Darwin", "Sapienza" Università di Roma, P.le Aldo Moro, 5, 00185 Rome, Italy.
| | - Judit E Sponer
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 61265 Brno, Czech Republic.
- CEITEC-Central European Institute of Technology, Masaryk University, Campus Bohunice, Kamenice 5, CZ-62500 Brno, Czech Republic.
| | - Giovanna Costanzo
- Istituto di Biologia e Patologia Molecolari, CNR, P.le Aldo Moro, 5, 00185 Rome, Italy.
| | - Raffaele Saladino
- Dipartimento di Scienze Ecologiche e Biologiche Università della Tuscia Via San Camillo De Lellis, 01100 Viterbo, Italy.
| | - Ernesto Di Mauro
- Fondazione "Istituto Pasteur-Fondazione Cenci-Bolognetti" c/o Dipartimento di Biologia e Biotecnologie "Charles Darwin", "Sapienza" Università di Roma, P.le Aldo Moro, 5, 00185 Rome, Italy.
| |
|
21
|
Caminsky N, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2014; 3:282. [PMID: 25717368 PMCID: PMC4329672 DOI: 10.12688/f1000research.5654.1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Accepted: 11/10/2014] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
Affiliation(s)
- Natasha Caminsky
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
| | - Eliseos J Mucaki
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
| | - Peter K Rogan
- Departments of Biochemistry and Computer Science, Western University, London, ON, N6A 2C1, Canada
| |
|
22
|
Fiorentino A, O'Brien NL, Locke DP, McQuillin A, Jarram A, Anjorin A, Kandaswamy R, Curtis D, Blizard RA, Gurling HMD. Analysis of ANK3 and CACNA1C variants identified in bipolar disorder whole genome sequence data. Bipolar Disord 2014; 16:583-91. [PMID: 24716743 PMCID: PMC4227602 DOI: 10.1111/bdi.12203] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/19/2013] [Accepted: 12/27/2013] [Indexed: 12/30/2022]
Abstract
OBJECTIVES: Genetic markers in the genes encoding ankyrin 3 (ANK3) and the α-calcium channel subunit (CACNA1C) are associated with bipolar disorder (BP). The associated variants in the CACNA1C gene are mainly within intron 3 of the gene. ANK3 BP-associated variants are in two distinct clusters at the ends of the gene, indicating disease allele heterogeneity. METHODS: In order to screen both coding and non-coding regions to identify potential aetiological variants, we used whole-genome sequencing in 99 BP cases. Variants with markedly different allele frequencies in the BP samples and the 1,000 genomes project European data were genotyped in 1,510 BP cases and 1,095 controls. RESULTS: We found that the CACNA1C intron 3 variant, rs79398153, potentially affecting an ENCyclopedia of DNA Elements (ENCODE)-defined region, showed an association with BP (p = 0.015). We also found the ANK3 BP-associated variant rs139972937, responsible for an asparagine to serine change (p = 0.042). However, a previous study had not found support for an association between rs139972937 and BP. The variants at ANK3 and CACNA1C previously known to be associated with BP were not in linkage disequilibrium with either of the two variants that we identified and these are therefore independent of the previous haplotypes implicated by genome-wide association. CONCLUSIONS: Sequencing in additional BP samples is needed to find the molecular pathology that explains the previous association findings. If changes similar to those we have found can be shown to have an effect on the expression and function of ANK3 and CACNA1C, they might help to explain the so-called 'missing heritability' of BP.
Affiliation(s)
- Alessia Fiorentino
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | - Niamh Louise O'Brien
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | | | - Andrew McQuillin
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | - Alexandra Jarram
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | - Adebayo Anjorin
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | - Radhika Kandaswamy
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | - David Curtis
- Department of Psychological Medicine, Queen Mary University of London, London, UK
| | - Robert Alan Blizard
- Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London, UK
| | | |
|
23
|
Identification and expression analysis of diapause hormone and pheromone biosynthesis activating neuropeptide (DH-PBAN) in the legume pod borer, Maruca vitrata Fabricius. PLoS One 2014; 9:e84916. [PMID: 24409312 PMCID: PMC3883689 DOI: 10.1371/journal.pone.0084916] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 02/26/2013] [Accepted: 11/27/2013] [Indexed: 11/29/2022] Open
Abstract
Neuropeptides play essential roles in a variety of physiological responses that contribute to the development and reproduction of insects. Both the diapause hormone (DH) and pheromone biosynthesis activating neuropeptide (PBAN) belong to the PBAN/pyrokinin neuropeptide family, which has a conserved pentapeptide motif FXPRL at the C-terminus. We identified the full-length cDNA encoding DH-PBAN in Maruca vitrata, a major lepidopteran pest of leguminous crops. The open reading frame of Marvi-DH-PBAN is 591 bp in length, encoding 197 amino acids, from which five putative neuropeptides [DH, PBAN, α-subesophageal ganglion neuropeptide (SGNP), β-SGNP and γ-SGNP] are derived. Marvi-DH-PBAN was highly similar (83%) to DH-PBAN of Omphisa fuscidentalis (Lepidoptera: Crambidae), but possesses a unique C-terminal FNPRL motif, where asparagine has replaced a serine residue present in other lepidopteran PBAN peptides. The genomic DNA sequence of Marvi-DH-PBAN is 6,231 bp in size and is composed of six exons. Phylogenetic analysis has revealed that the Marvi-DH-PBAN protein sequence is closest to its homolog in Crambidae, but distant from Diptera, Coleoptera and Hymenoptera DH-PBAN, which agrees with the current taxonomy. DH-PBAN transcripts were present in the head and thoracic complex, but absent in the abdomen of M. vitrata. Real-time quantitative PCR assays have demonstrated a relatively higher expression of Marvi-DH-PBAN mRNA in the latter half of the pupal stages and in adults. These findings represent a significant step forward in our understanding of the DH-PBAN gene architecture and phylogeny, and raise the possibility of using Marvi-DH-PBAN to manage M. vitrata populations through molecular techniques.
|
24
|
Ribozyme Activity of RNA Nonenzymatically Polymerized from 3′,5′-Cyclic GMP. ENTROPY 2013. [DOI: 10.3390/e15125362] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
|
25
|
Clark DN, Read RD, Mayhew V, Petersen SC, Argueta LB, Stutz LA, Till RE, Bergsten SM, Robinson BS, Baumann DG, Heap JC, Poole BD. Four Promoters of IRF5 Respond Distinctly to Stimuli and are Affected by Autoimmune-Risk Polymorphisms. Front Immunol 2013; 4:360. [PMID: 24223576 PMCID: PMC3819785 DOI: 10.3389/fimmu.2013.00360] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 06/24/2013] [Accepted: 10/23/2013] [Indexed: 01/18/2023] Open
Abstract
Introduction: Autoimmune diseases such as systemic lupus erythematosus, rheumatoid arthritis, and multiple sclerosis affect millions of people worldwide. Interferon regulatory factor 5 (IRF5) contains polymorphisms associated with these autoimmune diseases. Two of these functional polymorphisms are found upstream of the IRF5 gene: rs2004640, a single nucleotide polymorphism, and the CGGGG insertion/deletion (indel). Both were studied. IRF5 uses four different promoters for its four first exons: 1A, 1B, 1C, and 1D. Each promoter was analyzed, including functional differences due to the autoimmune-risk polymorphisms. Results: IRF5 promoters were analyzed using ChIP-Seq data (ENCODE database) and the FactorBook database to define transcription factor binding sites. To verify promoter activity, the promoters were cloned into luciferase plasmids. Each construct exhibited luciferase activity. Exons 1A and 1D contain putative PU.1 and NFkB binding sites. Imiquimod, a Toll-like receptor 7 (TLR7) ligand, was used to activate these transcription factors. IRF5 levels were doubled after imiquimod treatment (p < 0.001), with specific increases in the 1A promoter (2.2-fold, p = 0.03) and 1D promoter (2.8-fold, p = 0.03). A putative binding site for p53, which affects apoptosis, was found in the promoter for exon 1B. However, site-directed mutagenesis of the p53 site showed no effect in a reporter assay. Conclusion: The IRF5 exon 1B promoter has been characterized, and the responses of each IRF5 promoter to TLR7 stimulation have been determined. Changes in promoter activity and gene expression are likely due to specific and distinct transcription factors that bind to each promoter. Since high expression of IRF5 contributes to the development of autoimmune disease, understanding the source of increased IRF5 levels is key to understanding autoimmune etiology.
Affiliation(s)
- Daniel N Clark
- Department of Microbiology and Molecular Biology, Brigham Young University , Provo, UT , USA
|
26
|
Duan J, Xu H, Guo H, O'Brochta DA, Wang F, Ma S, Zhang L, Zha X, Zhao P, Xia Q. New insights into the genomic organization and splicing of the doublesex gene, a terminal regulator of sexual differentiation in the silkworm Bombyx mori. PLoS One 2013; 8:e79703. [PMID: 24244545 PMCID: PMC3820697 DOI: 10.1371/journal.pone.0079703] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/24/2013] [Accepted: 09/24/2013] [Indexed: 11/18/2022] Open
Abstract
Sex-determination mechanisms differ among organisms. The primary mechanism is diverse, whereas the terminal regulator is relatively conserved. We analyzed the transcripts of the Bombyx mori doublesex gene (Bmdsx) and report novel results concerning the genomic organization and expression of Bmdsx. Bmdsx consists of nine exons and eight introns, of which two exons are novel and have not been reported previously. Bmdsx transcripts are spliced to generate seventeen alternatively spliced forms and eleven putative trans-spliced variants. Thirteen of the alternatively spliced forms and five of the putative trans-spliced forms are reported here for the first time. Sequence analysis predicts that ten female-specific splice forms, six male-specific splice forms and one splice form found in both males and females will result in four female-specific Dsx proteins, two male-specific Dsx proteins and one Dsx protein common to males and females. The Dsx proteins are expected to be functional and to regulate downstream target genes. Some of the predicted Dsx proteins are described here for the first time. The expression of the dsx gene in B. mori therefore results in a variety of cis- and trans-spliced transcripts and multiple Dsx proteins. These findings show that in B. mori there is a complicated pattern of dsx splicing, and that the regulation of splicing and sex-specific functions of lepidopteran dsx have evolved complexity.
Affiliation(s)
- Jianping Duan
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Henan Provincial Key Laboratory of Funiu Mountain Insect Biology, Nanyang Normal University, Nanyang, PR China
- Hanfu Xu
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Huizhen Guo
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- David A. O'Brochta
- Department of Entomology, University of Maryland, College Park, United States of America
- Feng Wang
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Sanyuan Ma
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Liying Zhang
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Xingfu Zha
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Ping Zhao
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
- Qingyou Xia
- State Key Laboratory of Silkworm Genome Biology (Southwest University), Chongqing, PR China
|
27
|
Tesar D, Hötzel I. A dual host vector for Fab phage display and expression of native IgG in mammalian cells. Protein Eng Des Sel 2013; 26:655-62. [PMID: 24065833 DOI: 10.1093/protein/gzt050] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
A significant bottleneck in antibody discovery by phage display is the transfer of immunoglobulin variable regions from phage clones to vectors that express immunoglobulin G (IgG) in mammalian cells for screening. Here, we describe a novel phagemid vector for Fab phage display that allows expression of native IgG in mammalian cells without sub-cloning. The vector uses an optimized mammalian signal sequence that drives robust expression of Fab fragments fused to an M13 phage coat protein in Escherichia coli and IgG expression in mammalian cells. To allow the expression of Fab fragments fused to a phage coat protein in E. coli and full-length IgG in mammalian cells from the same vector without sub-cloning, the sequence encoding the phage coat protein was embedded in an optimized synthetic intron within the immunoglobulin heavy chain gene. This intron is removed from transcripts in mammalian cells by RNA splicing. Using this vector, we constructed a synthetic Fab phage display library with diversity in the heavy chain only and selected for clones binding different antigens. Co-transfection of mammalian cells with DNA from individual phage clones and a plasmid expressing the invariant light chain resulted in the expression of native IgG that was used to assay affinity, ligand blocking activity and specificity.
Affiliation(s)
- Devin Tesar
- Department of Antibody Engineering, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
|
28
|
Chen G, Liu X, Zhang Y, Lin S, Yang Z, Johansson J, Rising A, Meng Q. Full-length minor ampullate spidroin gene sequence. PLoS One 2012; 7:e52293. [PMID: 23251707 PMCID: PMC3522626 DOI: 10.1371/journal.pone.0052293] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 09/03/2012] [Accepted: 11/12/2012] [Indexed: 11/18/2022] Open
Abstract
Spider silk includes seven protein-based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk, which is used to make the auxiliary spiral of the orb web and to wrap prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, obtained using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture, including the presence of an unusually large intron. The spliced full-length transcript of the MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in an irregular manner. The repeats are more conserved within A. ventricosus MiSp than when compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level.
Affiliation(s)
- Gefei Chen
- Institute of Biological Sciences and Biotechnology, Donghua University, Shanghai, People's Republic of China
- Xiangqin Liu
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Yunlong Zhang
- Institute of Biological Sciences and Biotechnology, Donghua University, Shanghai, People's Republic of China
- Senzhu Lin
- Institute of Biological Sciences and Biotechnology, Donghua University, Shanghai, People's Republic of China
- Zijiang Yang
- Institute of Biological Sciences and Biotechnology, Donghua University, Shanghai, People's Republic of China
- Jan Johansson
- KI-Alzheimer Disease Research Center, NVS (Neurobiology, Care Sciences, and Society) Department, Karolinska Institutet, Stockholm, Sweden
- Department of Anatomy Physiology and Biochemistry, The Biomedical Centre, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Institute of Mathematics and Natural Sciences, Tallinn University, Tallinn, Estonia
- Anna Rising
- KI-Alzheimer Disease Research Center, NVS (Neurobiology, Care Sciences, and Society) Department, Karolinska Institutet, Stockholm, Sweden
- Department of Anatomy Physiology and Biochemistry, The Biomedical Centre, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Qing Meng
- Institute of Biological Sciences and Biotechnology, Donghua University, Shanghai, People's Republic of China
|
29
|
Horowitz DS. The mechanism of the second step of pre-mRNA splicing. WILEY INTERDISCIPLINARY REVIEWS-RNA 2011; 3:331-50. [PMID: 22012849 DOI: 10.1002/wrna.112] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 01/19/2023]
Abstract
The molecular mechanisms of the second step of pre-mRNA splicing in yeast and higher eukaryotes are reviewed. The important elements in the pre-mRNA, the participating proteins, and the proposed secondary structures and roles of the snRNAs are described. The sequence of events in the second step is presented, focusing on the actions of the proteins in setting up and facilitating the second reaction. Mechanisms for avoiding errors in splicing are discussed.
Affiliation(s)
- David S Horowitz
- Department of Biochemistry and Molecular Biology, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA.
|
30
|
Ellis JR, Heinrich B, Mautner VF, Kluwe L. Effects of splicing mutations on NF2-transcripts: transcript analysis and information theoretic predictions. Genes Chromosomes Cancer 2011; 50:571-84. [PMID: 21563229 DOI: 10.1002/gcc.20876] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/21/2010] [Revised: 03/03/2011] [Accepted: 03/07/2011] [Indexed: 11/07/2022] Open
Abstract
This study examined the effects of 22 putative splicing mutations in the NF2 gene by means of transcript analysis and information theory based prediction. Fourteen mutations were within the dinucleotide acceptor and donor regions, often referred to as (AG/GT) sequences. Six were outside these dinucleotide regions but within the more broadly defined splicing regions used in the information theory based model. Two others were in introns and outside the broadly defined regions. Transcript analysis revealed exon skipping or activation of one or more cryptic splicing sites for 17 mutations. No alterations were found for the two intronic mutations and for three mutations in the broadly defined splicing regions. Concordance and partial concordance between the calculated predictions and the results of transcript analysis were found for 14 and 6 mutations, respectively. For two mutations, the predicted alteration was not found in the transcripts. Our results demonstrate that the effects of splicing mutations in NF2 are often complex and that information theory based analysis is helpful in elucidating the consequences of these mutations.
Affiliation(s)
- James R Ellis
- Laboratory of Bioengineering and Physical Science, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD 20892-5766, USA.
|
31
|
Salvemini M, Mauro U, Lombardo F, Milano A, Zazzaro V, Arcà B, Polito LC, Saccone G. Genomic organization and splicing evolution of the doublesex gene, a Drosophila regulator of sexual differentiation, in the dengue and yellow fever mosquito Aedes aegypti. BMC Evol Biol 2011; 11:41. [PMID: 21310052 PMCID: PMC3045327 DOI: 10.1186/1471-2148-11-41] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 07/31/2010] [Accepted: 02/10/2011] [Indexed: 01/01/2023] Open
Abstract
Background: In the model system Drosophila melanogaster, doublesex (dsx) is the double-switch gene at the bottom of the somatic sex determination cascade that determines the differentiation of sexually dimorphic traits. Homologues of dsx are functionally conserved in various dipteran species, including the malaria vector Anopheles gambiae. They show a striking conservation of sex-specific regulation, based on alternative splicing, and of the encoded sex-specific proteins, which are transcriptional regulators of downstream terminal genes that influence sexual differentiation of cells, tissues and organs. Results: In this work, we report on the molecular characterization of the dsx homologue in the dengue and yellow fever vector Aedes aegypti (Aeadsx). Aeadsx produces sex-specific transcripts by alternative splicing, which encode isoforms with a high degree of identity to Anopheles gambiae and Drosophila melanogaster homologues. Interestingly, Aeadsx produces an additional novel female-specific splicing variant. Genomic comparative analyses between the Aedes and Anopheles dsx genes revealed a partial conservation of the exon organization and extensive divergence in the intron lengths. An expression analysis showed that Aeadsx transcripts were present from early stages of development and that sex-specific regulation starts at least from late larval stages. The analysis of the female-specific untranslated region (UTR) led to the identification of putative regulatory cis-elements potentially involved in the sex-specific splicing regulation. The Aedes dsx sex-specific splicing regulation seems to be more complex than that of other dipteran species, suggesting slightly novel evolutionary trajectories for its regulation and hence for the recruitment of upstream splicing regulators. Conclusions: This study uncovered the molecular evolution of Aedes aegypti dsx splicing regulation with respect to the orthologue in the more closely related culicid Anopheles gambiae.
In Aedes aegypti, the dsx gene is sex-specifically regulated and encodes two female-specific isoforms and one male-specific isoform, all sharing a doublesex/mab-3 (DM) domain-containing N-terminus and differing in their C-termini. The sex-specific regulation is based on a combination of exon skipping, 5' alternative splice site choice and, most likely, alternative polyadenylation. Interestingly, when the Aeadsx gene is compared to the Anopheles dsx ortholog, there are differences in the in silico predicted default and regulated sex-specific splicing events, which suggests that the upstream regulators either are different or act in a slightly different manner. Furthermore, this study lays the groundwork for the future development of transgenic sexing strains in mosquitoes useful for sterile insect technique (SIT) programs.
Affiliation(s)
- Marco Salvemini
- Department of Biological Sciences, Section of Genetics and Molecular Biology, University of Naples Federico II, Italy
|
32
|
TRII: A Probabilistic Scoring of Drosophila melanogaster Translation Initiation Sites. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2010; 2010:814127. [PMID: 21318134 PMCID: PMC3171364 DOI: 10.1155/2010/814127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/29/2010] [Revised: 08/23/2010] [Accepted: 10/14/2010] [Indexed: 11/17/2022]
Abstract
Relative individual information is a measurement that scores the quality of DNA- and RNA-binding sites for biological machines. The development of analytical approaches to increase the power of this scoring method will improve its utility in evaluating the functions of motifs. In this study, the scoring method was applied to potential translation initiation sites in Drosophila to compute Translation Relative Individual Information (TRII) scores. The weight matrix at the core of the scoring method was optimized based on high-confidence translation initiation sites identified by using a progressive partitioning approach. Comparing the distributions of TRII scores for sites of interest with those for high-confidence translation initiation sites and random sequences provides a new methodology for assessing the quality of translation initiation sites. The optimized weight matrices can also be used to describe the consensus at translation initiation sites, providing a quantitative measure of preferred and avoided nucleotides at each position.
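The abstract above describes scoring candidate sites with a weight matrix built from high-confidence sites. A minimal sketch of that general idea follows; it is not the TRII implementation. The uniform background, the pseudocount of 0.5, the function names and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_weight_matrix(sites, background=None, pseudocount=0.5):
    """Return one dict per position mapping base -> log2(f(base) / q(base)).

    `sites` are aligned, equal-length sequences. The uniform background and
    the pseudocount are illustrative assumptions, not values from the paper.
    """
    if background is None:
        background = {base: 0.25 for base in BASES}
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background[base])
            for base in BASES
        })
    return matrix


def score_site(matrix, candidate):
    """Sum the per-position weights of `candidate`, giving a score in bits."""
    if len(candidate) != len(matrix):
        raise ValueError("candidate length must match the weight matrix")
    return sum(column[base] for column, base in zip(matrix, candidate))


# Hypothetical aligned sites around an ATG start codon (toy data).
TRAINING_SITES = ["CAAAATGGC", "CAACATGGC", "AAAAATGTC", "CAAGATGAC"]

matrix = build_weight_matrix(TRAINING_SITES)
strong = score_site(matrix, "CAAAATGGC")  # matches the toy consensus
weak = score_site(matrix, "GTTTTCCAA")    # shares almost nothing with it
```

Positive weights mark preferred nucleotides and negative weights mark avoided ones, which is the sense in which such a matrix can describe a consensus quantitatively.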
|
33
|
Abstract
The idea that we could build molecular communications systems can be advanced by investigating how actual molecules from living organisms function. Information theory provides tools for such an investigation. This review describes how we can compute the average information in the DNA binding sites of any genetic control protein and how this can be extended to analyze its individual sites. A formula equivalent to Claude Shannon's channel capacity can be applied to molecular systems and used to compute the efficiency of protein binding. This efficiency is often 70% and a brief explanation for that is given. The results imply that biological systems have evolved to function at channel capacity, which means that we should be able to build molecular communications that are just as robust as our macroscopic ones.
Affiliation(s)
- Thomas D. Schneider
- National Institutes of Health, National Cancer Institute at Frederick, P.O. Box B, Frederick, MD 21702-1201, United States
|
34
|
Schneider TD. 70% efficiency of bistate molecular machines explained by information theory, high dimensional geometry and evolutionary convergence. Nucleic Acids Res 2010; 38:5995-6006. [PMID: 20562221 PMCID: PMC2952855 DOI: 10.1093/nar/gkq389] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open
Abstract
The relationship between information and energy is key to understanding biological systems. We can display the information in DNA sequences specifically bound by proteins by using sequence logos, and we can measure the corresponding binding energy. These can be compared by noting that one of the forms of the second law of thermodynamics defines the minimum energy dissipation required to gain one bit of information. Under the isothermal conditions in which molecular machines function, this is Emin = kB T ln 2 joules per bit (kB is Boltzmann's constant and T is the absolute temperature). An efficiency of binding can then be computed by dividing the information in a logo by the free energy of binding after it has been converted to bits. The isothermal efficiencies of not only genetic control systems but also visual pigments are near 70%. From information and coding theory, the theoretical efficiency limit for bistate molecular machines is ln 2 ≈ 0.6931. Evolutionary convergence to maximum efficiency is limited by the constraint that molecular states must be distinct from each other. The result indicates that natural molecular machines operate close to their information processing maximum (the channel capacity), and implies that nanotechnology can attain this goal.
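The efficiency calculation described in this abstract can be sketched in a few lines, assuming the binding free energy is given in joules per mole. The function names and the per-mole convention are illustrative assumptions; no measured values from the paper are used.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of kB
AVOGADRO_PER_MOL = 6.02214076e23  # exact SI value of the Avogadro constant


def min_energy_per_bit(temperature_kelvin):
    """Emin = kB * T * ln 2, in joules per bit."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def binding_efficiency(information_bits, delta_g_joules_per_mol, temperature_kelvin):
    """Information gained divided by the free energy expressed in bits.

    The per-mole free energy is converted to joules per molecule, then to the
    maximum number of bits that energy could buy at this temperature.
    """
    energy_per_molecule = abs(delta_g_joules_per_mol) / AVOGADRO_PER_MOL
    max_bits = energy_per_molecule / min_energy_per_bit(temperature_kelvin)
    return information_bits / max_bits


# Hypothetical case: a free energy worth exactly 10 bits at 300 K and a
# site carrying 7 bits of information.
delta_g = -10 * min_energy_per_bit(300.0) * AVOGADRO_PER_MOL
efficiency = binding_efficiency(7.0, delta_g, 300.0)
```

In this made-up case the efficiency is 7/10, close to the ln 2 limit quoted in the abstract; real inputs would come from a sequence logo and a measured binding energy.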
Affiliation(s)
- Thomas D Schneider
- Center for Cancer Research Nanobiology Program, National Cancer Institute, Frederick, MD 21702-1201, USA.
|
35
|
Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010; 4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open
Affiliation(s)
- Ian Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
36
|
Fraser HI, Dendrou CA, Healy B, Rainbow DB, Howlett S, Smink LJ, Gregory S, Steward CA, Todd JA, Peterson LB, Wicker LS. Nonobese diabetic congenic strain analysis of autoimmune diabetes reveals genetic complexity of the Idd18 locus and identifies Vav3 as a candidate gene. THE JOURNAL OF IMMUNOLOGY 2010; 184:5075-84. [PMID: 20363978 DOI: 10.4049/jimmunol.0903734] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
We have used the public sequencing and annotation of the mouse genome to delimit the previously resolved type 1 diabetes (T1D) insulin-dependent diabetes (Idd)18 interval to a region on chromosome 3 that includes the immunologically relevant candidate gene, Vav3. To test the candidacy of Vav3, we developed a novel congenic strain that enabled the resolution of Idd18 to a 604-kb interval, designated Idd18.1, which contains only two annotated genes: the complete sequence of Vav3 and the last exon of the gene encoding NETRIN G1, Ntng1. Targeted sequencing of Idd18.1 in the NOD mouse strain revealed that allelic variation between NOD and C57BL/6J (B6) occurs in noncoding regions with 138 single nucleotide polymorphisms concentrated in the introns between exons 20 and 27 and immediately after the 3' untranslated region. We observed differential expression of VAV3 RNA transcripts in thymocytes when comparing congenic mouse strains with B6 or NOD alleles at Idd18.1. The T1D protection associated with B6 alleles of Idd18.1/Vav3 requires the presence of B6 protective alleles at Idd3, which are correlated with increased IL-2 production and regulatory T cell function. In the absence of B6 protective alleles at Idd3, we detected a second T1D protective B6 locus, Idd18.3, which is closely linked to, but distinct from, Idd18.1. Therefore, genetic mapping, sequencing, and gene expression evidence indicate that alteration of VAV3 expression is an etiological factor in the development of autoimmune beta-cell destruction in NOD mice. This study also demonstrates that a congenic strain mapping approach can isolate closely linked susceptibility genes.
Affiliation(s)
- Heather I Fraser
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge
|
37
|
Ogino H, Nakayama R, Sakamoto H, Yoshida T, Sugimura T, Masutani M. Analysis of poly(ADP-ribose) polymerase-1 (PARP1) gene alteration in human germ cell tumor cell lines. Cancer Genet Cytogenet 2010; 197:8-15. [PMID: 20113831 DOI: 10.1016/j.cancergencyto.2009.10.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/28/2009] [Revised: 10/17/2009] [Accepted: 10/17/2009] [Indexed: 10/19/2022]
Abstract
The poly(ADP-ribose) polymerase-1 protein (PARP-1) functions in DNA repair, maintenance of genomic stability, induction of cell death, and transcriptional regulation. We previously analyzed alterations of the PARP1 gene in 16 specimens of human germ cell tumors, and found a heterozygous sequence alteration that causes the amino acid substitution Met129Thr (M129T) in both tumor and normal tissues in a single patient. In this study, aberration of the PARP1 gene and protein was further analyzed in human germ cell tumor cell lines. We found a nonheterozygous sequence alteration that causes the amino acid substitution Glu251Lys (E251K) located at a conserved peptide stretch of PARP-1 in cell line NEC8. Sequencing of 95 samples from Japanese healthy volunteers revealed that all the samples were homozygous for the wild-type alleles at M129T and E251K. The M129T allele is thus suggested to be a rare single-nucleotide polymorphism (SNP). We observed a decrease in auto-poly(ADP-ribosyl)ation activity of PARP-1 proteins harboring M129T or E251K amino acid substitution, but the difference was not statistically significant. The levels of PARP-1 and poly(ADP-ribosyl)ation were heterogeneous among germ cell tumor cell lines. The SNPs of the PARP1 gene, as well as differences in the levels of PARP-1 and poly(ADP-ribosyl)ation of proteins, may influence germ cell tumor development and responses to chemotherapy and radiotherapy.
Affiliation(s)
- Hideki Ogino
- Biochemistry Division, National Cancer Center Research Institute, 1-1 Tsukiji 5-chome, Chuo-ku, Tokyo, Japan
|
38
|
Nonsense-mediated decay enables intron gain in Drosophila. PLoS Genet 2010; 6:e1000819. [PMID: 20107520 PMCID: PMC2809761 DOI: 10.1371/journal.pgen.1000819] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 10/19/2009] [Accepted: 12/18/2009] [Indexed: 12/03/2022] Open
Abstract
Intron number varies considerably among genomes, but despite their fundamental importance, the mutational mechanisms and evolutionary processes underlying the expansion of intron number remain unknown. Here we show that Drosophila, in contrast to most eukaryotic lineages, is still undergoing a dramatic rate of intron gain. These novel introns carry significantly weaker splice sites that may impede their identification by the spliceosome. Novel introns are more likely to encode a premature termination codon (PTC), indicating that nonsense-mediated decay (NMD) functions as a backup for weak splicing of new introns. Our data suggest that new introns originate when genomic insertions with weak splice sites are hidden from selection by NMD. This mechanism reduces the sequence requirement imposed on novel introns and implies that the capacity of the spliceosome to recognize weak splice sites was a prerequisite for intron gain during eukaryotic evolution. The surprising observation 30 years ago that genes are interrupted by non-coding introns changed our view of gene architecture. Intron number varies dramatically among species, ranging from nine introns per gene in humans to fewer than one in some simple eukaryotes. Here we ask where new introns come from and how they are maintained in a population. We find that novel introns do not arise from pre-existing introns, although the mechanisms that generate novel introns remain unclear. We also show that novel introns carry only weak signals for their identification and removal, and therefore depend on nonsense-mediated decay (NMD). NMD maintains RNA quality control by degrading transcripts that have not been spliced properly. We propose that NMD shelters novel introns from natural selection. This increases the likelihood that a novel intron will rise in frequency and be maintained within a population, thus increasing the rate of intron gain.
|
39
|
Cloning and characterization of the monkey histamine H3 receptor isoforms. Eur J Pharmacol 2008; 601:8-15. [PMID: 18977214 DOI: 10.1016/j.ejphar.2008.10.026] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/25/2008] [Revised: 09/26/2008] [Accepted: 10/13/2008] [Indexed: 11/21/2022]
Abstract
We have recently identified three splice isoforms of the histamine H3 receptor in multiple brain regions of cynomolgus monkey (Macaca fascicularis). Two of the novel isoforms, H3(413) and H3(410), displayed a deletion in the third intracellular loop; the third isoform, H3(335), displayed a deletion in the i3 intracellular loop and a complete deletion of the putative fifth transmembrane domain TM5. We have confirmed by RT-PCR the expression of full-length H3(445) mRNA as well as H3(413), H3(410), and H3(335) splice isoform mRNA in multiple monkey brain regions including the frontal, parietal and occipital cortex, parahippocampal gyrus, hippocampus, amygdala, caudate nucleus, putamen, thalamus, hypothalamus, and cerebellum. The full-length isoform H3(445) was predominant in all of the regions tested, followed by H3(335), with H3(413) and H3(410) being of low abundance. When expressed in C6 cells, H3(445), H3(413), and H3(410) exhibit high affinity binding to the agonist ligand [3H]-N-α-methylhistamine with respective pKD values of 9.7, 9.7, and 9.6. As expected, the H3(335) isoform did not display any saturable binding with [3H]-N-α-methylhistamine. The histamine H3 receptor agonists histamine, (R)-α-methylhistamine, imetit and proxyfan were able to activate calcium mobilization responses through H3(445), H3(413) and H3(410) receptors when they were co-expressed with the chimeric Gαqi5 protein in HEK293 cells, while no response was elicited in cells expressing the H3(335) isoform. The existence of multiple H3 receptor splice isoforms across species raises the possibility that isoform-specific properties including ligand affinity, signal transduction coupling, and brain localization may differentially contribute to observed in vivo effects of histamine H3 receptor antagonists.
|
40
|
Stewart ME, Desport M, Setiyaningsih S, Hartaningsih N, Wilcox GE. Analysis of Jembrana disease virus mRNA transcripts produced during acute infection demonstrates a complex transcription pattern. Virus Res 2008; 135:336-9. [DOI: 10.1016/j.virusres.2008.03.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 01/14/2008] [Revised: 03/19/2008] [Accepted: 03/26/2008] [Indexed: 10/22/2022]
|
41
|
Lee JH, Culver G, Carpenter S, Dobbs D. Analysis of the EIAV Rev-responsive element (RRE) reveals a conserved RNA motif required for high affinity Rev binding in both HIV-1 and EIAV. PLoS One 2008; 3:e2272. [PMID: 18523581 PMCID: PMC2386976 DOI: 10.1371/journal.pone.0002272] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/11/2008] [Accepted: 04/15/2008] [Indexed: 11/29/2022] Open
Abstract
A cis-acting RNA regulatory element, the Rev-responsive element (RRE), has essential roles in replication of lentiviruses, including human immunodeficiency virus (HIV-1) and equine infectious anemia virus (EIAV). The RRE binds the viral trans-acting regulatory protein, Rev, to mediate nucleocytoplasmic transport of incompletely spliced mRNAs encoding viral structural genes and genomic RNA. Because of its potential as a clinical target, RRE-Rev interactions have been well studied in HIV-1; however, detailed molecular structures of Rev-RRE complexes in other lentiviruses are still lacking. In this study, we investigate the secondary structure of the EIAV RRE and interrogate regulatory protein-RNA interactions in EIAV Rev-RRE complexes. Computational prediction and detailed chemical probing and footprinting experiments were used to determine the RNA secondary structure of EIAV RRE-1, a 555 nt region that provides RRE function in vivo. Chemical probing experiments confirmed the presence of several predicted loop and stem-loop structures, which are conserved among 140 EIAV sequence variants. Footprinting experiments revealed that Rev binding induces significant structural rearrangement in two conserved domains characterized by stable stem-loop structures. Rev binding region-1 (RBR-1) corresponds to a genetically defined Rev binding region that overlaps exon 1 of the EIAV rev gene and contains an exonic splicing enhancer (ESE). RBR-2, characterized for the first time in this study, is required for high affinity binding of EIAV Rev to the RRE. RBR-2 contains an RNA structural motif that is also found within the high affinity Rev binding site in HIV-1 (stem-loop IIB), and within or near mapped RRE regions of four additional lentiviruses.
The powerful integration of computational and experimental approaches in this study has generated a validated RNA secondary structure for the EIAV RRE and provided provocative evidence that high affinity Rev binding sites of HIV-1 and EIAV share a conserved RNA structural motif. The presence of this motif in phylogenetically divergent lentiviruses suggests that it may play a role in highly conserved interactions that could be targeted in novel anti-lentiviral therapies.
Affiliation(s)
- Jae-Hyung Lee
- Bioinformatics and Computational Biology Program, Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, United States of America.
|
42
|
Rekha TS, Mitra CK. Comparative analysis of splice site regions by information content. Genomics Proteomics & Bioinformatics 2007; 4:230-7. [PMID: 17531798 PMCID: PMC5054069 DOI: 10.1016/s1672-0229(07)60003-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
Abstract
We have applied concepts from information theory for a comparative analysis of donor (gt) and acceptor (ag) splice site regions in the genes of five different organisms by calculating their mutual information content (relative entropy) over a selected block of nucleotides. A similar pattern, in which the information content decreases as the block size increases, was observed for both regions in all the organisms studied. This result suggests that the information required for splicing might be contained in the consensus of ~6–8 nt at both regions. We assume from our study that even though the nucleotides show some degree of conservation in the flanking regions of the splice sites, a certain level of variability is still tolerated, which allows the splicing process to occur normally even if the extent of base pairing is not fully satisfied. We also suggest that this variability can be compensated by recognizing different splice sites with different spliceosomal factors.
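As an illustration of the kind of quantity this abstract refers to, the following is a minimal sketch of a per-position relative entropy calculation over aligned splice site sequences. It is not the authors' implementation: the function name `position_information`, the uniform background distribution, and the toy donor sequences are all assumptions made for the example, and no small-sample correction is applied.

```python
import math
from collections import Counter


def position_information(seqs, background=None):
    """Relative entropy (in bits) of each alignment column against a background.

    With a uniform background over A, C, G, T this equals 2 - H(column),
    the usual per-position information content of a DNA motif.
    """
    if background is None:
        # Assumption: uniform base composition; a genome-specific
        # background could be passed in instead.
        background = {base: 0.25 for base in "ACGT"}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        bits = 0.0
        for base, n in counts.items():
            p = n / total
            bits += p * math.log2(p / background[base])
        info.append(bits)
    return info


# Hypothetical toy donor-site alignment (not real data).
toy_donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "TGGTAAGT"]
info = position_information(toy_donors)
```

In this toy alignment the invariant GT columns reach the 2-bit maximum, while variable flanking columns carry less information, which is the behaviour the abstract describes qualitatively.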
|
43
|
Kawase T, Akatsuka Y, Torikai H, Morishima S, Oka A, Tsujimura A, Miyazaki M, Tsujimura K, Miyamura K, Ogawa S, Inoko H, Morishima Y, Kodera Y, Kuzushima K, Takahashi T. Alternative splicing due to an intronic SNP in HMSD generates a novel minor histocompatibility antigen. Blood 2007; 110:1055-63. [PMID: 17409267 DOI: 10.1182/blood-2007-02-075911] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Indexed: 11/20/2022] Open
Abstract
Here we report the identification of a novel human leukocyte antigen (HLA)-B44-restricted minor histocompatibility antigen (mHA) with expression limited to hematopoietic cells. cDNA expression cloning studies demonstrated that the cytotoxic T lymphocyte (CTL) epitope of interest was encoded by a novel allelic splice variant of HMSD, hereafter designated as HMSD-v. The immunogenicity of the epitope was generated by differential protein expression due to alternative splicing, which was completely controlled by 1 intronic single-nucleotide polymorphism located in the consensus 5' splice site adjacent to an exon. Both HMSD-v and HMSD transcripts were selectively expressed at higher levels in mature dendritic cells and primary leukemia cells, especially those of myeloid lineage. Engraftment of mHA(+) myeloid leukemia stem cells in nonobese diabetic/severe combined immunodeficient (NOD/SCID)/gammac(null) mice was completely inhibited by in vitro preincubation with the mHA-specific CTL clone, suggesting that this mHA is expressed on leukemic stem cells. The patient from whom the CTL clone was isolated demonstrated a significant increase of the mHA-specific T cells in posttransplantation peripheral blood, whereas mHA-specific T cells were undetectable in pretransplantation peripheral blood and in peripheral blood from his donor. These findings suggest that the HMSD-v-encoded mHA (designated ACC-6) could serve as a target antigen for immunotherapy against hematologic malignancies.
MESH Headings
- Alternative Splicing/genetics
- Alternative Splicing/immunology
- Animals
- Cell Line, Tumor
- DNA, Complementary/genetics
- DNA, Complementary/immunology
- Epitopes, T-Lymphocyte/genetics
- Epitopes, T-Lymphocyte/immunology
- HLA-B Antigens/genetics
- HLA-B Antigens/immunology
- HLA-B44 Antigen
- Humans
- Immunotherapy
- Leukemia, Myeloid/genetics
- Leukemia, Myeloid/immunology
- Leukemia, Myeloid/therapy
- Mice
- Mice, Inbred NOD
- Mice, SCID
- Minor Histocompatibility Antigens/genetics
- Minor Histocompatibility Antigens/immunology
- Neoplastic Stem Cells
- Polymorphism, Single Nucleotide
- RNA Splice Sites
- T-Lymphocytes, Cytotoxic/immunology
Affiliation(s)
- Takakazu Kawase
- Division of Immunology, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya 464-8681, Japan
|
44
|
Keren B, Suzuki OT, Gérard-Blanluet M, Brémond-Gignac D, Elmaleh M, Titomanlio L, Delezoide AL, Passos-Bueno MR, Verloes A. CNS malformations in Knobloch syndrome with splice mutation in COL18A1 gene. Am J Med Genet A 2007; 143A:1514-8. [PMID: 17546652 DOI: 10.1002/ajmg.a.31784] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
Affiliation(s)
- Boris Keren
- Clinical Genetic Unit, Department of Medical Genetics, APHP, Robert Debré University Hospital, 48 boulevard Serurier, 75019 Paris, France
|
45
|
Abstract
Information theory was used to build a promoter model that accounts for the -10, the -35 and the uncertainty of the gap between them on a common scale. Helical face assignment indicated that base -7, rather than -11, of the -10 may be flipping to initiate transcription. We found that the sequence conservation of sigma70 binding sites is 6.5 +/- 0.1 bits. Some promoters lack a -35 region, but have a 6.7 +/- 0.2 bit extended -10, almost the same information as the bipartite promoter. These results and similarities between the contacts in the extended -10 binding and the -35 suggest that the flexible bipartite sigma factor evolved from a simpler polymerase. Binding predicted by the bipartite model is enriched around 35 bases upstream of the translational start. This distance is the smallest 5' mRNA leader necessary for ribosome binding, suggesting that selective pressure minimizes transcript length. The promoter model was combined with models of the transcription factors Fur and Lrp to locate new promoters, to quantify promoter strengths, and to predict activation and repression. Finally, the DNA-bending proteins Fis, H-NS and IHF frequently have sites within one DNA persistence length from the -35, so bending allows distal activators to reach the polymerase.
Affiliation(s)
- Thomas D. Schneider
- To whom correspondence should be addressed. Tel: +1 301 846 5581; Fax: +1 301 846 5598
|
46
|
Segovia-Juarez JL, Colombano S, Kirschner D. Identifying DNA splice sites using hypernetworks with artificial molecular evolution. Biosystems 2006; 87:117-24. [PMID: 17116361 DOI: 10.1016/j.biosystems.2006.09.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 02/28/2005] [Revised: 07/08/2006] [Accepted: 07/15/2006] [Indexed: 11/28/2022]
Abstract
Identifying DNA splice sites is a main task of gene hunting. We introduce the hypernetwork architecture as a novel method for finding DNA splice sites. The hypernetwork architecture is a biologically inspired information processing system composed of networks of molecules forming cells, and a number of cells forming a tissue or organism. Its learning is based on molecular evolution. DNA examples taken from GenBank were translated into binary strings and fed into a hypernetwork for training. We performed experiments to explore the generalization performance of hypernetwork learning in this data set by two-fold cross validation. The hypernetwork generalization performance was comparable to well known classification algorithms. With the best hypernetwork obtained, including local information and heuristic rules, we built a system (HyperExon) to obtain splice site candidates. The HyperExon system outperformed leading splice recognition systems in the list of sequences tested.
Affiliation(s)
- Jose L Segovia-Juarez
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
|
47
|
Platzer M, Hiller M, Szafranski K, Jahn N, Hampe J, Schreiber S, Backofen R, Huse K. Sequencing errors or SNPs at splice-acceptor guanines in dbSNP? Nat Biotechnol 2006; 24:1068-70. [PMID: 16964207 DOI: 10.1038/nbt0906-1068b] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
|
48
|
Nielsen H, Wernersson R. An overabundance of phase 0 introns immediately after the start codon in eukaryotic genes. BMC Genomics 2006; 7:256. [PMID: 17034638 PMCID: PMC1626468 DOI: 10.1186/1471-2164-7-256] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 06/07/2006] [Accepted: 10/11/2006] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: A knowledge of the positions of introns in eukaryotic genes is important for understanding the evolution of introns. Despite this, there has been relatively little focus on the distribution of intron positions in genes.
RESULTS: In proteins with signal peptides, there is an overabundance of phase 1 introns around the region of the signal peptide cleavage site. This has been described before. But in proteins without signal peptides, a novel phenomenon is observed: There is a sharp peak of phase 0 intron positions immediately following the start codon, i.e. between codons 1 and 2. This effect is seen in a wide range of eukaryotes: Vertebrates, arthropods, fungi, and flowering plants. Proteins carrying this start codon intron are found to comprise a special class of relatively short, lysine-rich and conserved proteins with an overrepresentation of ribosomal proteins. In addition, there is a peak of phase 0 introns at position 5 in Drosophila genes with signal peptides, predominantly representing cuticle proteins.
CONCLUSION: There is an overabundance of phase 0 introns immediately after the start codon in eukaryotic genes, which has been described before only for human ribosomal proteins. We give a detailed description of these start codon introns and the proteins that contain them.
Affiliation(s)
- Henrik Nielsen
- Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
- Rasmus Wernersson
- Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
|
49
|
Bindewald E, Schneider TD, Shapiro BA. CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments. Nucleic Acids Res 2006; 34:W405-11. [PMID: 16845037 PMCID: PMC1538790 DOI: 10.1093/nar/gkl269] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022] Open
Abstract
We present an online server that generates a 3D representation of properties of user-submitted RNA or DNA alignments. The visualized properties are information of single alignment columns, mutual information of two alignment positions as well as the position-specific fraction of gaps. The nucleotide composition of both single columns and column pairs is visualized with the help of color-coded 3D bars labeled with letters. The server generates both VRML and JVX output that can be viewed with a VRML viewer or the JavaView applet, respectively. We show that combining these different features of an alignment into one 3D representation is helpful in identifying correlations between bases and potential RNA and DNA base pairs. Significant known correlations between the tRNA 3′ anticodon cardinal nucleotide and the extended anticodon were observed, as were correlations within the amino acid acceptor stem and between the cardinal nucleotide and the acceptor stem. The online server can be accessed using the URL .
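To make the "mutual information of two alignment positions" mentioned in this abstract concrete, here is a minimal sketch of that calculation for a pair of alignment columns. It is only an illustration under stated assumptions, not the server's code: the function name `column_mutual_information` and the toy RNA alignment are hypothetical, gaps are not handled, and no small-sample correction is applied.

```python
import math
from collections import Counter


def column_mutual_information(seqs, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(seqs)
    counts_i = Counter(s[i] for s in seqs)           # marginal counts, column i
    counts_j = Counter(s[j] for s in seqs)           # marginal counts, column j
    counts_ij = Counter((s[i], s[j]) for s in seqs)  # joint counts
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 3 covary like a base pair,
# columns 1 and 2 are invariant.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
paired = column_mutual_information(toy_alignment, 0, 3)
invariant = column_mutual_information(toy_alignment, 1, 2)
```

Perfectly covarying columns give the maximum of 2 bits for a four-letter alphabet, whereas invariant columns give 0 bits even though they are highly conserved, which is why mutual information complements single-column information content when looking for candidate base pairs.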
Affiliation(s)
- Thomas D. Schneider
- Center for Cancer Research Nanobiology Program, NCI-Frederick, Frederick, MD 21702, USA
- Bruce A. Shapiro
- Center for Cancer Research Nanobiology Program, NCI-Frederick, Frederick, MD 21702, USA
- To whom correspondence should be addressed. Tel: +1 301 846 5536; Fax: +1 301 846 5598
|
50
|
Sheth N, Roca X, Hastings ML, Roeder T, Krainer AR, Sachidanandam R. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res 2006; 34:3955-67. [PMID: 16914448 PMCID: PMC1557818 DOI: 10.1093/nar/gkl556] [Citation(s) in RCA: 270] [Impact Index Per Article: 15.0] [Received: 06/07/2006] [Revised: 07/13/2006] [Accepted: 07/17/2006] [Indexed: 11/12/2022] Open
Abstract
We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT-AG and GC-AG and U12-type GT-AG and AT-AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT-AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3' splice sites (3'ss) and (iv) distinct evolutionary histories of 5' and 3'ss. Our study proves that the identification of broad patterns in naturally-occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.
Affiliation(s)
- Nihar Sheth
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Xavier Roca
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Ted Roeder
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Adrian R. Krainer
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Ravi Sachidanandam
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
|