1
Shen A, Hencel K, Parker M, Scott R, Skukan R, Adesina A, Metheringham C, Miska E, Nam Y, Haerty W, Simpson G, Akay A. U6 snRNA m6A modification is required for accurate and efficient splicing of C. elegans and human pre-mRNAs. Nucleic Acids Res 2024; 52:9139-9160. [PMID: 38808663 PMCID: PMC11347140 DOI: 10.1093/nar/gkae447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 05/08/2024] [Accepted: 05/28/2024] [Indexed: 05/30/2024] Open
Abstract
pre-mRNA splicing is a critical feature of eukaryotic gene expression. Both cis- and trans-splicing rely on accurately recognising splice site sequences by spliceosomal U snRNAs and associated proteins. Spliceosomal snRNAs carry multiple RNA modifications with the potential to affect different stages of pre-mRNA splicing. Here, we show that the conserved U6 snRNA m6A methyltransferase METT-10 is required for accurate and efficient cis- and trans-splicing of C. elegans pre-mRNAs. The absence of METT-10 in C. elegans and METTL16 in humans primarily leads to alternative splicing at 5' splice sites with an adenosine at +4 position. In addition, METT-10 is required for splicing of weak 3' cis- and trans-splice sites. We identified a significant overlap between METT-10 and the conserved splicing factor SNRNP27K in regulating 5' splice sites with +4A. Finally, we show that editing endogenous 5' splice site +4A positions to +4U restores splicing to wild-type positions in a mett-10 mutant background, supporting a direct role for U6 snRNA m6A modification in 5' splice site recognition. We conclude that the U6 snRNA m6A modification is important for accurate and efficient pre-mRNA splicing.
Affiliation(s)
- Aykut Shen
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
- Katarzyna Hencel
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
- Matthew T Parker
  - School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
- Robyn Scott
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Roberta Skukan
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
- Eric A Miska
  - Wellcome/CRUK Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge CB2 1QN, UK
- Yunsun Nam
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Wilfried Haerty
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
  - Earlham Institute, Norwich Research Park, Norwich, UK
- Gordon G Simpson
  - School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
  - Cell & Molecular Sciences, James Hutton Institute, Invergowrie, DD2 5DA, UK
- Alper Akay
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
2
Shen A, Hencel K, Parker MT, Scott R, Skukan R, Adesina AS, Metheringham CL, Miska EA, Nam Y, Haerty W, Simpson GG, Akay A. U6 snRNA m6A modification is required for accurate and efficient cis- and trans-splicing of C. elegans mRNAs. bioRxiv 2023:2023.09.16.558044. [PMID: 37745402 PMCID: PMC10516052 DOI: 10.1101/2023.09.16.558044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
pre-mRNA splicing is a critical feature of eukaryotic gene expression. Many eukaryotes use cis-splicing to remove intronic sequences from pre-mRNAs. In addition to cis-splicing, many organisms use trans-splicing to replace the 5' ends of mRNAs with a non-coding spliced-leader RNA. Both cis- and trans-splicing rely on accurately recognising splice site sequences by spliceosomal U snRNAs and associated proteins. Spliceosomal snRNAs carry multiple RNA modifications with the potential to affect different stages of pre-mRNA splicing. Here, we show that m6A modification of U6 snRNA A43 by the RNA methyltransferase METT-10 is required for accurate and efficient cis- and trans-splicing of C. elegans pre-mRNAs. The absence of U6 snRNA m6A modification primarily leads to alternative splicing at 5' splice sites. Furthermore, weaker 5' splice site recognition by the unmodified U6 snRNA A43 affects splicing at 3' splice sites. U6 snRNA m6A43 and the splicing factor SNRNP27K function to recognise an overlapping set of 5' splice sites with an adenosine at +4 position. Finally, we show that U6 snRNA m6A43 is required for efficient SL trans-splicing at weak 3' trans-splice sites. We conclude that the U6 snRNA m6A modification is important for accurate and efficient cis- and trans-splicing in C. elegans.
Affiliation(s)
- Aykut Shen
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
- Katarzyna Hencel
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
  - These authors contributed equally
- Matthew T Parker
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
  - These authors contributed equally
- Robyn Scott
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Roberta Skukan
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
- Eric A Miska
  - Wellcome/CRUK Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge, CB2 1QN, UK
- Yunsun Nam
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Wilfried Haerty
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
  - Earlham Institute, Norwich Research Park, Norwich, UK
- Gordon G Simpson
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
  - Cell & Molecular Sciences, James Hutton Institute, Invergowrie, DD2 5DA, UK
- Alper Akay
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
3
Bernard F, Dargère D, Rechavi O, Dupuy D. Quantitative analysis of C. elegans transcripts by Nanopore direct-cDNA sequencing reveals terminal hairpins in non trans-spliced mRNAs. Nat Commun 2023; 14:1229. [PMID: 36869073 PMCID: PMC9984361 DOI: 10.1038/s41467-023-36915-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 02/23/2023] [Indexed: 03/05/2023] Open
Abstract
In nematodes and kinetoplastids, mRNA processing involves a trans-splicing step through which a short sequence from a snRNP replaces the original 5' end of the primary transcript. It has long been held that 70% of C. elegans mRNAs are submitted to trans-splicing. Our recent work suggested that the mechanism is more pervasive but not fully captured by mainstream transcriptome sequencing methods. Here we use Oxford Nanopore's long-read amplification-free sequencing technology to perform a comprehensive analysis of trans-splicing in worms. We demonstrate that spliced leader (SL) sequences at the 5' end of the mRNAs affect library preparation and generate sequencing artefacts due to their self-complementarity. Consistent with our previous observations, we find evidence of trans-splicing for most genes. However, a subset of genes appears to be only marginally trans-spliced. These mRNAs all share the capacity to generate a 5' terminal hairpin structure mimicking the SL structure and offering a mechanistic explanation for their non conformity. Altogether, our data provide a comprehensive quantitative analysis of SL usage in C. elegans.
Affiliation(s)
- Florian Bernard
  - Université de Bordeaux, Inserm U1212, CNRS UMR5320, Institut Européen de Chimie et Biologie (IECB), 2, rue Robert Escarpit, 33607, Pessac, France
  - Department of Neurobiology, Wise Faculty of Life Sciences & Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Delphine Dargère
  - Université de Bordeaux, Inserm U1212, CNRS UMR5320, Institut Européen de Chimie et Biologie (IECB), 2, rue Robert Escarpit, 33607, Pessac, France
- Oded Rechavi
  - Department of Neurobiology, Wise Faculty of Life Sciences & Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Denis Dupuy
  - Université de Bordeaux, Inserm U1212, CNRS UMR5320, Institut Européen de Chimie et Biologie (IECB), 2, rue Robert Escarpit, 33607, Pessac, France
4
Abstract
BACKGROUND: The evolution of spliceosomal introns has been widely studied among various eukaryotic groups. Researchers have nearly reached a consensus on the pattern and the mechanisms of intron losses and gains across eukaryotes. However, according to previous studies that analyzed a few genes or genomes, Nematoda seems to be an eccentric group.
RESULTS: Taking advantage of the recent accumulation of sequenced genomes, we extensively analyzed the intron losses and gains using 104 nematode genomes across all five Clades of the phylum. Nematodes have a wide range of intron density, from less than one to more than nine per kbp of coding sequence. The rates of intron losses and gains exhibit significant heterogeneity both across different nematode lineages and across different evolutionary stages of the same lineage. The frequency of intron losses far exceeds that of intron gains. Five pieces of evidence supporting the model of cDNA-mediated intron loss have been observed in ten Caenorhabditis species: the dominance of precise intron losses, frequent loss of adjacent introns, high-level expression of the intron-lost genes, preferential losses of short introns, and preferential losses of introns close to the 3'-ends of genes. Like studies in most eukaryotic groups, we cannot find the source sequences for the limited number of intron gains detected in the Caenorhabditis genomes.
CONCLUSIONS: These results indicate that nematodes are a typical eukaryotic group rather than an outlier in intron evolution.
Affiliation(s)
- Ming-Yue Ma
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Ji Xia
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Kun-Xian Shu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Deng-Ke Niu
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
5
Evolutionary Dynamics of the SKN-1 → MED → END-1,3 Regulatory Gene Cascade in Caenorhabditis Endoderm Specification. G3: Genes, Genomes, Genetics 2020; 10:333-356. [PMID: 31740453 PMCID: PMC6945043 DOI: 10.1534/g3.119.400724] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/18/2022]
Abstract
Gene regulatory networks and their evolution are important in the study of animal development. In the nematode, Caenorhabditis elegans, the endoderm (gut) is generated from a single embryonic precursor, E. Gut is specified by the maternal factor SKN-1, which activates the MED → END-1,3 → ELT-2,7 cascade of GATA transcription factors. In this work, genome sequences from over two dozen species within the Caenorhabditis genus are used to identify MED and END-1,3 orthologs. Predictions are validated by comparison of gene structure, protein conservation, and putative cis-regulatory sites. All three factors occur together, but only within the Elegans supergroup, suggesting they originated at its base. The MED factors are the most diverse and exhibit an unexpectedly extensive gene amplification. In contrast, the highly conserved END-1 orthologs are unique in nearly all species and share extended regions of conservation. The END-1,3 proteins share a region upstream of their zinc finger and an unusual amino-terminal poly-serine domain exhibiting high codon bias. Compared with END-1, the END-3 proteins are otherwise less conserved as a group and are typically found as paralogous duplicates. Hence, all three factors are under different evolutionary constraints. Promoter comparisons identify motifs that suggest the SKN-1, MED, and END factors function in a similar gut specification network across the Elegans supergroup that has been conserved for tens of millions of years. A model is proposed to account for the rapid origin of this essential kernel in the gut specification network, by the upstream intercalation of duplicate genes into a simpler ancestral network.
6
Zhou C, Gao X, Hu S, Gan W, Xu J, Ma YC, Ma L. RBM-5 modulates U2AF large subunit-dependent alternative splicing in C. elegans. RNA Biol 2018; 15:1295-1308. [PMID: 30295127 PMCID: PMC6284560 DOI: 10.1080/15476286.2018.1526540] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/27/2018] [Revised: 08/23/2018] [Accepted: 09/11/2018] [Indexed: 01/06/2023] Open
Abstract
A key step in pre-mRNA splicing is the recognition of 3' splicing sites by the U2AF large and small subunits, a process regulated by numerous trans-acting splicing factors. How these trans-acting factors interact with U2AF in vivo is unclear. From a screen for suppressors of the temperature-sensitive (ts) lethality of the C. elegans U2AF large subunit gene uaf-1(n4588) mutants, we identified mutations in the RNA binding motif gene rbm-5, a homolog of the tumor suppressor gene RBM5. rbm-5 mutations can suppress uaf-1(n4588) ts-lethality by loss of function and neuronal expression of rbm-5 was sufficient to rescue the suppression. Transcriptome analyses indicate that uaf-1(n4588) affected the expression of numerous genes and rbm-5 mutations can partially reverse the abnormal gene expression to levels similar to that of wild type. Though rbm-5 mutations did not obviously affect alternative splicing per se, they can suppress or enhance, in a gene-specific manner, the altered splicing of genes in uaf-1(n4588) mutants. Specifically, the recognition of a weak 3' splice site was more susceptible to the effect of rbm-5. Our findings provide novel in vivo evidence that RBM-5 can modulate UAF-1-dependent RNA splicing and suggest that RBM5 might interact with U2AF large subunit to affect tumor formation.
Affiliation(s)
- Chuanman Zhou
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Xiaoyang Gao
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Surong Hu
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Wenjing Gan
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Jing Xu
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Yongchao C. Ma
  - Departments of Pediatrics, Neurology and Physiology, Northwestern University Feinberg School of Medicine, Anne & Robert H. Lurie Children’s Hospital of Chicago, Chicago, Illinois, USA
- Long Ma
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
7
Parvathaneni RK, DeLeo VL, Spiekerman JJ, Chakraborty D, Devos KM. Parallel loss of introns in the ABCB1 gene in angiosperms. BMC Evol Biol 2017; 17:238. [PMID: 29202710 PMCID: PMC5716013 DOI: 10.1186/s12862-017-1077-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/01/2017] [Accepted: 11/16/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The presence of non-coding introns is a characteristic feature of most eukaryotic genes. While the size of the introns, number of introns per gene and the number of intron-containing genes can vary greatly between sequenced eukaryotic genomes, the structure of a gene with reference to intron presence and positions is typically conserved in closely related species. Unexpectedly, the ABCB1 (ATP-Binding Cassette Subfamily B Member 1) gene which encodes a P-glycoprotein and underlies dwarfing traits in maize (br2), sorghum (dw3) and pearl millet (d2) displayed considerable variation in intron composition.
RESULTS: An analysis of the ABCB1 gene structure in 80 angiosperms revealed that the number of introns ranged from one to nine. All introns in ABCB1 underwent either a one-time loss (single loss in one lineage/species) or multiple independent losses (parallel loss in two or more lineages/species) with the majority of losses occurring within the grass family. In contrast, the structure of the closest homolog to ABCB1, ABCB19, remained constant in the majority of angiosperms analyzed. Using known phylogenetic relationships within the grasses, we determined the ancestral branch-points where the losses occurred. Intron 7, the longest intron, was lost in only a single species, Mimulus guttatus, following duplication of ABCB1. Semiquantitative PCR showed that the M. guttatus ABCB1 gene copy without intron 7 had significantly lower transcript levels than the gene copy with intron 7. We further demonstrated that intron 7 carried two motifs that were highly conserved across the monocot-dicot divide.
CONCLUSIONS: The ABCB1 gene structure is highly dynamic, while the structure of ABCB19 remained largely conserved through evolution. Precise removal of introns, preferential removal of smaller introns and presence of at least 2 bp of microhomology flanking most introns indicated that intron loss may have predominantly occurred through non-homologous end-joining (NHEJ) repair of double strand breaks. Lack of microhomology in the exon upstream of lost phase I introns was likely due to release of the selective constraint on the penultimate base (3rd base in codon) of the terminal codon by the splicing machinery. In addition to size, the presence of regulatory motifs will make introns recalcitrant to loss.
Affiliation(s)
- Rajiv K Parvathaneni
  - Institute of Plant Breeding, Genetics and Genomics, University of Georgia, 30602, Athens, Georgia, United States
  - Current address: Donald Danforth Plant Science Center, St. Louis, MO, 63132, United States
- Victoria L DeLeo
  - Department of Genetics, University of Georgia, 30602, Athens, GA, United States
  - Current address: Department of Biology, Pennsylvania State University, University Park, PA, 16802, United States
- John J Spiekerman
  - Department of Plant Biology, University of Georgia, 30602, Athens, GA, United States
- Debkanta Chakraborty
  - Institute of Bioinformatics, University of Georgia, 30602, Athens, GA, United States
- Katrien M Devos
  - Institute of Plant Breeding, Genetics and Genomics, University of Georgia, 30602, Athens, Georgia, United States
  - Department of Plant Biology, University of Georgia, 30602, Athens, GA, United States
  - Institute of Bioinformatics, University of Georgia, 30602, Athens, GA, United States
8
Tan JH, Fraser AG. The combinatorial control of alternative splicing in C. elegans. PLoS Genet 2017; 13:e1007033. [PMID: 29121637 PMCID: PMC5697891 DOI: 10.1371/journal.pgen.1007033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/06/2017] [Revised: 11/21/2017] [Accepted: 09/19/2017] [Indexed: 12/31/2022] Open
Abstract
Normal development requires the right splice variants to be made in the right tissues at the right time. The core splicing machinery is engaged in all splicing events, but which precise splice variant is made requires the choice between alternative splice sites; for this to occur, a set of splicing factors (SFs) must recognize and bind to short RNA motifs in the pre-mRNA. In C. elegans, there is known to be extensive variation in splicing patterns across development, but little is known about the targets of each SF or how multiple SFs combine to regulate splicing. Here we combine RNA-seq with in vitro binding assays to study how 4 different C. elegans SFs, ASD-1, FOX-1, MEC-8, and EXC-7, regulate splicing. The 4 SFs chosen all have well-characterised biology and well-studied loss-of-function genetic alleles, and all contain RRM domains. Intriguingly, while the SFs we examined have varied roles in C. elegans development, they show an unexpectedly high overlap in their targets. We also find that binding sites for these SFs occur on the same pre-mRNAs more frequently than expected, suggesting extensive combinatorial control of splicing. We confirm that regulation of splicing by multiple SFs is often combinatorial and show that this is functionally significant. We also find that SFs appear to combine to affect splicing in two modes: they either bind in close proximity within the same intron or they appear to bind to separate regions of the intron in a conserved order. Finally, we find that the genes whose splicing is regulated by multiple SFs are highly enriched for genes involved in the cytoskeleton and in ion channels that are key for neurotransmission. Together, this shows that specific classes of genes have complex combinatorial regulation of splicing and that this combinatorial regulation is critical for normal development to occur.
Alternative splicing (AS) is a highly regulated process that is crucial for normal development. It requires the core splicing machinery, but the specific choice of splice site during AS is controlled by splicing factors (SFs) such as ELAV or RBFOX proteins that bind to specific sequences in pre-mRNAs to regulate usage of different splice sites. AS varies across the C. elegans life cycle and here we study how diverse SFs combine to regulate AS during C. elegans development. We selected 4 RRM-containing SFs that are all well studied and that have well-characterised loss-of-function genetic alleles. We find that these SFs regulate many of the same targets, and that combinatorial interactions between these SFs affect both individual splicing events and organism-level phenotypes including specific effects on the neuromuscular system. We further show that SFs combine to regulate splicing of an individual pre-mRNA in two distinct modes: either by binding in close proximity or by binding in a defined order on the pre-mRNA. Finally, we find that the genes whose splicing is most likely to be regulated by multiple SFs are genes that are required for the proper function of the neuromuscular system. These genes are also most likely to have changing AS patterns across development, suggesting that their splicing regulation is highly complex and developmentally regulated. Taken together, our data show that the precise splice variant expressed at any point in development is often the outcome of regulation by multiple SFs.
Affiliation(s)
- June H. Tan
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON, Canada
- Andrew G. Fraser
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON, Canada
9
Fradin H, Kiontke K, Zegar C, Gutwein M, Lucas J, Kovtun M, Corcoran DL, Baugh LR, Fitch DHA, Piano F, Gunsalus KC. Genome Architecture and Evolution of a Unichromosomal Asexual Nematode. Curr Biol 2017; 27:2928-2939.e6. [PMID: 28943090 PMCID: PMC5659720 DOI: 10.1016/j.cub.2017.08.038] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 06/29/2017] [Revised: 08/14/2017] [Accepted: 08/15/2017] [Indexed: 10/24/2022]
Abstract
Asexual reproduction in animals, though rare, is the main or exclusive mode of reproduction in some long-lived lineages. The longevity of asexual clades may be correlated with the maintenance of heterozygosity by mechanisms that rearrange genomes and reduce recombination. Asexual species thus provide an opportunity to gain insight into the relationship between molecular changes, genome architecture, and cellular processes. Here we report the genome sequence of the parthenogenetic nematode Diploscapter pachys with only one chromosome pair. We show that this unichromosomal architecture is shared by a long-lived clade of asexual nematodes closely related to the genetic model organism Caenorhabditis elegans. Analysis of the genome assembly reveals that the unitary chromosome arose through fusion of six ancestral chromosomes, with extensive rearrangement among neighboring regions. Typical nematode telomeres and telomeric protection-encoding genes are lacking. Most regions show significant heterozygosity; homozygosity is largely concentrated to one region and attributed to gene conversion. Cell-biological and molecular evidence is consistent with the absence of key features of meiosis I, including synapsis and recombination. We propose that D. pachys preserves heterozygosity and produces diploid embryos without fertilization through a truncated meiosis. As a prelude to functional studies, we demonstrate that D. pachys is amenable to experimental manipulation by RNA interference.
Affiliation(s)
- Hélène Fradin
  - Department of Biology, New York University, New York, NY 10003, USA; Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Karin Kiontke
  - Department of Biology, New York University, New York, NY 10003, USA
- Charles Zegar
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Michelle Gutwein
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Jessica Lucas
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Mikhail Kovtun
  - Duke Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- David L Corcoran
  - Duke Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- L Ryan Baugh
  - Department of Biology, Duke University, Durham, NC 27708, USA
- David H A Fitch
  - Department of Biology, New York University, New York, NY 10003, USA
- Fabio Piano
  - Department of Biology, New York University, New York, NY 10003, USA; Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Kristin C Gunsalus
  - Department of Biology, New York University, New York, NY 10003, USA; Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
10
Abstract
There are millions of sequences deposited in genomic databases, and it is an important task to categorize them according to their structural and functional roles. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. There are various methods available for comparing sequences, alignment being first and foremost for sequences with a small number of base pairs as well as for large-scale genome comparison. Various tools are available for performing pairwise large sequence comparison. The best known tools either perform global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison. This is followed by the description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools available for genome comparison including LAGAN, MUMmer, BLASTZ, and AVID.
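As an illustration of the global alignment mentioned in the abstract above, the following is a minimal sketch of Needleman-Wunsch scoring. It is not taken from the chapter: the function name `global_alignment_score` and the fixed match, mismatch, and gap values are assumptions chosen for brevity, whereas practical tools would typically draw substitution scores from a PAM or BLOSUM matrix.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the Needleman-Wunsch global alignment score of a and b.

    Uses a linear gap penalty and illustrative (assumed) scores rather
    than a PAM/BLOSUM substitution matrix.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = prev[j - 1] + substitution  # align a[i-1] with b[j-1]
            up = prev[j] + gap                     # a[i-1] aligned to a gap
            left = curr[j - 1] + gap               # b[j-1] aligned to a gap
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Keeping only two rows of the dynamic-programming table is enough when just the score is needed; recovering the alignment itself would require storing the full table for traceback.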
11
Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016; 34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]
12
N-Ethyl-N-Nitrosourea (ENU) Mutagenesis Reveals an Intronic Residue Critical for Caenorhabditis elegans 3' Splice Site Function in Vivo. G3: Genes, Genomes, Genetics 2016; 6:1751-6. [PMID: 27172199 PMCID: PMC4889670 DOI: 10.1534/g3.116.028662] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
Metazoan introns contain a polypyrimidine tract immediately upstream of the AG dinucleotide that defines the 3' splice site. In the nematode Caenorhabditis elegans, 3' splice sites are characterized by a highly conserved UUUUCAG/R octamer motif. While the conservation of pyrimidines in this motif is strongly suggestive of their importance in pre-mRNA splicing, in vivo evidence in support of this is lacking. In an N-ethyl-N-nitrosourea (ENU) mutagenesis screen in Caenorhabditis elegans, we have isolated a strain containing a point mutation in the octamer motif of a 3' splice site in the daf-12 gene. This mutation, a single base T-to-G transversion at the -5 position relative to the splice site, causes a strong daf-12 loss-of-function phenotype by abrogating splicing. The resulting transcript is predicted to encode a truncated DAF-12 protein generated by translation into the retained intron, which contains an in-frame stop codon. Other than the perfectly conserved AG dinucleotide at the site of splicing, G at the -5 position of the octamer motif is the most uncommon base in C. elegans 3' splice sites, occurring at closely paired sites where the better match to the splicing consensus is a few bases downstream. Our results highlight both the biological importance of the highly conserved -5 uridine residue in the C. elegans 3' splice site octamer motif as well as the utility of using ENU as a mutagen to study the function of polypyrimidine tracts and other AU- or AT-rich motifs in vivo.
13
Sohail M, Xie J. Diverse regulation of 3' splice site usage. Cell Mol Life Sci 2015; 72:4771-93. [PMID: 26370726 PMCID: PMC11113787 DOI: 10.1007/s00018-015-2037-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/03/2015] [Revised: 08/12/2015] [Accepted: 09/03/2015] [Indexed: 01/13/2023]
Abstract
The regulation of splice site (SS) usage is important for alternative pre-mRNA splicing and thus proper expression of protein isoforms in cells; its disruption causes diseases. In recent years, an increasing number of novel regulatory elements have been found within or nearby the 3'SS in mammalian genes. The diverse elements recruit a repertoire of trans-acting factors or form secondary structures to regulate 3'SS usage, mostly at the early steps of spliceosome assembly. Their mechanisms of action mainly include: (1) competition between the factors for RNA elements, (2) steric hindrance between the factors, (3) direct interaction between the factors, (4) competition between two splice sites, or (5) local RNA secondary structures or longer range loops, according to the mode of protein/RNA interactions. Beyond the 3'SS, chromatin remodeling/transcription, posttranslational modifications of trans-acting factors and upstream signaling provide further layers of regulation. Evolutionarily, some of the 3'SS elements seem to have emerged in mammalian ancestors. Moreover, other possibilities of regulation such as that by non-coding RNA remain to be explored. It is thus likely that there are more diverse elements/factors and mechanisms that influence the choice of an intron end. The diverse regulation likely contributes to a more complex but refined transcriptome and proteome in mammals.
Affiliation(s)
- Muhammad Sohail
- Department of Physiology and Pathophysiology, College of Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada
- Jiuyong Xie
- Department of Physiology and Pathophysiology, College of Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada.
- Department of Biochemistry and Medical Genetics, College of Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada.
|
14
|
Gao X, Teng Y, Luo J, Huang L, Li M, Zhang Z, Ma YC, Ma L. The survival motor neuron gene smn-1 interacts with the U2AF large subunit gene uaf-1 to regulate Caenorhabditis elegans lifespan and motor functions. RNA Biol 2015; 11:1148-60. [PMID: 25483032 DOI: 10.4161/rna.36100] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/27/2023] Open
Abstract
Spinal muscular atrophy (SMA), the most frequent human congenital motor neuron degenerative disease, is caused by loss-of-function mutations in the highly conserved survival motor neuron gene SMN1. Mutations in SMN could affect several molecular processes, among which aberrant pre-mRNA splicing caused by defective snRNP biogenesis is hypothesized as a major cause of SMA. To date little is known about the interactions of SMN with other splicing factor genes and how SMN affects splicing in vivo. The nematode Caenorhabditis elegans carries a single ortholog of SMN, smn-1, and has been used as a model for studying the molecular functions of SMN. We analyzed RNA splicing of reporter genes in an smn-1 deletion mutant and found that smn-1 is required for efficient splicing at weak 3' splice sites. Genetic studies indicate that the defective lifespan and motor functions of the smn-1 deletion mutants could be significantly improved by mutations of the splicing factor U2AF large subunit gene uaf-1. In smn-1 mutants we detected a reduced expression of U1 and U5 snRNAs and an increased expression of U2, U4 and U6 snRNAs. Our study verifies an essential role of smn-1 for RNA splicing in vivo, identifies the uaf-1 gene as a potential genetic modifier of smn-1 mutants, and suggests that SMN-1 has multifaceted effects on the expression of spliceosomal snRNAs.
Affiliation(s)
- Xiaoyang Gao
- State Key Laboratory of Medical Genetics; School of Life Sciences; Central South University; Changsha, Hunan, China
|
15
|
Ma MY, Zhu T, Li XN, Lan XR, Liu HY, Yang YF, Niu DK. Imprecise intron losses are less frequent than precise intron losses but are not rare in plants. Biol Direct 2015; 10:24. [PMID: 27392031 PMCID: PMC4443532 DOI: 10.1186/s13062-015-0056-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/12/2015] [Accepted: 04/24/2015] [Indexed: 11/10/2022] Open
Abstract
In this study, we identified 19 intron losses, including 11 precise intron losses (PILs), six imprecise intron losses (IILs), one de-exonization, and one exon deletion in tomato and potato, and 17 IILs in Arabidopsis thaliana. Comparative analysis of related genomes confirmed that all of the IILs have been fixed during evolution. Consistent with previous studies, our results indicate that PILs are a major type of intron loss. However, at least in plants, IILs are unlikely to be as rare as previously reported. Reviewers: This article was reviewed by Jun Yu and Zhang Zhang. For complete reviews, see the Reviewers’ Reports section. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0056-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ming-Yue Ma
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Tao Zhu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Xue-Nan Li
- Beijing Computing Center, Beijing, 10094, China
- Xin-Ran Lan
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Heng-Yuan Liu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Yu-Fei Yang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China. Present address: Institute of Genetics & Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Deng-Ke Niu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
|
16
|
Abstract
The essential neurotransmitter acetylcholine functions throughout the animal kingdom. In Caenorhabditis elegans, the acetylcholine biosynthetic enzyme [choline acetyltransferase (ChAT)] and vesicular transporter [vesicular acetylcholine transporter (VAChT)] are encoded by the cha-1 and unc-17 genes, respectively. These two genes compose a single complex locus in which the unc-17 gene is nested within the first intron of cha-1, and the two gene products arise from a common pre-messenger RNA (pre-mRNA) by alternative splicing. This genomic organization, known as the cholinergic gene locus (CGL), is conserved throughout the animal kingdom, suggesting that the structure is important for the regulation and function of these genes. However, very little is known about CGL regulation in any species. We now report the identification of an unusual type of splicing regulation in the CGL of C. elegans, mediated by two pairs of complementary sequence elements within the locus. We show that both pairs of elements are required for efficient splicing to the distal acceptor, and we also demonstrate that proper distal splicing depends more on sequence complementarity within each pair of elements than on the sequences themselves. We propose that these sequence elements are able to form stem-loop structures in the pre-mRNA; such structures would favor specific splicing alternatives and thus regulate CGL splicing. We have identified complementary elements at comparable locations in the genomes of representative species of other animal phyla; we suggest that this unusual regulatory mechanism may be a general feature of CGLs.
|
17
|
Husson SJ, Reumer A, Temmerman L, De Haes W, Schoofs L, Mertens I, Baggerman G. Worm peptidomics. EUPA OPEN PROTEOMICS 2014. [DOI: 10.1016/j.euprot.2014.04.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/24/2022]
|
18
|
Abstract
How introns are lost from eukaryotic genomes during evolution remains an enigmatic question in biology. By comparative genome analysis of five Caenorhabditis and eight Drosophila species, we found that the likelihood of intron loss is highly influenced by the degree of sequence homology at exon–intron junctions: a significantly elevated degree of microhomology was observed for sequences immediately flanking those introns that were eliminated from the genome of one or more subspecies. This determinant was significant even at individual nucleotides. We propose that microhomology-mediated DNA repair underlies this phenomenon, which we termed microhomology-mediated intron loss. This hypothesis is further supported by the observations that in both species 1) smaller introns are preferentially lost over longer ones and 2) genes that are highly transcribed in germ cells, and are thus more prone to DNA double strand breaks, display elevated frequencies of intron loss. Our data also testify against a prominent role for reverse transcriptase-mediated intron loss in metazoans.
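One way to picture microhomology at exon-intron junctions is sketched below. The abstract does not state how the measure was computed, so the definition here is an assumption: the number of bases shared between the end of the upstream exon and the end of the intron, plus the number shared between the start of the intron and the start of the downstream exon. All function names and sequences are hypothetical.

```python
# Illustrative sketch; the scoring definition is an assumption, not the paper's method.
def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest shared suffix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def junction_microhomology(up_exon: str, intron: str, down_exon: str) -> int:
    """Bases over which the 5' and 3' junctions read identically."""
    left = common_suffix_len(up_exon, intron)     # exon end vs intron end
    right = common_prefix_len(intron, down_exon)  # intron start vs exon start
    return left + right


# Hypothetical sequences: 3 shared bases on each side of the intron.
print(junction_microhomology("ACGCAG", "GTAAGTTTTCAG", "GTACCA"))  # 6
```

Under this definition, a deletion between the two repeated stretches would remove the intron exactly, which is the outcome a microhomology-mediated repair model would predict.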
Affiliation(s)
- Robin van Schendel
- Department of Toxicogenetics, Leiden University Medical Center, The Netherlands
|
19
|
Abstract
The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks." The annotations generated by the UCSC Genome Bioinformatics Group and external collaborators include gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload personal datasets in a wide variety of formats as custom annotation tracks in both browsers for research or educational purposes. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.
Affiliation(s)
- Donna Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California, USA
|
20
|
Menconi G, Battaglia G, Grossi R, Pisanti N, Marangoni R. Mobilomics in Saccharomyces cerevisiae strains. BMC Bioinformatics 2013; 14:102. [PMID: 23514613 PMCID: PMC3684551 DOI: 10.1186/1471-2105-14-102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/13/2012] [Accepted: 02/11/2013] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Mobile Genetic Elements (MGEs) are selfish DNA integrated in the genomes. Their detection is mainly based on consensus-like searches by scanning the investigated genome against the sequence of an already identified MGE. Mobilomics aims at discovering all the MGEs in a genome and understanding their dynamic behavior: The data for this kind of investigation can be provided by comparative genomics of closely related organisms. The amount of data thus involved requires a strong computational effort, which should be alleviated.
RESULTS: Our approach proposes to exploit the high similarity among homologous chromosomes of different strains of the same species, following a progressive comparative genomics philosophy. We introduce a software tool based on our new fast algorithm, called regender, which is able to identify the conserved regions between chromosomes. Our case study is represented by a unique recently available dataset of 39 different strains of S. cerevisiae, which regender is able to compare in a few minutes. By exploring the non-conserved regions, where MGEs are mainly retrotransposons called Tys, and marking the candidate Tys based on their length, we are able to locate a priori and automatically all the already known Tys and map all the putative Tys in all the strains. The remaining putative mobile elements (PMEs) emerging from this intra-specific comparison are sharp markers of inter-specific evolution: indeed, many events of non-conservation among different yeast strains correspond to PMEs. A clustering based on the presence/absence of the candidate Tys in the strains suggests an evolutionary interconnection that is very similar to classic phylogenetic trees based on SNP analysis, even though it is computed without using phylogenetic information.
CONCLUSIONS: The case study indicates that the proposed methodology brings two major advantages: (a) it does not require any template sequence for the wanted MGEs and (b) it can be applied to infer MGEs also for low coverage genomes with unresolved bases, where traditional approaches are largely ineffective.
Affiliation(s)
- Giulia Menconi
- Istituto Nazionale di Alta Matematica, Città Universitaria, Roma, Italia
|
21
|
Abstract
Applications of clustering algorithms in biomedical research are ubiquitous, with typical examples including gene expression data analysis, genomic sequence analysis, biomedical document mining, and MRI image analysis. However, due to the diversity of cluster analysis, the differing terminologies, goals, and assumptions underlying different clustering algorithms can be daunting. Thus, determining the right match between clustering algorithms and biomedical applications has become particularly important. This paper is presented to provide biomedical researchers with an overview of the status quo of clustering algorithms, to illustrate examples of biomedical applications based on cluster analysis, and to help biomedical researchers select the most suitable clustering algorithms for their own applications.
Affiliation(s)
- Rui Xu
- Industrial Artificial Intelligence Laboratory, GE Global Research Center, Niskayuna, NY 12309, USA.
|
22
|
Hsieh YW, Chang C, Chuang CF. The microRNA mir-71 inhibits calcium signaling by targeting the TIR-1/Sarm1 adaptor protein to control stochastic L/R neuronal asymmetry in C. elegans. PLoS Genet 2012; 8:e1002864. [PMID: 22876200 PMCID: PMC3410857 DOI: 10.1371/journal.pgen.1002864] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/07/2011] [Accepted: 06/12/2012] [Indexed: 01/06/2023] Open
Abstract
The Caenorhabditis elegans left and right AWC olfactory neurons communicate to establish stochastic asymmetric identities, AWC(ON) and AWC(OFF), by inhibiting a calcium-mediated signaling pathway in the future AWC(ON) cell. NSY-4/claudin-like protein and NSY-5/innexin gap junction protein are the two parallel signals that antagonize the calcium signaling pathway to induce the AWC(ON) fate. However, it is not known how the calcium signaling pathway is downregulated by nsy-4 and nsy-5 in the AWC(ON) cell. Here we identify a microRNA, mir-71, that represses the TIR-1/Sarm1 adaptor protein in the calcium signaling pathway to promote the AWC(ON) identity. Similar to tir-1 loss-of-function mutants, overexpression of mir-71 generates two AWC(ON) neurons. tir-1 expression is downregulated through its 3' UTR in AWC(ON), in which mir-71 is expressed at a higher level than in AWC(OFF). In addition, mir-71 is sufficient to inhibit tir-1 expression in AWC through the mir-71 complementary site in the tir-1 3' UTR. Our genetic studies suggest that mir-71 acts downstream of nsy-4 and nsy-5 to promote the AWC(ON) identity in a cell autonomous manner. Furthermore, the stability of mature mir-71 is dependent on nsy-4 and nsy-5. Together, these results provide insight into the mechanism by which nsy-4 and nsy-5 inhibit calcium signaling to establish stochastic asymmetric AWC differentiation.
Affiliation(s)
- Yi-Wen Hsieh
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center Research Foundation, Cincinnati, Ohio, United States of America
- Chieh Chang
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center Research Foundation, Cincinnati, Ohio, United States of America
- Chiou-Fen Chuang
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center Research Foundation, Cincinnati, Ohio, United States of America
|
23
|
Irimia M, Tena JJ, Alexis MS, Fernandez-Miñan A, Maeso I, Bogdanovic O, de la Calle-Mustienes E, Roy SW, Gómez-Skarmeta JL, Fraser HB. Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints. Genome Res 2012; 22:2356-67. [PMID: 22722344 PMCID: PMC3514665 DOI: 10.1101/gr.139725.112] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Indexed: 12/22/2022]
Abstract
The order of genes in eukaryotic genomes has generally been assumed to be neutral, since gene order is largely scrambled over evolutionary time. Only a handful of exceptional examples are known, typically involving deeply conserved clusters of tandemly duplicated genes (e.g., Hox genes and histones). Here we report the first systematic survey of microsynteny conservation across metazoans, utilizing 17 genome sequences. We identified nearly 600 pairs of unrelated genes that have remained tightly physically linked in diverse lineages across over 600 million years of evolution. Integrating sequence conservation, gene expression data, gene function, epigenetic marks, and other genomic features, we provide extensive evidence that many conserved ancient linkages involve (1) the coordinated transcription of neighboring genes, or (2) genomic regulatory blocks (GRBs) in which transcriptional enhancers controlling developmental genes are contained within nearby bystander genes. In addition, we generated ChIP-seq data for key histone modifications in zebrafish embryos, which provided further evidence of putative GRBs in embryonic development. Finally, using chromosome conformation capture (3C) assays and stable transgenic experiments, we demonstrate that enhancers within bystander genes drive the expression of genes such as Otx and Islet, critical regulators of central nervous system development across bilaterians. These results suggest that ancient genomic functional associations are far more common than previously thought—involving ∼12% of the ancestral bilaterian genome—and that cis-regulatory constraints are crucial in determining metazoan genome architecture.
Affiliation(s)
- Manuel Irimia
- Department of Biology, Stanford University, Stanford, California 94305, USA
|
24
|
Wang C, Wilson-Berry L, Schedl T, Hansen D. TEG-1 CD2BP2 regulates stem cell proliferation and sex determination in the C. elegans germ line and physically interacts with the UAF-1 U2AF65 splicing factor. Dev Dyn 2012; 241:505-21. [PMID: 22275078 PMCID: PMC3466600 DOI: 10.1002/dvdy.23735] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Accepted: 01/03/2012] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND: For a stem cell population to exist over an extended period, a balance must be maintained between self-renewing (proliferating) and differentiating daughter cells. Within the Caenorhabditis elegans germ line, this balance is controlled by a genetic regulatory pathway, which includes the canonical Notch signaling pathway.
RESULTS: Genetic screens identified the gene teg-1 as being involved in regulating the proliferation versus differentiation decision in the C. elegans germ line. Cloning of TEG-1 revealed that it is a homolog of mammalian CD2BP2, which has been implicated in a number of cellular processes, including in U4/U6.U5 tri-snRNP formation in the pre-mRNA splicing reaction. The position of teg-1 in the genetic pathway regulating the proliferation versus differentiation decision, its single mutant phenotype, and its enrichment in nuclei, all suggest TEG-1 also functions as a splicing factor. TEG-1, as well as its human homolog, CD2BP2, directly bind to UAF-1 U2AF65, a component of the U2 auxiliary factor.
CONCLUSIONS: TEG-1 functions as a splicing factor and acts to regulate the proliferation versus meiosis decision. The interaction of TEG-1 CD2BP2 with UAF-1 U2AF65, combined with its previously described function in U4/U6.U5 tri-snRNP, suggests that TEG-1 CD2BP2 functions in two distinct locations in the splicing cascade.
Affiliation(s)
- Chris Wang
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
|
25
|
Ma L, Tan Z, Teng Y, Hoersch S, Horvitz HR. In vivo effects on intron retention and exon skipping by the U2AF large subunit and SF1/BBP in the nematode Caenorhabditis elegans. RNA (NEW YORK, N.Y.) 2011; 17:2201-2211. [PMID: 22033331 PMCID: PMC3222132 DOI: 10.1261/rna.027458.111] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/23/2011] [Accepted: 09/27/2011] [Indexed: 05/31/2023]
Abstract
The in vivo analysis of the roles of splicing factors in regulating alternative splicing in animals remains a challenge. Using a microarray-based screen, we identified a Caenorhabditis elegans gene, tos-1, that exhibited three of the four major types of alternative splicing: intron retention, exon skipping, and, in the presence of U2AF large subunit mutations, the use of alternative 3' splice sites. Mutations in the splicing factors U2AF large subunit and SF1/BBP altered the splicing of tos-1. 3' splice sites of the retained intron or before the skipped exon regulate the splicing pattern of tos-1. Our study provides in vivo evidence that intron retention and exon skipping can be regulated largely by the identities of 3' splice sites.
Affiliation(s)
- Long Ma
- State Key Laboratory of Medical Genetics, School of Biological Sciences and Technology, Central South University, Changsha, Hunan 410078, China
- Zhiping Tan
- Center for Clinical Gene Diagnosis and Therapy, The Second Xiangya Hospital, State Key Laboratory of Medical Genetics, Central South University, Changsha 410078, China
- Yanling Teng
- State Key Laboratory of Medical Genetics, School of Biological Sciences and Technology, Central South University, Changsha, Hunan 410078, China
- Sebastian Hoersch
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, Massachusetts 02139, USA
- Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
- H. Robert Horvitz
- Department of Biology, Howard Hughes Medical Institute, MIT, Cambridge, Massachusetts 02139, USA
|
26
|
Brown DG, Li M, Ma B. A tutorial of recent developments in the seeding of local alignment. J Bioinform Comput Biol 2011; 2:819-42. [PMID: 15617167 DOI: 10.1142/s0219720004000983] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 09/09/2004] [Revised: 09/28/2004] [Accepted: 09/28/2004] [Indexed: 11/18/2022]
Abstract
We review recent results on local alignment. We begin with a review of classical methods and early heuristic methods, and then focus on more recent work on the seeding of local alignment. We show that these techniques give a vast improvement in both sensitivity and specificity over previous methods, and can achieve sensitivity at the level of classical algorithms while requiring orders of magnitude less runtime.
Affiliation(s)
- Daniel G Brown
- Department of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
|
27
|
Brejová B, Brown DG, Vinar T. Optimal spaced seeds for homologous coding regions. J Bioinform Comput Biol 2011; 1:595-610. [PMID: 15290755 DOI: 10.1142/s0219720004000326] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 02/01/2003] [Revised: 06/23/2003] [Accepted: 06/24/2003] [Indexed: 11/18/2022]
Abstract
Optimal spaced seeds were developed as a method to increase sensitivity of local alignment programs similar to BLASTN. Such seeds have been used before in the program PatternHunter, and have given improved sensitivity and running time relative to BLASTN in genome–genome comparison. We study the problem of computing optimal spaced seeds for detecting homologous coding regions in unannotated genomic sequences. By using well-chosen seeds, we are able to improve the sensitivity of coding sequence alignment over that of TBLASTX, while keeping runtime comparable to BLASTN. We identify good seeds by first giving effective hidden Markov models of conservation in alignments of homologous coding regions. We give an efficient algorithm to compute the optimal spaced seed when conservation patterns are generated by these models. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
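The idea of a spaced seed can be illustrated with a short sketch. A seed is written as a string of `1` (position must match) and `0` (position is ignored), and two sequences produce a hit wherever they agree at every `1` position. The function names and the toy seed `1101` are hypothetical choices for illustration; the sketch does not attempt the seed optimization or the hidden Markov models described in the abstract.

```python
# Illustrative sketch of spaced-seed hit finding (not the paper's algorithm).
from collections import defaultdict


def spaced_key(seq: str, start: int, seed: str) -> str:
    """Characters of seq under the '1' positions of seed, beginning at start."""
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def seed_hits(query: str, target: str, seed: str) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs that agree at all '1' positions."""
    span = len(seed)
    index = defaultdict(list)
    for j in range(len(target) - span + 1):
        index[spaced_key(target, j, seed)].append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(spaced_key(query, i, seed), []):
            hits.append((i, j))
    return hits


# A mismatch at the third base is tolerated by the spaced seed
# but breaks a contiguous seed of the same span.
print(seed_hits("ACGT", "ACTT", "1101"))  # [(0, 0)]
print(seed_hits("ACGT", "ACTT", "1111"))  # []
```

In coding regions the third codon position is the one most likely to vary, which gives an intuition for why seeds that skip some positions can be more sensitive there than contiguous ones.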
Affiliation(s)
- Broňa Brejová
- School of Computer Science, University of Waterloo, 200 University Ave West, Waterloo, ON N2L3G1, Canada.
|
28
|
Fawcett JA, Rouzé P, Van de Peer Y. Higher intron loss rate in Arabidopsis thaliana than A. lyrata is consistent with stronger selection for a smaller genome. Mol Biol Evol 2011; 29:849-59. [PMID: 21998273 DOI: 10.1093/molbev/msr254] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023] Open
Abstract
The number of introns varies considerably among different organisms. This can be explained by the differences in the rates of intron gain and loss. Two factors that are likely to influence these rates are selection for or against introns and the mutation rate that generates the novel intron or the intronless copy. Although it has been speculated that stronger selection for a compact genome might result in a higher rate of intron loss and a lower rate of intron gain, clear evidence is lacking, and the role of selection in determining these rates has not been established. Here, we studied the gain and loss of introns in the two closely related species Arabidopsis thaliana and A. lyrata as it was recently shown that A. thaliana has been undergoing a faster genome reduction driven by selection. We found that A. thaliana has lost six times more introns than A. lyrata since the divergence of the two species but gained very few introns. We suggest that stronger selection for genome reduction probably resulted in the much higher intron loss rate in A. thaliana, although further analysis is required as we could not find evidence that the loss rate increased in A. thaliana as opposed to having decreased in A. lyrata compared with the rate in the common ancestor. We also examined the pattern of the intron gains and losses to better understand the mechanisms by which they occur. Microsimilarity was detected between the splice sites of several gained and lost introns, suggesting that nonhomologous end joining repair of double-strand breaks might be a common pathway not only for intron gain but also for intron loss.
|
29
|
Karolchik D, Hinrichs AS, Kent WJ. The UCSC Genome Browser. CURRENT PROTOCOLS IN HUMAN GENETICS 2011; Chapter 18:18.6.1-18.6.33. [PMID: 21975940 PMCID: PMC3222792 DOI: 10.1002/0471142905.hg1806s71] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 01/07/2023]
Abstract
The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks." The annotations generated by the UCSC Genome Bioinformatics Group and external collaborators include gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload personal datasets in a wide variety of formats as custom annotation tracks in both browsers for research or educational purposes. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.
Affiliation(s)
- Donna Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz
- Angie S. Hinrichs
- Center for Biomolecular Science and Engineering, University of California Santa Cruz
- W. James Kent
- Center for Biomolecular Science and Engineering, University of California Santa Cruz
|
30
|
Zhan LL, Ding Z, Qian YH, Zeng QT. Convergent Intron Loss of MRP1 in Drosophila and Mosquito Species. J Hered 2011; 103:147-51. [DOI: 10.1093/jhered/esr095] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023] Open
|
31
|
Yoon BJ. Hidden Markov Models and their Applications in Biological Sequence Analysis. Curr Genomics 2011; 10:402-15. [PMID: 20190955 PMCID: PMC2766791 DOI: 10.2174/138920209789177575] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Received: 12/04/2008] [Revised: 02/28/2009] [Accepted: 03/02/2009] [Indexed: 12/21/2022] Open
Abstract
Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as pairwise and multiple sequence alignments, gene annotation, classification, similarity search, and many others.
Affiliation(s)
- Byung-Jun Yoon
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA
|
32
|
Williams GW, Davis PA, Rogers AS, Bieri T, Ozersky P, Spieth J. Methods and strategies for gene structure curation in WormBase. Database: The Journal of Biological Databases and Curation 2011; 2011:baq039. [PMID: 21543339 PMCID: PMC3092607 DOI: 10.1093/database/baq039] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multi-cellular organism and now the WormBase project has had a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature- and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ‘curation tool’, which integrates different data sources in order to identify new and correct gene structures. Database URL: http://www.wormbase.org/
Affiliation(s)
- G W Williams
- WormBase Group, The Wellcome Trust Sanger Institute, Hinxton, Cambs, UK.
|
33
|
Hudek AK, Brown DG. FEAST: sensitive local alignment with multiple rates of evolution. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:698-709. [PMID: 20733242 DOI: 10.1109/tcbb.2010.76] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
We present a pairwise local aligner, FEAST, which uses two new techniques: a sensitive extension algorithm for identifying homologous subsequences, and a descriptive probabilistic alignment model. We also present a new procedure for training alignment parameters and apply it to the human and mouse genomes, producing a better parameter set for these sequences. Our extension algorithm identifies homologous subsequences by considering all evolutionary histories. It has higher maximum sensitivity than Viterbi extensions, and better balances specificity. We model alignments with several submodels, each with unique statistical properties, describing strongly similar and weakly similar regions of homologous DNA. Training parameters using two submodels produces superior alignments, even when we align with only the parameters from the weaker submodel. Our extension algorithm combined with our new parameter set achieves sensitivity 0.59 on synthetic tests. In contrast, LASTZ with default settings achieves sensitivity 0.35 with the same false positive rate. Using the weak submodel as parameters for LASTZ increases its sensitivity to 0.59 with high error. FEAST is available at http://monod.uwaterloo.ca/feast/.
Affiliation(s)
- Alexander K Hudek
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada.
|
34
|
Hua Z, Zou C, Shiu SH, Vierstra RD. Phylogenetic comparison of F-Box (FBX) gene superfamily within the plant kingdom reveals divergent evolutionary histories indicative of genomic drift. PLoS One 2011; 6:e16219. [PMID: 21297981 PMCID: PMC3030570 DOI: 10.1371/journal.pone.0016219] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Received: 09/27/2010] [Accepted: 12/07/2010] [Indexed: 11/18/2022] Open
Abstract
The emergence of multigene families has been hypothesized as a major contributor to the evolution of complex traits and speciation. To help understand how such multigene families arose and diverged during plant evolution, we examined the phylogenetic relationships of F-Box (FBX) genes, one of the largest and most polymorphic superfamilies known in the plant kingdom. FBX proteins comprise the target recognition subunit of SCF-type ubiquitin-protein ligases, where they individually recruit specific substrates for ubiquitylation. Through the extensive analysis of 10,811 FBX loci from 18 plant species, ranging from the alga Chlamydomonas reinhardtii to numerous monocots and eudicots, we discovered strikingly diverse evolutionary histories. The number of FBX loci varies widely and appears independent of the growth habit and life cycle of land plants, with as little as 198 predicted for Carica papaya to as many as 1350 predicted for Arabidopsis lyrata. This number differs substantially even among closely related species, with evidence for extensive gains/losses. Despite this extraordinary inter-species variation, one subset of FBX genes was conserved among most species examined. Together with evidence of strong purifying selection and expression, the ligases synthesized from these conserved loci likely direct essential ubiquitylation events. Another subset was much more lineage specific, showed more relaxed purifying selection, and was enriched in loci with little or no evidence of expression, suggesting that they either control more limited, species-specific processes or arose from genomic drift and thus may provide reservoirs for evolutionary innovation. Numerous FBX loci were also predicted to be pseudogenes with their numbers tightly correlated with the total number of FBX genes in each species.
Taken together, it appears that the FBX superfamily has independently undergone substantial birth/death in many plant lineages, with its size and rapid evolution potentially reflecting a central role for ubiquitylation in driving plant fitness.
Affiliation(s)
- Zhihua Hua
- Department of Genetics, University of Wisconsin, Madison, Wisconsin, United States of America
- Cheng Zou
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Richard D. Vierstra
- Department of Genetics, University of Wisconsin, Madison, Wisconsin, United States of America
|
35
|
Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res 2011; 21:487-93. [PMID: 21209072 DOI: 10.1101/gr.113985.110] [Citation(s) in RCA: 867] [Impact Index Per Article: 66.7] [Indexed: 11/25/2022]
Abstract
The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billionbase DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
Affiliation(s)
- Szymon M Kiełbasa
- Department of Computational Biology, Max Planck Institute for Molecular Genetics, Berlin D-14195, Germany
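The adaptive-seed idea described in the abstract above can be illustrated with a toy sketch: starting at each query position, a match is lengthened until it occurs no more than a chosen number of times in the target. This is only a naive illustration under assumptions made here (the function names, the `max_hits` threshold, and the brute-force counting are all hypothetical), not the LAST implementation, whose details the abstract does not give.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, 0
    while True:
        idx = text.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def adaptive_seeds(query: str, target: str, max_hits: int = 1):
    """For each query position, return the shortest match that is 'rare enough'.

    A seed is reported as (query_start, matched_text, hits_in_target).
    max_hits is a hypothetical rareness threshold chosen for illustration.
    """
    seeds = []
    for i in range(len(query)):
        for j in range(i + 1, len(query) + 1):
            hits = count_occurrences(target, query[i:j])
            if hits == 0:
                break  # longer extensions cannot match either
            if hits <= max_hits:
                seeds.append((i, query[i:j], hits))
                break  # stop at the shortest sufficiently rare match
    return seeds


if __name__ == "__main__":
    for seed in adaptive_seeds("ACGTT", "ACGTACGTTT"):
        print(seed)
```

Seeds starting in repetitive sequence grow longer before they qualify, while seeds in unusual sequence stay short; a fixed-length seed cannot adapt in this way.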
|
36
|
DNA double-strand break repair and the evolution of intron density. Trends Genet 2010; 27:1-6. [PMID: 21106271 PMCID: PMC3020277 DOI: 10.1016/j.tig.2010.10.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 08/12/2010] [Revised: 10/18/2010] [Accepted: 10/18/2010] [Indexed: 01/23/2023]
Abstract
The density of introns is both an important feature of genome architecture and a highly variable trait across eukaryotes. This heterogeneity has posed an evolutionary puzzle for the last 30 years. Recent evidence is consistent with novel introns being the outcome of the error-prone repair of DNA double-stranded breaks (DSBs) via non-homologous end joining (NHEJ). Here we suggest that deletion of pre-existing introns could occur via the same pathway. We propose a novel framework in which species-specific differences in the activity of NHEJ and homologous recombination (HR) during the repair of DSBs underlie changes in intron density.
|
37
|
Vergara IA, Chen N. Large synteny blocks revealed between Caenorhabditis elegans and Caenorhabditis briggsae genomes using OrthoCluster. BMC Genomics 2010; 11:516. [PMID: 20868500 PMCID: PMC2997010 DOI: 10.1186/1471-2164-11-516] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/09/2009] [Accepted: 09/24/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Accurate identification of synteny blocks is an important step in comparative genomics towards the understanding of genome architecture and expression. Most computer programs developed in the last decade for identifying synteny blocks have limitations. To address these limitations, we recently developed a robust program called OrthoCluster, and an online database OrthoClusterDB. In this work, we have demonstrated the application of OrthoCluster in identifying synteny blocks between the genomes of Caenorhabditis elegans and Caenorhabditis briggsae, two closely related hermaphrodite nematodes. RESULTS: Initial identification and analysis of synteny blocks using OrthoCluster enabled us to systematically improve the genome annotation of C. elegans and C. briggsae, identifying 52 potential novel genes in C. elegans, 582 in C. briggsae, and 949 novel orthologous relationships between these two species. Using the improved annotation, we have detected 3,058 perfect synteny blocks that contain no mismatches between C. elegans and C. briggsae. Among these synteny blocks, the majority are mapped to homologous chromosomes, as previously reported. The largest perfect synteny block contains 42 genes, which spans 201.2 kb in Chromosome V of C. elegans. On average, perfect synteny blocks span 18.8 kb in length. When some mismatches (interruptions) are allowed, synteny blocks ("imperfect synteny blocks") that are much larger in size are identified. We have shown that the majority (80%) of the C. elegans and C. briggsae genomes are covered by imperfect synteny blocks. The largest imperfect synteny block spans 6.14 Mb in Chromosome X of C. elegans and there are 11 synteny blocks that are larger than 1 Mb in size. On average, imperfect synteny blocks span 63.6 kb in length, larger than previously reported. CONCLUSIONS: We have demonstrated that OrthoCluster can be used to accurately identify synteny blocks and have found that synteny blocks between C. elegans and C. briggsae are almost three-fold larger than previously identified.
Affiliation(s)
- Ismael A Vergara
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
|
38
|
Wang F, Huang S, Ma L. Caenorhabditis elegans operons contain a higher proportion of genes with multiple transcripts and use 3' splice sites differentially. PLoS One 2010; 5:e12456. [PMID: 20805997 PMCID: PMC2929210 DOI: 10.1371/journal.pone.0012456] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/30/2010] [Accepted: 08/05/2010] [Indexed: 01/23/2023] Open
Abstract
RNA splicing generates multiple transcript isoforms from a single gene and enhances the complexity of eukaryotic gene expression. In some eukaryotes, operon exists as an ancient regulatory mechanism of gene expression that requires strict positional and regulatory relationships among its genes. It remains unknown whether operonic genes generate transcript isoforms in a similar manner as non-operonic genes do, the expression of which is less likely limited by their positions and relationships with surrounding genes. We analyzed the number of transcript isoforms of Caenorhabditis elegans operonic genes and found that C. elegans operons contain a much higher proportion of genes with multiple transcript isoforms than non-operonic genes do. For genes that express multiple transcript isoforms, there is no apparent difference between the number of isoforms in operonic and non-operonic genes. C. elegans operonic genes also have a different preference of the 20 most common 3′ splice sites compared to non-operonic genes. Our analyses suggest that C. elegans operons enhance expression complexity by increasing the proportion of genes that express multiple transcript isoforms and maintain splicing efficiency by differential use of common 3′ splice sites.
Affiliation(s)
- Fei Wang
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Shi Huang
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Long Ma
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
|
39
|
Lasda EL, Allen MA, Blumenthal T. Polycistronic pre-mRNA processing in vitro: snRNP and pre-mRNA role reversal in trans-splicing. Genes Dev 2010; 24:1645-58. [PMID: 20624853 DOI: 10.1101/gad.1940010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022]
Abstract
Spliced leader (SL) trans-splicing in Caenorhabditis elegans attaches a 22-nucleotide (nt) exon onto the 5' end of many mRNAs. A particular class of SL, SL2, splices mRNAs of downstream operon genes. Here we use an embryonic extract-based in vitro splicing system to show that SL2 specificity information is encoded within the polycistronic pre-mRNA, and that trans-splicing specificity is recapitulated in vitro. We define an RNA sequence required for SL2 trans-splicing, the U-rich (Ur) element, through mutational analysis and bioinformatics as a short stem-loop followed by a sequence motif, UAYYUU, located approximately 50 nt upstream of the trans-splice site. Furthermore, this element is predicted in intercistronic regions of numerous operons of C. elegans and other species that use SL2 trans-splicing. We propose that the UAYYUU motif hybridizes with the 5' splice site on the SL2 RNA to recruit the SL to the pre-mRNA. In this way, the UAYYUU motif in the pre-mRNA would serve an analogous function to the similar sequence in the U1 snRNA, which binds to the 5' splice site of introns, effectively reversing the roles of snRNP and pre-mRNA in trans-splicing.
Affiliation(s)
- Erika L Lasda
- Department of Biochemistry and Molecular Genetics, University of Colorado Denver, Anschutz Medical Campus, Aurora, Colorado 80045, USA
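The UAYYUU motif named in the abstract above uses the IUPAC code Y (C or U), so scanning an RNA sequence for it reduces to a small pattern match. The sketch below is a minimal illustration only: the function names and the example sequence are invented here, and it does not reproduce the authors' bioinformatic search, which also involved a preceding stem-loop.

```python
import re

# IUPAC codes needed for this motif; Y stands for a pyrimidine (C or U).
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U", "Y": "[CU]"}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate an IUPAC motif into a regex that also finds overlapping hits."""
    body = "".join(IUPAC_RNA[base] for base in motif)
    # A lookahead consumes no characters, so overlapping matches are reported.
    return re.compile(f"(?=({body}))")


def find_motif(rna: str, motif: str = "UAYYUU") -> list[int]:
    """Return 0-based start positions of every motif occurrence in rna."""
    return [m.start() for m in motif_to_regex(motif).finditer(rna)]


if __name__ == "__main__":
    example = "GGUACUUUAAUAUCUUGG"  # made-up sequence, not real data
    print(find_motif(example))
```

To relate hits to a trans-splice site, the distance from each returned position to a known site coordinate could be computed and compared with the roughly 50 nt spacing reported in the abstract.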
|
40
|
Mahmood K, Konagurthu AS, Song J, Buckle AM, Webb GI, Whisstock JC. EGM: encapsulated gene-by-gene matching to identify gene orthologs and homologous segments in genomes. Bioinformatics 2010; 26:2076-84. [DOI: 10.1093/bioinformatics/btq339] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023] Open
|
41
|
Nakato R, Gotoh O. Cgaln: fast and space-efficient whole-genome alignment. BMC Bioinformatics 2010; 11:224. [PMID: 20433723 PMCID: PMC2873541 DOI: 10.1186/1471-2105-11-224] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 01/02/2010] [Accepted: 04/30/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can treat sequences that are only a few Mb long at once, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. RESULTS: We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. CONCLUSIONS: Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.
Affiliation(s)
- Ryuichiro Nakato
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto-shi, Kyoto 606-8501, Japan
|
42
|
Frith MC, Hamada M, Horton P. Parameters for accurate genome alignment. BMC Bioinformatics 2010; 11:80. [PMID: 20144198 PMCID: PMC2829014 DOI: 10.1186/1471-2105-11-80] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Received: 10/16/2009] [Accepted: 02/09/2010] [Indexed: 11/25/2022] Open
Abstract
Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center, Institute for Advanced Industrial Science and Technology, Tokyo 135-0064, Japan.
|
43
|
Abstract
The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks." The annotations, generated by the UCSC Genome Bioinformatics Group and external collaborators, display gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload data as custom annotation tracks in both browsers for research or educational use. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.
Affiliation(s)
- Donna Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, Phone: (831) 459-1571, Fax: (831) 459-1809
- Angie S. Hinrichs
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, Phone: (831) 459-1544, Fax: (831) 459-1809
- W. James Kent
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, Phone: (831) 459-1401, Fax: (831) 459-1809
|
44
|
Ma L, Horvitz HR. Mutations in the Caenorhabditis elegans U2AF large subunit UAF-1 alter the choice of a 3' splice site in vivo. PLoS Genet 2009; 5:e1000708. [PMID: 19893607 PMCID: PMC2762039 DOI: 10.1371/journal.pgen.1000708] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/12/2009] [Accepted: 10/05/2009] [Indexed: 11/18/2022] Open
Abstract
The removal of introns from eukaryotic RNA transcripts requires the activities of five multi-component ribonucleoprotein complexes and numerous associated proteins. The lack of mutations affecting splicing factors essential for animal survival has limited the study of the in vivo regulation of splicing. From a screen for suppressors of the Caenorhabditis elegans unc-93(e1500) rubberband Unc phenotype, we identified mutations in genes that encode the C. elegans orthologs of two splicing factors, the U2AF large subunit (UAF-1) and SF1/BBP (SFA-1). The uaf-1(n4588) mutation resulted in temperature-sensitive lethality and caused the unc-93 RNA transcript to be spliced using a cryptic 3′ splice site generated by the unc-93(e1500) missense mutation. The sfa-1(n4562) mutation did not cause the utilization of this cryptic 3′ splice site. We isolated four uaf-1(n4588) intragenic suppressors that restored the viability of uaf-1 mutants at 25°C. These suppressors differentially affected the recognition of the cryptic 3′ splice site and implicated a small region of UAF-1 between the U2AF small subunit-interaction domain and the first RNA recognition motif in affecting the choice of 3′ splice site. We constructed a reporter for unc-93 splicing and using site-directed mutagenesis found that the position of the cryptic splice site affects its recognition. We also identified nucleotides of the endogenous 3′ splice site important for recognition by wild-type UAF-1. Our genetic and molecular analyses suggested that the phenotypic suppression of the unc-93(e1500) Unc phenotype by uaf-1(n4588) and sfa-1(n4562) was likely caused by altered splicing of an unknown gene. Our observations provide in vivo evidence that UAF-1 can act in regulating 3′ splice-site choice and establish a system that can be used to investigate the in vivo regulation of RNA splicing in C. elegans. 
Eukaryotic genes contain intervening intronic sequences that must be removed from pre-mRNA transcripts by RNA splicing to generate functional messenger RNAs. While studying genes that encode and control a presumptive muscle potassium channel complex in the nematode Caenorhabditis elegans, we found that mutations in two splicing factors, the U2AF large subunit and SF1/BBP, suppress the rubberband Unc phenotype caused by a rare missense mutation in the gene unc-93. Mutations affecting the U2AF large subunit caused the recognition of a cryptic 3′ splice site generated by the unc-93 mutation, providing in vivo evidence that the U2AF large subunit can affect splice-site selection. By contrast, an SF1/BBP mutation that suppressed the rubberband Unc phenotype did not cause splicing using this cryptic 3′ splice site. Our genetic studies identified a region of the U2AF large subunit important for its effect on 3′ splice-site choice. Our mutagenesis analysis of in vivo transgene splicing identified a positional effect on weak 3′ splice site selection and nucleotides of the endogenous 3′ splice site important for recognition. The system we have defined should facilitate future in vivo analyses of pre-mRNA splicing.
Affiliation(s)
- Long Ma
- Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- H. Robert Horvitz
- Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
|
45
|
Catania F, Gao X, Scofield DG. Endogenous mechanisms for the origins of spliceosomal introns. J Hered 2009; 100:591-6. [PMID: 19635762 DOI: 10.1093/jhered/esp062] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Over 30 years since their discovery, the origin of spliceosomal introns remains uncertain. One nearly universally accepted hypothesis maintains that spliceosomal introns originated from self-splicing group-II introns that invaded the uninterrupted genes of the last eukaryotic common ancestor (LECA) and proliferated by "insertion" events. Although this is a possible explanation for the original presence of introns and splicing machinery, the emphasis on a high number of insertion events in the genome of the LECA neglects a considerable body of empirical evidence showing that spliceosomal introns can simply arise from coding or, more generally, nonintronic sequences within genes. After presenting a concise overview of some of the most common hypotheses and mechanisms for intron origin, we propose two further hypotheses that are broadly based on central cellular processes: 1) internal gene duplication and 2) the response to aberrant and fortuitously spliced transcripts. These two nonmutually exclusive hypotheses provide a powerful way to explain the establishment of spliceosomal introns in eukaryotes without invoking an exogenous source.
Affiliation(s)
- Francesco Catania
- Department of Biology, Indiana University, Bloomington, IN 47405, USA.
|
46
|
Haque W, Aravind A, Reddy B. An efficient algorithm for local sequence alignment. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2009; 2008:1367-72. [PMID: 19162922 DOI: 10.1109/iembs.2008.4649419] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
DNA pairwise sequence alignment has been a subject of great interest in the past and still attracts considerable interest. Recent algorithms have been either slow and sensitive or fast and less sensitive. Here, we present a new algorithm which is fast and at the same time relatively sensitive. To increase the speed, we first build a suffix tree for both sequences, and the alignment is triggered by the maximum matching substring. The algorithm employs mismatch seeds to increase both sensitivity and speed in the later stages. We tested our algorithm on randomly generated sequences of length up to 500 thousand and used the Rosetta dataset to test the sensitivity of the algorithm.
Affiliation(s)
- Waqar Haque
- Computer Science Program, University of Northern British Columbia, Canada, V2N 4Z9
|
47
|
Cutter AD, Dey A, Murray RL. Evolution of the Caenorhabditis elegans genome. Mol Biol Evol 2009; 26:1199-234. [PMID: 19289596 DOI: 10.1093/molbev/msp048] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.7] [Indexed: 12/17/2022] Open
Abstract
A fundamental problem in genome biology is to elucidate the evolutionary forces responsible for generating nonrandom patterns of genome organization. As the first metazoan to benefit from full-genome sequencing, Caenorhabditis elegans has been at the forefront of research in this area. Studies of genomic patterns, and their evolutionary underpinnings, continue to be augmented by the recent push to obtain additional full-genome sequences of related Caenorhabditis taxa. In the near future, we expect to see major advances with the onset of whole-genome resequencing of multiple wild individuals of the same species. In this review, we synthesize many of the important insights to date in our understanding of genome organization and function that derive from the evolutionary principles made explicit by theoretical population genetics and molecular evolution and highlight fertile areas for future research on unanswered questions in C. elegans genome evolution. We call attention to the need for C. elegans researchers to generate and critically assess nonadaptive hypotheses for genomic and developmental patterns, in addition to adaptive scenarios. We also emphasize the potential importance of evolution in the gonochoristic (female and male) ancestors of the androdioecious (hermaphrodite and male) C. elegans as the source for many of its genomic and developmental patterns.
Affiliation(s)
- Asher D Cutter
- Department of Ecology & Evolutionary Biology and the Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada.
|
48
|
Husson SJ, Landuyt B, Nys T, Baggerman G, Boonen K, Clynen E, Lindemans M, Janssen T, Schoofs L. Comparative peptidomics of Caenorhabditis elegans versus C. briggsae by LC-MALDI-TOF MS. Peptides 2009; 30:449-57. [PMID: 18760316 DOI: 10.1016/j.peptides.2008.07.021] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 05/29/2008] [Revised: 07/30/2008] [Accepted: 07/30/2008] [Indexed: 11/21/2022]
Abstract
Neuropeptides are important signaling molecules that function in cell-cell communication as neurotransmitters or hormones to orchestrate a wide variety of physiological conditions and behaviors. These endogenous peptides can be monitored by high-throughput peptidomics technologies from virtually any tissue or organism. The neuropeptide complement of the soil nematode Caenorhabditis elegans has been characterized by on-line two-dimensional liquid chromatography and quadrupole time-of-flight tandem mass spectrometry (2D-nanoLC Q-TOF MS/MS). Here, we use an alternative peptidomics approach combining liquid chromatography (LC) with matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry to map the peptide content of C. elegans and another Caenorhabditis species, Caenorhabditis briggsae. This study allows a better annotation of neuropeptide-encoding genes from the C. briggsae genome and provides a promising basis for further evolutionary comparisons.
Affiliation(s)
- Steven J Husson
- Functional Genomics and Proteomics Unit, Department of Biology, K.U.Leuven, Naamsestraat 59, B-3000 Leuven, Belgium.
|
49
|
Shi B, Guo X, Wu T, Sheng S, Wang J, Skogerbø G, Zhu X, Chen R. Genome-scale identification of Caenorhabditis elegans regulatory elements by tiling-array mapping of DNase I hypersensitive sites. BMC Genomics 2009; 10:92. [PMID: 19243610 PMCID: PMC2651899 DOI: 10.1186/1471-2164-10-92] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 07/15/2008] [Accepted: 02/25/2009] [Indexed: 11/24/2022] Open
Abstract
Background: A major goal of post-genomics research is the integrated analysis of genes, regulatory elements and the chromatin architecture on a genome-wide scale. Mapping DNase I hypersensitive sites within the nuclear chromatin is a powerful and well-established method of identifying regulatory element candidates. Results: Here, we report the first genome-wide analysis of DNase I hypersensitive sites (DHSs) in Caenorhabditis elegans. The data were obtained by hybridizing DNase I-treated and end-captured material from young adult worms to a high-resolution tiling microarray. The data show that C. elegans DHSs were significantly enriched within intergenic regions located 2 kb upstream and downstream of coding genes, and also that a considerable fraction of all DHSs mapped to intergenic positions distant to annotated coding genes. Annotated transcribed loci were generally depleted in DHSs relative to intergenic regions, but DHSs were nonetheless enriched in coding exons and UTRs, whereas introns were significantly depleted in DHSs. Many DHSs appeared to be associated with annotated non-coding RNAs and recently detected transcripts of unknown function. It has been reported that nematode highly conserved non-coding elements were associated with cis-regulatory elements, and we also found that DHSs, particularly distal intergenic DHSs, were significantly enriched in regions that were conserved between the C. elegans and C. briggsae genomes. Conclusion: We describe the first genome-wide analysis of C. elegans DHSs, and show that the distribution of DHSs is strongly associated with functional elements in the genome.
Affiliation(s)
- Baochen Shi
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, PR China.
|
50
|
Sleumer MC, Bilenky M, He A, Robertson G, Thiessen N, Jones SJM. Caenorhabditis elegans cisRED: a catalogue of conserved genomic elements. Nucleic Acids Res 2009; 37:1323-34. [PMID: 19151087 PMCID: PMC2651782 DOI: 10.1093/nar/gkn1041] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/26/2023] Open
Abstract
The availability of completely sequenced genomes from eight species of nematodes has provided an opportunity to identify novel cis-regulatory elements in the promoter regions of Caenorhabditis elegans transcripts using comparative genomics. We determined orthologues for C. elegans transcripts in C. briggsae, C. remanei, C. brenneri, C. japonica, Pristionchus pacificus, Brugia malayi and Trichinella spiralis using the WABA alignment algorithm. We pooled the upstream region of each transcript in C. elegans with the upstream regions of its orthologues and identified conserved DNA sequence elements by de novo motif discovery. In total, we discovered 158 017 novel conserved motifs upstream of 3847 C. elegans transcripts for which three or more orthologues were available, and identified 82% of 44 experimentally proven regulatory elements from ORegAnno. We annotated 26% of the motifs as similar to known binding sequences of transcription factors from ORegAnno, TRANSFAC and JASPAR. This is the first catalogue of annotated conserved upstream elements for nematodes and can be used to find putative regulatory elements, improve gene models, discover novel RNA genes, and understand the evolution of transcription factors and their binding sites in phylum Nematoda. The annotated motifs provide novel binding site candidates for both characterized transcription factors and orthologues of characterized mammalian transcription factors.
Affiliation(s)
- Monica C Sleumer
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
|