1
Wang XY, Xu YM, Lau ATY. Proteogenomics in Cancer: Then and Now. J Proteome Res 2023; 22:3103-3122. [PMID: 37725793 DOI: 10.1021/acs.jproteome.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2023]
Abstract
For years, the paths of sequencing technologies and mass spectrometry have occurred in isolation, with each developing its own unique culture and expertise. These two technologies are crucial for inspecting complementary aspects of the molecular phenotype across the central dogma. Integrative multiomics strives to bridge the analysis gap among different fields to complete more comprehensive mechanisms of life events and diseases. Proteogenomics is one integrated multiomics field. Here in this review, we mainly summarize and discuss three aspects: workflow of proteogenomics, proteogenomics applications in cancer research, and the SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis of proteogenomics in cancer research. In conclusion, proteogenomics has a promising future as it clarifies the functional consequences of many unannotated genomic abnormalities or noncanonical variants and identifies driver genes and novel therapeutic targets across cancers, which would substantially accelerate the development of precision oncology.
Affiliation(s)
- Xiu-Yun Wang
- Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
- Yan-Ming Xu
- Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
- Andy T Y Lau
- Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
2
Tay AP, Hamey JJ, Martyn GE, Wilson LOW, Wilkins MR. Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing. J Proteome Res 2022; 21:1628-1639. [PMID: 35612954 DOI: 10.1021/acs.jproteome.1c00968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
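
For readers who want the fractions in this abstract as percentages, the short sketch below simply divides the counts quoted above. The dictionary labels and the `percent` helper are illustrative names, not part of the study.

```python
# Counts quoted in the abstract: (features missed by short-read data, total found via long reads).
# The labels are illustrative shorthand, not terminology from the paper.
MISSED = {
    "Ensembl transcripts": (5949, 14326),
    "genes with alternatively spliced transcripts": (486, 2981),
    "peptides": (826, 35976),
    "proteins": (262, 3215),
    "protein isoforms": (574, 1212),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


for label, (part, whole) in MISSED.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```

The spread is wide: the share missed by short reads is small at the peptide level but approaches half at the transcript and isoform level.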
Affiliation(s)
- Aidan P Tay
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia; Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia; Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Joshua J Hamey
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gabriella E Martyn
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Laurence O W Wilson
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia; Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Marc R Wilkins
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
3
McAninch DS, Heinaman AM, Lang CN, Moss KR, Bassell GJ, Rita Mihailescu M, Evans TL. Fragile X mental retardation protein recognizes a G quadruplex structure within the survival motor neuron domain containing 1 mRNA 5'-UTR. Molecular BioSystems 2017; 13:1448-1457. [PMID: 28612854 PMCID: PMC5544254 DOI: 10.1039/c7mb00070g] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 01/23/2023]
Abstract
G quadruplex structures have been predicted by bioinformatics to form in the 5'- and 3'-untranslated regions (UTRs) of several thousand mature mRNAs and are believed to play a role in translation regulation. Elucidation of these roles has primarily been focused on the 3'-UTR, with limited focus on characterizing the G quadruplex structures and functions in the 5'-UTR. Investigation of the affinity and specificity of RNA binding proteins for 5'-UTR G quadruplexes and the resulting regulatory effects have also been limited. Among the mRNAs predicted to form a G quadruplex structure within the 5'-UTR is the survival motor neuron domain containing 1 (SMNDC1) mRNA, encoding a protein that is critical to the spliceosome. Additionally, this mRNA has been identified as a potential target of the fragile X mental retardation protein (FMRP), whose loss of expression leads to fragile X syndrome. FMRP is an RNA binding protein involved in translation regulation that has been shown to bind mRNA targets that form G quadruplex structures. In this study we have used biophysical methods to investigate G quadruplex formation in the 5'-UTR of SMNDC1 mRNA and analyzed its interactions with FMRP. Our results show that SMNDC1 mRNA 5'-UTR forms an intramolecular, parallel G quadruplex structure comprised of three G quartet planes, which is bound specifically by FMRP both in vitro and in mouse brain lysates. These findings suggest a model by which FMRP might regulate the translation of a subset of its mRNA targets by recognizing the G quadruplex structure present in their 5'-UTR, and affecting their accessibility by the protein synthesis machinery.
Affiliation(s)
- Damian S McAninch
- Department of Chemistry and Biochemistry, Duquesne University, 600 Forbes Avenue, Pittsburgh, Pennsylvania 15282, USA
- Ashley M Heinaman
- Department of Chemistry, University of Pittsburgh at Johnstown, Johnstown, Pennsylvania 15904, USA
- Cara N Lang
- Department of Chemistry, University of Pittsburgh at Johnstown, Johnstown, Pennsylvania 15904, USA
- Kathryn R Moss
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia 30322, USA
- Gary J Bassell
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia 30322, USA
- Mihaela Rita Mihailescu
- Department of Chemistry and Biochemistry, Duquesne University, 600 Forbes Avenue, Pittsburgh, Pennsylvania 15282, USA
- Timothy L Evans
- Department of Chemistry and Biochemistry, Duquesne University, 600 Forbes Avenue, Pittsburgh, Pennsylvania 15282, USA; Department of Chemistry, University of Pittsburgh at Johnstown, Johnstown, Pennsylvania 15904, USA
4
Wang X, Zhang B. Integrating genomic, transcriptomic, and interactome data to improve peptide and protein identification in shotgun proteomics. J Proteome Res 2014; 13:2715-23. [PMID: 24792918 PMCID: PMC4059263 DOI: 10.1021/pr500194t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/26/2022]
Abstract
Mass spectrometry (MS)-based shotgun proteomics is an effective technology for global proteome profiling. The ultimate goal is to assign tandem MS spectra to peptides and subsequently infer proteins and their abundance. In addition to database searching and protein assembly algorithms, computational approaches have been developed to integrate genomic, transcriptomic, and interactome information to improve peptide and protein identification. Earlier efforts focus primarily on making databases more comprehensive using publicly available genomic and transcriptomic data. More recently, with the increasing affordability of the Next Generation Sequencing (NGS) technologies, personalized protein databases derived from sample-specific genomic and transcriptomic data have emerged as an attractive strategy. In addition, incorporating interactome data not only improves protein identification but also puts identified proteins into their functional context and thus facilitates data interpretation. In this paper, we survey the major integrative bioinformatics approaches that have been developed during the past decade and discuss their merits and demerits.
Affiliation(s)
- Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt-Ingram Cancer Center, and Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
5
Zhu Y, Hultin-Rosenberg L, Forshed J, Branca RMM, Orre LM, Lehtiö J. SpliceVista, a tool for splice variant identification and visualization in shotgun proteomics data. Mol Cell Proteomics 2014; 13:1552-62. [PMID: 24692640 DOI: 10.1074/mcp.m113.031203] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 12/20/2022]
Abstract
Alternative splicing is a pervasive process in eukaryotic organisms. More than 90% of human genes have alternatively spliced products, and aberrant splicing has been shown to be associated with many diseases. Current methods employed in the detection of splice variants include prediction by clustering of expressed sequence tags, exon microarray, and mRNA sequencing, all methods focusing on RNA-level information. There is a lack of tools for analyzing splice variants at the protein level. Here, we present SpliceVista, a tool for splice variant identification and visualization based on mass spectrometry proteomics data. SpliceVista retrieves gene structure and translated sequences from alternative splicing databases and maps MS-identified peptides to splice variants. The visualization module plots the exon composition of each splice variant and aligns identified peptides with transcript positions. If quantitative mass spectrometry data are used, SpliceVista plots the quantitative patterns for each peptide and provides users with the option to cluster peptides based on their quantitative patterns. SpliceVista can identify splice-variant-specific peptides, providing the possibility for variant-specific analysis. The tool was tested on two experimental datasets (PXD000065 and PXD000134). In A431 cells treated with gefitinib, 2983 splice-variant-specific peptides corresponding to 939 splice variants were identified. Through comparison of splice-variant-centric, protein-centric, and gene-centric quantification, several genes (e.g. EIF4H) were found to have differentially regulated splice variants after gefitinib treatment. The same discrepancy between protein-centric and splice-centric quantification was detected in the other dataset, in which induced pluripotent stem cells were compared with parental fibroblast and human embryonic stem cells. In addition, SpliceVista can be used to visualize novel splice variants inferred from peptide-level evidence. In summary, SpliceVista enables visualization, detection, and differential quantification of protein splice variants that are often missed in current proteomics pipelines.
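
The idea of mapping identified peptides to splice variants and flagging variant-specific ones can be sketched as below. This is not SpliceVista's code or interface: the sequences, variant IDs, and function names are hypothetical, and exact substring matching stands in for whatever mapping the tool actually performs.

```python
# Hypothetical translated sequences for two splice variants of one gene.
# "GENE1-202" lacks an internal segment present in "GENE1-201" (a toy exon skip).
VARIANTS = {
    "GENE1-201": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "GENE1-202": "MKTAYIAKQRLGLIEVQ",
}


def map_peptides(peptides, variants):
    """Map each peptide to the sorted IDs of all variants whose sequence contains it."""
    return {
        pep: sorted(vid for vid, seq in variants.items() if pep in seq)
        for pep in peptides
    }


def variant_specific(hits):
    """Keep only peptides that match exactly one variant."""
    return {pep: ids[0] for pep, ids in hits.items() if len(ids) == 1}


# Toy peptide identifications: one shared, one exon-specific, one spanning the new junction.
hits = map_peptides(["TAYIAK", "QISFVK", "QRLGLIEVQ"], VARIANTS)
specific = variant_specific(hits)
print(hits)
print(specific)
```

In this toy case the shared peptide maps to both variants, while the exon-internal peptide and the junction-spanning peptide each identify a single variant. Peptides of that second kind are what make variant-centric quantification possible.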
Affiliation(s)
- Yafeng Zhu
- Cancer Proteomics Mass Spectrometry, Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
- Lina Hultin-Rosenberg
- Cancer Proteomics Mass Spectrometry, Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
- Jenny Forshed
- Cancer Proteomics Mass Spectrometry, Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
- Rui M M Branca
- Cancer Proteomics Mass Spectrometry, Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
- Lukas M Orre
- Cancer Proteomics Mass Spectrometry, Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
- Janne Lehtiö
- Cancer Proteomics Mass Spectrometry, Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
6
Genomics of alternative splicing: evolution, development and pathophysiology. Hum Genet 2014; 133:679-87. [PMID: 24378600 DOI: 10.1007/s00439-013-1411-3] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 09/10/2013] [Accepted: 12/15/2013] [Indexed: 12/11/2022]
Abstract
Alternative splicing is a major cellular mechanism in metazoans for generating proteomic diversity. A large proportion of protein-coding genes in multicellular organisms undergo alternative splicing, and in humans, it has been estimated that nearly 90% of protein-coding genes, a much larger proportion than expected, are subject to alternative splicing. Genomic analyses of alternative splicing have illuminated its universal role in shaping the evolution of genomes, in the control of developmental processes, and in the dynamic regulation of the transcriptome to influence phenotype. Disruption of the splicing machinery has been found to drive pathophysiology, and indeed reprogramming of aberrant splicing can provide novel approaches to the development of molecular therapy. This review focuses on the recent progress in our understanding of alternative splicing brought about by the unprecedented explosive growth of genomic data and highlights the relevance of human splicing variation on disease and therapy.
7
Jeong J, Park J, Lee DY, Kim J. C-terminal truncation of a bovine B(12) trafficking chaperone enhances the sensitivity of the glutathione-regulated thermostability. BMB Rep 2013; 46:169-74. [PMID: 23527861 PMCID: PMC4133868 DOI: 10.5483/bmbrep.2013.46.3.158] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
The human B12 trafficking chaperone hCblC is well conserved in mammals and non-mammalian eukaryotes. However, the C-terminal ∼40 amino acids of hCblC vary significantly and are predicted to be deleted by alternative splicing of the encoding gene. In this study, we examined the thermostability of the bovine CblC truncated at the C-terminal variable region (t-bCblC) and its regulation by glutathione. t-bCblC is highly thermolabile (Tm = ∼42℃) similar to the full-length protein (f-bCblC). However, t-bCblC is stabilized to a greater extent than f-bCblC by binding of reduced glutathione (GSH) with increased sensitivity to GSH. In addition, binding of oxidized glutathione (GSSG) destabilizes t-bCblC to a greater extent and with increased sensitivity as compared to f-bCblC. These results indicate that t-bCblC is a more sensitive form to be regulated by glutathione than the full-length form of the protein. [BMB Reports 2013; 46(3): 169-174]
Affiliation(s)
- Jinju Jeong
- School of Biotechnology, Yeungnam University, Gyeongsan 712-749, Korea
8
Omenn GS, Menon R, Zhang Y. Innovations in proteomic profiling of cancers: alternative splice variants as a new class of cancer biomarker candidates and bridging of proteomics with structural biology. J Proteomics 2013; 90:28-37. [PMID: 23603631 DOI: 10.1016/j.jprot.2013.04.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 02/11/2013] [Revised: 04/05/2013] [Accepted: 04/07/2013] [Indexed: 01/05/2023]
Abstract
Alternative splicing allows a single gene to generate multiple RNA transcripts which can be translated into functionally diverse protein isoforms. Current knowledge of splicing is derived mainly from RNA transcripts, with very little known about the expression level, 3D structures, and functional differences of the proteins. Splicing is a remarkable phenomenon of molecular and biological evolution. Studies which simply report up-regulation or down-regulation of protein or mRNA expression are confounded by the effects of mixtures of these isoforms. Besides understanding the net biological effects of the mixtures, we may be able to develop biomarker tests based on the observable differential expression of particular splice variants or combinations of splice variants in specific disease states. Here we review our work on differential expression of splice variant proteins in cancers and the feasibility of integrating proteomic analysis with structure-based conformational predictions of the differences between such isoforms.
Affiliation(s)
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA.
9
Koutmos M, Gherasim C, Smith JL, Banerjee R. Structural basis of multifunctionality in a vitamin B12-processing enzyme. J Biol Chem 2011; 286:29780-7. [PMID: 21697092 DOI: 10.1074/jbc.m111.261370] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 01/14/2023]
Abstract
An early step in the intracellular processing of vitamin B(12) involves CblC, which exhibits dual reactivity, catalyzing the reductive decyanation of cyanocobalamin (vitamin B(12)), and the dealkylation of alkylcobalamins (e.g. methylcobalamin; MeCbl). Insights into how the CblC scaffold supports this chemical dichotomy have been unavailable despite it being the most common locus of patient mutations associated with inherited cobalamin disorders that manifest in both severe homocystinuria and methylmalonic aciduria. Herein, we report structures of human CblC, with and without bound MeCbl, which provide novel biochemical insights into its mechanism of action. Our results reveal that CblC is the most divergent member of the NADPH-dependent flavin reductase family and can use FMN or FAD as a prosthetic group to catalyze reductive decyanation. Furthermore, CblC is the first example of an enzyme with glutathione transferase activity that has a sequence and structure unrelated to the GST superfamily. CblC thus represents an example of evolutionary adaptation of a common structural platform to perform diverse chemistries. The CblC structure allows us to rationalize the biochemical basis of a number of pathological mutations associated with severe clinical phenotypes.
Affiliation(s)
- Markos Koutmos
- Department of Biological Chemistry and the Life Sciences Institute, University of Michigan Medical Center, Ann Arbor, Michigan 48109-0600, USA
10
Zhang G, Lukoszek R, Mueller-Roeber B, Ignatova Z. Different sequence signatures in the upstream regions of plant and animal tRNA genes shape distinct modes of regulation. Nucleic Acids Res 2010; 39:3331-9. [PMID: 21138970 PMCID: PMC3082873 DOI: 10.1093/nar/gkq1257] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]
Abstract
In eukaryotes, the transcription of tRNA genes is initiated by the concerted action of transcription factors IIIC (TFIIIC) and IIIB (TFIIIB) which direct the recruitment of polymerase III. While TFIIIC recognizes highly conserved, intragenic promoter elements, TFIIIB binds to the non-coding 5'-upstream regions of the tRNA genes. Using a systematic bioinformatic analysis of 11 multicellular eukaryotic genomes we identified a highly conserved TATA motif followed by a CAA-motif in the tRNA upstream regions of all plant genomes. Strikingly, the 5'-flanking tRNA regions of the animal genomes are highly heterogeneous and lack a common conserved sequence signature. Interestingly, in the animal genomes the tRNA species that read the same codon share conserved motifs in their upstream regions. Deep-sequencing analysis of 16 human tissues revealed multiple splicing variants of two of the TFIIIB subunits, Bdp1 and Brf1, with tissue-specific expression patterns. These multiple forms most likely modulate the TFIIIB-DNA interactions and explain the lack of a uniform signature motif in the tRNA upstream regions of animal genomes. The anticodon-dependent 5'-flanking motifs provide a possible mechanism for independent regulation of the tRNA transcription in various human tissues.
Affiliation(s)
- Gong Zhang
- Department of Biochemistry, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str 24-25, 14476 Potsdam, Potsdam, Germany
11
Zhang Z, Stamm S. Analysis of mutations that influence pre-mRNA splicing. Methods in Molecular Biology (Clifton, N.J.) 2010; 703:137-60. [PMID: 21125488 DOI: 10.1007/978-1-59745-248-9_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
A rapidly increasing number of human diseases are now recognized as being caused by the selection of wrong splice sites. In most cases, these changes in alternative splice site selection are due to single nucleotide exchanges in splicing regulatory elements. This chapter describes the use of bioinformatics tools to predict the influence of a mutation on alternative pre-mRNA splicing and the experimental testing of these predictions. The bioinformatic analysis determines the influence of a mutation on splicing enhancers and silencers, splice sites and RNA secondary structures. This approach generates hypotheses that are tested using splicing reporter constructs, which are then analyzed in transfection assays. We describe a recombination-based system that allows for the generation of splicing reporter constructs in the first week and their subsequent analysis in the second week.
Affiliation(s)
- Zhaiyi Zhang
- Department of Molecular and Cellular Biochemistry, Biomedical Biological Sciences Research Building, College of Medicine, University of Kentucky, Lexington, KY, USA.
12
Zhou A, Zhang F, Chen JY. PEPPI: a peptidomic database of human protein isoforms for proteomics experiments. BMC Bioinformatics 2010; 11 Suppl 6:S7. [PMID: 20946618 PMCID: PMC3026381 DOI: 10.1186/1471-2105-11-s6-s7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]
Affiliation(s)
- Ao Zhou
- School of Informatics, Indiana University, Indianapolis, IN 46202, USA
13
Chang KY, Georgianna DR, Heber S, Payne GA, Muddiman DC. Detection of alternative splice variants at the proteome level in Aspergillus flavus. J Proteome Res 2010; 9:1209-17. [PMID: 20047314 DOI: 10.1021/pr900602d] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism by which the exons of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enables higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep redundancy minimal. As a result, the database search might not identify a target protein even with high-quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, the AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.
Affiliation(s)
- Kung-Yen Chang
- Bioinformatics Research Center, Center for Integrated Fungal Research, and W.M. Keck FT-ICR-MS Laboratory, Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
14
Wu J. Testing the coding potential of conserved short genomic sequences. Adv Bioinformatics 2010; 2010:287070. [PMID: 20224812 PMCID: PMC2834954 DOI: 10.1155/2010/287070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2009] [Accepted: 01/02/2010] [Indexed: 11/25/2022]
Abstract
Proposed is a procedure to test whether a genomic sequence contains coding DNA, called a coding potential region. The procedure tests the coding potential of conserved short genomic sequence, in which the assumptions on the probability models of gene structures are relaxed. Thus, it is expected to provide additional candidate regions that contain coding DNAs to the current genomic database. The procedure was applied to the set of highly conserved human-mouse sequences in the genome database at the University of California at Santa Cruz. For sequences containing RefSeq coding exons, the procedure detected 91.3% regions having coding potential in this set, which covers 83% of the human RefSeq coding exons, at a 2.6% false positive rate. The procedure detected 12,688 novel short regions with coding potential at the false discovery rate <0.05; 65.7% of the novel regions are between annotated genes.
Affiliation(s)
- Jing Wu
- Department of Statistics, Carnegie Mellon University, PA 15213, USA
15
Omenn GS. Bioinformatics and systems biology of cancers. Progress in Molecular Biology and Translational Science 2010; 95:159-91. [PMID: 21075332 DOI: 10.1016/b978-0-12-385071-3.00007-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
Abstract
Molecular databases and bioinformatics methods and tools are essential for modern cancer research. Multilevel analyses of all the protein-coding genes, thousands of proteins, and hundreds of metabolites require integration in terms of signaling and metabolic pathways and networks. This chapter provides background and examples of genomic, gene expression, epigenomic, proteomic, and metabolomic investigations of cancer progression and emergence of invasive and metastatic properties of cancers.
Affiliation(s)
- Gilbert S Omenn
- Department of Internal Medicine, School of Public Health, Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
16
Chacko E, Ranganathan S. Comprehensive splicing graph analysis of alternative splicing patterns in chicken, compared to human and mouse. BMC Genomics 2009; 10 Suppl 1:S5. [PMID: 19594882 PMCID: PMC2709266 DOI: 10.1186/1471-2164-10-s1-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Background: Alternative transcript diversity manifests itself as a prime cause of complexity in higher eukaryotes. Recently, transcript diversity studies have suggested that 60–80% of human genes are alternatively spliced. We have used a splicing pattern approach for the bioinformatics analysis of Alternative Splicing (AS) in chicken, human and mouse. Exons involved in splicing are subdivided into distinct and variant exons, based on the prevalence of the exons across the transcripts. Four possible permutations of these two different groups of exons were categorised as class I (distinct-distinct), class II (distinct-variant), class III (variant-distinct) and class IV (variant-variant). This classification quantifies the variation in transcript diversity in the three species. Results: In all, 3901 chicken AS genes have been compared with 16,715 human and 16,491 mouse AS genes, with 23% of chicken genes being alternatively spliced, compared to 68% in humans and 57% in mice. To minimize any gene structure bias in the input data, comparative genome analysis has been carried out on the orthologous subset of AS genes for the three species. Gene-level analysis suggested that chicken genes show fewer AS events compared to human and mouse. An event-level analysis showed that the percentage of AS events in chicken is similar to that of human, which implies that a smaller number of chicken genes show greater transcript diversity. Overall, chicken genes were found to have fewer transcripts per gene and shorter introns than human and mouse genes. Conclusion: In chicken, the majority of genes generate only two or three isoforms, compared to almost eight in human and six in mouse. We observed that intron definition is expressed strongly when compared to exon definition for the chicken genome, based on 3% intron retention in chicken, compared to 2% in human and mouse. Splicing patterns with variant exons account for 33% of AS chicken orthologous genes compared to 24% in human and 27% in mouse, providing a novel measure to describe the species-wise complexity due to alternative transcript diversity.
Affiliation(s)
- Elsa Chacko
- Department of Chemistry and Biomolecular Sciences, Macquarie University, NSW, Australia.
|
17
|
van Hooff SR, Koster J, Hulsen T, van Schaik BDC, Roos M, van Batenburg MF, Versteeg R, van Kampen AHC. The construction of genome-based transcriptional units. OMICS: A Journal of Integrative Biology 2009; 13:105-14. [PMID: 19320556 DOI: 10.1089/omi.2008.0036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Gene-oriented sequence clusters (transcriptional units) have found many applications in genomics research including the construction of transcriptome maps and identification of splice variants. We developed a new method to construct transcriptional units that uses the genomic sequence as a template. We present and discuss our method in detail together with an evaluation of the transcriptional units for human. We constructed 33,007 and 27,792 transcriptional units for human and mouse, respectively. The sensitivity (81%) and specificity (90%) of our method compare favorably to other established methods. We evaluated the representation of experimentally validated and predicted intergenic spliced transcripts in humans and show that we correctly represent a large fraction of these cases by single transcriptional units. Our method performs well, but the evaluation of the final set of transcriptional units shows that improvements to the algorithm are still possible. However, because the precise number and types of errors are difficult to track, it is not obvious how to significantly improve the algorithm. We believe that ongoing research efforts are necessary to further improve current methods. This should include detailed documentation, comparison, and evaluation of current methods.
Affiliation(s)
- Sander R van Hooff
- Bioinformatics Laboratory, Academic Medical Center, Meibergdreef 9, Amsterdam, The Netherlands
|
18
|
Menon R, Zhang Q, Zhang Y, Fermin D, Bardeesy N, DePinho RA, Lu C, Hanash SM, Omenn GS, States DJ. Identification of novel alternative splice isoforms of circulating proteins in a mouse model of human pancreatic cancer. Cancer Res 2009; 69:300-9. [PMID: 19118015 PMCID: PMC2613545 DOI: 10.1158/0008-5472.can-08-2145] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022]
Abstract
To assess the potential of tumor-associated, alternatively spliced gene products as a source of biomarkers in biological fluids, we have analyzed a large data set of mass spectra derived from the plasma proteome of a mouse model of human pancreatic ductal adenocarcinoma. MS/MS spectra were interrogated for novel splice isoforms using a nonredundant database containing an exhaustive three-frame translation of Ensembl transcripts and gene models from ECgene. This integrated analysis identified 420 distinct splice isoforms, of which 92 did not match any previously annotated mouse protein sequence. We chose seven of those novel variants for validation by reverse transcription-PCR. The results were concordant with the proteomic analysis. All seven novel peptides were successfully amplified in pancreas specimens from both wild-type and mutant mice. Isotopic labeling of cysteine-containing peptides from tumor-bearing mice and wild-type controls enabled relative quantification of the proteins. Differential expression between tumor-bearing and control mice was notable for peptides from novel variants of muscle pyruvate kinase, malate dehydrogenase 1, glyceraldehyde-3-phosphate dehydrogenase, proteoglycan 4, minichromosome maintenance complex component 9, high mobility group box 2, and hepatocyte growth factor activator. Our results show that, in a mouse model for human pancreatic cancer, novel and differentially expressed alternative splice isoforms are detectable in plasma and may be a source of candidate biomarkers.
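The three-frame translation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and incomplete trailing codons are simply dropped.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq[offset:]) for offset in range(3)]


print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

A search database built this way would typically split each frame at stop codons (`*`) before digestion; that step is left out here for brevity.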
Affiliation(s)
- Rajasree Menon
- Center for Computational Medicine and Biology and Pediatric Endocrinology, University of Michigan, 100 Washtenaw Avenue, Palmer Commons, Ann Arbor, MI 48109, USA
|
19
|
Mo F, Hong X, Gao F, Du L, Wang J, Omenn GS, Lin B. A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data. BMC Bioinformatics 2008; 9:537. [PMID: 19087293 PMCID: PMC2636810 DOI: 10.1186/1471-2105-9-537] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 06/23/2008] [Accepted: 12/16/2008] [Indexed: 01/26/2023]
Abstract
Background Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes undergo alternative splicing. High throughput tandem (MS/MS) mass spectrometry provides valuable information for rapidly identifying potentially novel alternatively-spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched. Results We wrote scripts using Perl, BioPerl, MySQL and the Ensembl API and built a theoretical exon-exon junction protein database to account for all possible combinations of exons for a gene while keeping the frame of translation (i.e., keeping only in-phase exon-exon combinations) from the Ensembl Core Database. Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events. Conclusion Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.
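The idea of keeping only in-phase exon-exon combinations can be sketched in Python as follows. This is an illustrative sketch, not the authors' Perl implementation: the `Exon` type, the `in_phase_junctions` function and the toy sequences are hypothetical, and the phase convention (an upstream exon's end phase must equal the downstream exon's start phase) is assumed to follow Ensembl's exon phase annotation.

```python
from itertools import combinations
from typing import NamedTuple


class Exon(NamedTuple):
    name: str
    seq: str        # coding nucleotides of the exon
    phase: int      # codon position (0, 1 or 2) at which the exon starts
    end_phase: int  # phase that the following exon must start with


def in_phase_junctions(exons, flank=6):
    """Return junction sequences for in-frame pairs that skip at least one exon."""
    junctions = {}
    for (i, up), (j, down) in combinations(enumerate(exons), 2):
        if j == i + 1:
            continue  # adjacent exons form the annotated junction, not a skip
        if up.end_phase != down.phase:
            continue  # joining these exons would shift the reading frame
        junctions[f"{up.name}-{down.name}"] = up.seq[-flank:] + down.seq[:flank]
    return junctions


# Toy gene with four exons, listed in transcript order.
exons = [
    Exon("E1", "ATGGCCAAG", 0, 0),
    Exon("E2", "GTTCTGA", 0, 1),
    Exon("E3", "CCGGAAC", 1, 2),
    Exon("E4", "TGGTTTAAA", 0, 0),
]
print(in_phase_junctions(exons))  # {'E1-E4': 'GCCAAGTGGTTT'}
```

In a protein database, each retained junction sequence would then be translated in the frame implied by the upstream exon's phase before being added to the search space.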
Affiliation(s)
- Fan Mo
- Systems Biology Division, Zhejiang-California Nanosystems Institute (ZCNI) of Zhejiang University, Zhejiang University Huajiachi Campus, Hangzhou, PR China.
|
20
|
Ryan MC, Zeeberg BR, Caplen NJ, Cleland JA, Kahn AB, Liu H, Weinstein JN. SpliceCenter: a suite of web-based bioinformatic applications for evaluating the impact of alternative splicing on RT-PCR, RNAi, microarray, and peptide-based studies. BMC Bioinformatics 2008; 9:313. [PMID: 18638396 PMCID: PMC2491637 DOI: 10.1186/1471-2105-9-313] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 03/25/2008] [Accepted: 07/18/2008] [Indexed: 11/10/2022]
Abstract
Background Over 60% of protein-coding genes in vertebrates express mRNAs that undergo alternative splicing. The resulting collection of transcript isoforms poses significant challenges for contemporary biological assays. For example, RT-PCR validation of gene expression microarray results may be unsuccessful if the two technologies target different splice variants. Effective use of sequence-based technologies requires knowledge of the specific splice variant(s) that are targeted. In addition, the critical roles of alternative splice forms in biological function and in disease suggest that assay results may be more informative if analyzed in the context of the targeted splice variant. Results A number of contemporary technologies are used for analyzing transcripts or proteins. To enable investigation of the impact of splice variation on the interpretation of data derived from those technologies, we have developed SpliceCenter. SpliceCenter is a suite of user-friendly, web-based applications that includes programs for analysis of RT-PCR primer/probe sets, effectors of RNAi, microarrays, and protein-targeting technologies. Both interactive and high-throughput implementations of the tools are provided. The interactive versions of SpliceCenter tools provide visualizations of a gene's alternative transcripts and probe target positions, enabling the user to identify which splice variants are or are not targeted. The high-throughput batch versions accept user query files and provide results in tabular form. When, for example, we used SpliceCenter's batch siRNA-Check to process the Cancer Genome Anatomy Project's large-scale shRNA library, we found that only 59% of the 50,766 shRNAs in the library target all known splice variants of the target gene, 32% target some but not all, and 9% do not target any currently annotated transcript. 
Conclusion SpliceCenter provides unique, user-friendly applications for assessing the impact of transcript variation on the design and interpretation of RT-PCR, RNAi, gene expression microarrays, antibody-based detection, and mass spectrometry proteomics. The tools are intended for use by bench biologists as well as bioinformaticists.
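The all/some/none breakdown reported for the shRNA library can be illustrated with a small Python sketch. It is not SpliceCenter's implementation: the function name and toy sequences are hypothetical, and exact substring matching stands in for whatever target-matching rule the tool applies.

```python
def classify_coverage(target, transcripts):
    """Classify a target sequence by how many of a gene's transcripts contain it.

    `transcripts` maps a transcript name to its sequence. Returns a label
    ('all', 'some' or 'none') and the list of transcripts that were hit.
    """
    hits = [name for name, seq in transcripts.items() if target in seq]
    if not hits:
        return "none", hits
    if len(hits) == len(transcripts):
        return "all", hits
    return "some", hits


# Toy gene with two splice variants; variant_b lacks the middle exon.
transcripts = {
    "variant_a": "ATGGCCAAGGTTCTGACCGGAAC",
    "variant_b": "ATGGCCAAGCCGGAAC",
}
print(classify_coverage("ATGGCC", transcripts))   # ('all', ['variant_a', 'variant_b'])
print(classify_coverage("GTTCTGA", transcripts))  # ('some', ['variant_a'])
print(classify_coverage("TTTTTT", transcripts))   # ('none', [])
```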
Affiliation(s)
- Michael C Ryan
- Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
|
21
|
Abstract
Most alternative splicing events in human and other eukaryotic genomes are detected using sequence fragments produced by high throughput genomic technologies, such as EST sequencing and oligonucleotide microarrays. Reconstructing full-length transcript isoforms from such sequence fragments is a major interest and challenge for computational analyses of pre-mRNA alternative splicing. This chapter describes a general graph-based approach for computational inference of full-length isoforms.
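The abstract does not specify the graph algorithm, so the following Python sketch shows only the general idea behind a graph-based formulation: exons are nodes, observed splice junctions are directed edges, and each path from the first to the last exon is a candidate full-length isoform. The function name and toy graph are hypothetical, and the graph is assumed to be acyclic.

```python
def enumerate_isoforms(junctions, start, end):
    """List every exon path from `start` to `end` in an acyclic splice graph."""
    graph = {}
    for upstream, downstream in junctions:
        graph.setdefault(upstream, []).append(downstream)

    paths = []

    def walk(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Toy splice graph: exon E2 is either included or skipped.
junctions = [("E1", "E2"), ("E2", "E3"), ("E1", "E3"), ("E3", "E4")]
print(enumerate_isoforms(junctions, "E1", "E4"))
# [['E1', 'E2', 'E3', 'E4'], ['E1', 'E3', 'E4']]
```

Exhaustive enumeration grows quickly with the number of alternative events, which is one reason practical methods weight or filter paths using the supporting sequence fragments.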
Affiliation(s)
- Yi Xing
- Department of Internal Medicine, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
|
22
|
Bingham JL, Carrigan PE, Miller LJ, Srinivasan S. Extent and diversity of human alternative splicing established by complementary database annotation and microarray analysis. OMICS: A Journal of Integrative Biology 2008; 12:83-92. [PMID: 18266558 DOI: 10.1089/omi.2007.0041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
Alternative splicing generates functional diversity in higher organisms through alternative first and last exons, skipped and included exons, intron retentions, and alternative donor and acceptor sites. In large-scale microarray studies in humans and the mouse, emphasis so far has been placed on exon-skip events, leaving the prevalence and importance of other splice types largely unexplored. Using a new human splice variant database and a genome-wide microarray to probe thousands of splice events of each type, we measured differential expression of splice types across six pairs of diverse cell lines and validated the database annotation process. Results suggest that splicing in humans is more complex than simple exon-skip events, which account for a minority of splicing differences. The relative frequency of differential expression of the splice types correlates with what is found by our annotation efforts. In conclusion, alternative splicing in human cells is considerably more complex than the canonical example of the exon skip. The complementary approaches of genome-wide annotation of alternative splicing in human and design of genome-wide splicing microarrays to measure differential splicing in biological samples provide a powerful high-throughput tool to study the role of alternative splicing in human biology.
|
23
|
hsp70 genes in the human genome: Conservation and differentiation patterns predict a wide array of overlapping and specialized functions. BMC Evol Biol 2008; 8:19. [PMID: 18215318 PMCID: PMC2266713 DOI: 10.1186/1471-2148-8-19] [Citation(s) in RCA: 179] [Impact Index Per Article: 11.2] [Received: 11/05/2007] [Accepted: 01/23/2008] [Indexed: 02/05/2023]
Abstract
Background Hsp70 chaperones are required for key cellular processes and response to environmental changes and survival but they have not been fully characterized yet. The human hsp70-gene family has an unknown number of members (eleven counted over ten years ago); some have been described but the information is incomplete and inconsistent. A coherent body of knowledge encompassing all family components that would facilitate their study individually and as a group is lacking. Nowadays, the study of chaperone genes benefits from the availability of genome sequences and a new protocol, chaperonomics, which we applied to elucidate the human hsp70 family. Results We identified 47 hsp70 sequences, 17 genes and 30 pseudogenes. The genes distributed into seven evolutionarily distinct groups with distinguishable subgroups according to phylogenetic and other data, such as exon-intron and protein features. The N-terminal ATP-binding domain (ABD) was conserved at least partially in the majority of the proteins but the C-terminal substrate-binding domain (SBD) was not. Nine proteins were typical Hsp70s (65–80 kDa) with ABD and SBD, two were lighter lacking partly or totally the SBD, and six were heavier (>80 kDa) with divergent C-terminal domains. We also analyzed exon-intron features, transcriptional variants and protein structure and isoforms, and modality and patterns of expression in various tissues and developmental stages. Evolutionary analyses, including human hsp70 genes and pseudogenes, and other eukaryotic hsp70 genes, showed that six human genes encoding cytosolic Hsp70s and 27 pseudogenes originated from retro-transposition of HSPA8, a gene highly expressed in most tissues and developmental stages. Conclusion The human hsp70-gene family is characterized by a remarkable evolutionary diversity that mainly resulted from multiple duplications and retrotranspositions of a highly expressed gene, HSPA8. 
Human Hsp70 proteins are clustered into seven evolutionary Groups, with divergent C-terminal domains likely defining their distinctive functions. These functions may also be further defined by the observed differences in the N-terminal domain.
|
24
|
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
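The exon inclusion level mentioned above is commonly estimated as the fraction of informative ESTs that support inclusion of the exon. The sketch below assumes that simple ratio; the function name and counts are hypothetical, and real analyses would also account for library bias and low coverage.

```python
def exon_inclusion_level(inclusion_ests, skipping_ests):
    """Estimate exon inclusion as inclusion / (inclusion + skipping) EST counts.

    Returns None when no EST covers the event, since the ratio is undefined.
    """
    total = inclusion_ests + skipping_ests
    if total == 0:
        return None
    return inclusion_ests / total


# 30 ESTs contain the exon and 10 ESTs join its flanking exons directly.
print(exon_inclusion_level(30, 10))  # 0.75
```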
|
25
|
Identification and analysis of human RCAN3 (DSCR1L2) mRNA and protein isoforms. Gene 2007; 407:159-68. [PMID: 18022329 DOI: 10.1016/j.gene.2007.10.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/21/2007] [Revised: 10/02/2007] [Accepted: 10/04/2007] [Indexed: 11/22/2022]
Abstract
Human RCAN3 (Regulator of calcineurin 3; previously known as DSCR1L2, Down syndrome critical region gene 1-like 2) is a five-exon gene mapped on chromosome 1 and belongs to the human RCAN gene family which also includes RCAN1 and RCAN2. The novel denomination RCAN for genes and proteins, instead of DSCR1L (Down syndrome critical region gene 1-like) has recently been widely discussed. The aim of the present work was to perform a multiple approach analysis of five RCAN3 mRNA and encoded protein isoforms, two of which have been identified for the first time in this research. The two new RCAN3 mRNA isoforms, RCAN3-2,4,5, which lacks exon 3, and RCAN3-2,3,5, which lacks exon 4, were identified during RCAN3 RT-PCR (reverse transcription-polymerase chain reaction) cloning, the product of which unexpectedly revealed the presence of five isoforms as opposed to the three previously known. In order to analyze the expression pattern of the five RCAN3 mRNA isoforms in seven different human tissues, a quantitative relative RT-PCR was performed: interestingly, all isoforms are present in all tissues investigated, with a statistically significant constant prevalence of RCAN3 isoform (the most complete, "reference" isoform). The RCAN3 locus expression level was comparable in all seven tissues analyzed, considering all isoforms, which indicates a ubiquitous expression of this human RCAN family member. To date two possible interactors have been described for this protein: human cardiac troponin I (TNNI3) and calcineurin. Here we report the interaction between the new RCAN3 variants and TNNI3, demonstrated by both yeast cotransformation and by the GST (glutathione S-transferase) fusion protein assay, as was to be expected from the presence of exon 2 whose product has been seen to be sufficient for binding to TNNI3.
|
26
|
Tang H, Heeley T, Morlec R, Hubbard SJ. Characterising alternate splicing and tissue specific expression in the chicken from ESTs. Cytogenet Genome Res 2007; 117:268-77. [PMID: 17675868 PMCID: PMC2266501 DOI: 10.1159/000103188] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/24/2006] [Accepted: 11/15/2006] [Indexed: 01/19/2023]
Abstract
Alternate splicing is believed to produce the greatest diversity in transcriptional complexity and function in eukaryotic species. In this study, we present an analysis of alternative splicing events that occur in the chicken, using the recently sequenced genomic sequence and over 580,000 EST sequences mapped back to the genome. A carefully controlled EST-to-genome mapping pipeline is presented, based around the EXONERATE program using the est2genome model, which also considers several quality control steps to filter out erroneous matches. The data is then used to estimate the level of alternate splicing events with respect to Ensembl predicted transcripts. The EST-genome mappings are characterised at the exon level, in order to classify individual splicing events and provide estimates of novel transcripts not currently annotated by the Ensembl genome database. This is the first large scale analysis of this kind in an avian species, and suggests that chicken displays a similar level of alternate splicing as that found in other higher vertebrates such as human and mouse, both in terms of the number of genes that undergo alternate splicing events, and the average number of transcripts produced per gene. The EST data suggests alternate splicing may occur in some 50-60% of the chicken gene set and with an average of around 2.3 transcripts per gene which undergo this process. The EST data is also used to look at gene and transcript usage in the tissues sequenced in embryonic and adult libraries. Genes which display notable biases were analysed in more detail, including twinfilin-2 and embryonic heavy chain myosin. This also highlights several as yet functionally un-annotated genes which appear to be important in embryonic tissues and also undergo alternate splicing events. 
The analysis also demonstrates some of the difficulties involved in using EST-based data to annotate transcriptional activity in eukaryotic genes, where a broad spectrum of tissues and a large number of sequenced transcripts are required in order to fully characterise alternate splicing and differential expression.
Affiliation(s)
- H Tang
- Faculty of Life Sciences, The University of Manchester, Manchester, UK
|
27
|
A new advance in alternative splicing databases: from catalogue to detailed analysis of regulation of expression and function of human alternative splicing variants. BMC Bioinformatics 2007; 8:180. [PMID: 17547750 PMCID: PMC1904244 DOI: 10.1186/1471-2105-8-180] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 12/11/2006] [Accepted: 06/04/2007] [Indexed: 11/25/2022]
Abstract
Background Most human genes produce several transcripts with different exon contents by using alternative promoters, alternative polyadenylation sites and alternative splice sites. Much effort has been devoted to describing known gene transcripts through the development of numerous databases. Nevertheless, owing to the diversity of the transcriptome, there is a need for interactive databases that provide information about the potential function of each splicing variant, as well as its expression pattern. Description After setting up a database in which human and mouse splicing variants were compiled, we developed tools (1) to predict the production of protein isoforms from these transcripts, taking account of the presence of open reading frames and mechanisms that could potentially eliminate transcripts and/or inhibit their translation, i.e. nonsense-mediated mRNA decay and microRNAs; (2) to support studies of the regulation of transcript expression at multiple levels, including transcription and splicing, particularly in terms of tissue specificity; and (3) to assist in experimental analysis of the expression of splicing variants. Importantly, analyses of all features from transcript metabolism to functional protein domains were integrated in a highly interactive, user-friendly web interface that allows the functional and regulatory features of gene transcripts to be assessed rapidly and accurately. Conclusion In addition to identifying the transcripts produced by human and mouse genes, fast DB provides tools for analyzing the putative functions of these transcripts and the regulation of their expression. Therefore, fast DB has achieved an advance in alternative splicing databases by providing resources for the functional interpretation of splicing variants for the human and mouse genomes. 
Because gene expression studies are increasingly employed in clinical analyses, our web interface has been designed to be as user-friendly as possible and to be readily searchable and intelligible at a glance by the whole biomedical community.
|
28
|
SpliceMiner: a high-throughput database implementation of the NCBI Evidence Viewer for microarray splice variant analysis. BMC Bioinformatics 2007; 8:75. [PMID: 17338820 PMCID: PMC1839109 DOI: 10.1186/1471-2105-8-75] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 10/03/2006] [Accepted: 03/05/2007] [Indexed: 12/12/2022]
Abstract
Background There are many fewer genes in the human genome than there are expressed transcripts. Alternative splicing is the reason. Alternatively spliced transcripts are often specific to tissue type, developmental stage, environmental condition, or disease state. Accurate analysis of microarray expression data and design of new arrays for alternative splicing require assessment of probes at the sequence and exon levels. Description SpliceMiner is a web interface for querying Evidence Viewer Database (EVDB). EVDB is a comprehensive, non-redundant compendium of splice variant data for human genes. We constructed EVDB as a queryable implementation of the NCBI Evidence Viewer (EV). EVDB is based on data obtained from NCBI Entrez Gene and EV. The automated EVDB build process uses only complete coding sequences, which may or may not include partial or complete 5' and 3' UTRs, and filters redundant splice variants. Unlike EV, which supports only one-at-a-time queries, SpliceMiner supports high-throughput batch queries and provides results in an easily parsable format. SpliceMiner maps probes to splice variants, effectively delineating the variants identified by a probe. Conclusion EVDB can be queried by gene symbol, genomic coordinates, or probe sequence via a user-friendly web-based tool we call SpliceMiner. The EVDB/SpliceMiner combination provides an interface with human splice variant information and, going beyond the very valuable NCBI Evidence Viewer, supports fluent, high-throughput analysis. Integration of EVDB information into microarray analysis and design pipelines has the potential to improve the analysis and bioinformatic interpretation of gene expression data, for both batch and interactive processing. For example, whenever a gene expression value is recognized as important or appears anomalous in a microarray experiment, the interactive mode of SpliceMiner can be used quickly and easily to check for possible splice variant issues.
|
29
|
Lee Y, Lee Y, Kim B, Shin Y, Nam S, Kim P, Kim N, Chung WH, Kim J, Lee S. ECgene: an alternative splicing database update. Nucleic Acids Res 2006; 35:D99-103. [PMID: 17132829 PMCID: PMC1716719 DOI: 10.1093/nar/gkl992] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
ECgene was developed to provide functional annotation for alternatively spliced genes. The applications encompass genome-based transcript modeling for alternative splicing (AS), domain analysis with Gene Ontology (GO) annotation and expression analysis based on EST and SAGE data. We have expanded ECgene's AS modeling and EST clustering to nine organisms for which sufficient EST data are available in GenBank. For the human genome, we have also introduced several new applications to analyze differential expression. ECprofiler is an ontology-based candidate gene search system that allows users to select an arbitrary combination of gene expression pattern and GO functional categories. DEGEST is a database of differentially expressed genes and isoforms based on the EST information. Importantly, gene expression is analyzed at three distinct levels: gene, isoform and exon. The user interfaces for functional and expression analyses have been substantially improved. ASviewer is a dedicated Java application that visualizes the transcript structure and functional features of alternatively spliced variants. The SAGE part of the expression module provides many additional features including SNP, differential expression and alternative tag positions.
Affiliation(s)
- Yeunsook Lee
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Younghee Lee
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Bumjin Kim
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Youngah Shin
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Seungyoon Nam
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Korea
- Pora Kim
- Bioinformatics Team, Electronics and Telecommunications Research Institute (ETRI), Gajeong-Dong, Yuseong-Gu, Daejeon 305-350, Korea
- Namshin Kim
- Department of Chemistry and Biochemistry, Center for Computational Biology, Institute for Genomics and Proteomics, Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095-1570, USA
- Won-Hyong Chung
- Korean Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 52 Eoeun, Yuseong, Daejeon 305-333, Korea
- Jaesang Kim
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Sanghyuk Lee
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- To whom correspondence should be addressed. Tel: +82 2 3277 2888; Fax: +82 2 3277 3760;
|
30
|
Brocchieri L, Conway de Macario E, Macario AJL. Chaperonomics, a new tool to study ageing and associated diseases. Mech Ageing Dev 2006; 128:125-36. [PMID: 17123587 DOI: 10.1016/j.mad.2006.11.019] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The participation of molecular chaperones in the process of senescence and in the mechanisms of age-related diseases is currently under investigation in many laboratories. However, accurate, complete information about the number and diversity of chaperone genes in any given genome is scarce. Consequently, the results of efforts aimed at elucidating the role of chaperones in ageing and disease are often confusing and contradictory. To remedy this situation, we have developed chaperonomics, including means to identify and characterize chaperone genes and their families applicable to humans and model organisms. The problem is difficult because in eukaryotic organisms chaperones have evolved into complex multi-gene families. For instance, the occurrence of multiple paralogs in a single genome makes it difficult to interpret results if consideration is not given to the fact that similar but distinct chaperone genes can be differentially expressed in separate cellular compartments, tissues, and developmental stages. The availability of complete genome sequences allows implementation of chaperonomics with the purpose of understanding the composition of chaperone families in all cell compartments, their evolutionary and functional relations and, ultimately, their role in pathogenesis. Here, we present a series of concatenated, complementary procedures for identifying, characterizing, and classifying chaperone genes in genomes and for elucidating evolutionary relations and structural features useful in predicting functional properties. We illustrate the procedures with applications to the complex family of hsp70 genes and show that the kind of data obtained can provide a solid basis for future research.
Affiliation(s)
- Luciano Brocchieri
- University of Florida, College of Medicine, Department of Molecular Genetics and Microbiology, UF Genetics Institute, P.O. Box 103610, Gainesville, FL 32610-3610, USA
|
31
|
Kim N, Alekseyenko AV, Roy M, Lee C. The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species. Nucleic Acids Res 2006; 35:D93-8. [PMID: 17108355 PMCID: PMC1669709 DOI: 10.1093/nar/gkl884] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Indexed: 01/01/2023]
Abstract
We have greatly expanded the Alternative Splicing Annotation Project (ASAP) database: (i) its human alternative splicing data are expanded ∼3-fold over the previous ASAP database, to nearly 90 000 distinct alternative splicing events; (ii) it now provides genome-wide alternative splicing analyses for 15 vertebrate, insect and other animal species; (iii) it provides comprehensive comparative genomics information for comparing alternative splicing and splice site conservation across 17 aligned genomes, based on UCSC multigenome alignments; (iv) it provides an ∼2- to 3-fold expansion in detection of tissue-specific alternative splicing events, and of cancer versus normal specific alternative splicing events. We have also constructed a novel database linking orthologous exons and orthologous introns between genomes, based on multigenome alignment of 17 animal species. It can be a valuable resource for studies of gene structure evolution. ASAP II provides a new web interface enabling more detailed exploration of the data, and integrating comparative genomics information with alternative splicing data. We provide a set of tools for advanced data-mining of ASAP II with Pygr (the Python Graph Database Framework for Bioinformatics) including powerful features such as graph query, multigenome alignment query, etc. ASAP II is available at .
Affiliation(s)
- Alexander V. Alekseyenko
- Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Christopher Lee
- To whom correspondence should be addressed. Tel: +1 310 825 7374; Fax: +1 310 206 7286
32
Castrignanò T, Rizzi R, Talamo IG, De Meo PD, Anselmo A, Bonizzoni P, Pesole G. ASPIC: a web resource for alternative splicing prediction and transcript isoforms characterization. Nucleic Acids Res 2006; 34:W440-3. [PMID: 16845044 PMCID: PMC1538898 DOI: 10.1093/nar/gkl324] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
Alternative splicing (AS) is now emerging as a major mechanism contributing to the expansion of the transcriptome and proteome complexity of multicellular organisms. The fact that a single gene locus may give rise to multiple mRNAs and protein isoforms, showing both major and subtle structural variations, is an exceptionally versatile tool in the optimization of the coding capacity of the eukaryotic genome. The huge and continuously increasing number of genome and transcript sequences provides an essential information source for the computational detection of genes AS pattern. However, much of this information is not optimally or comprehensively used in gene annotation by current genome annotation pipelines. We present here a web resource implementing the ASPIC algorithm which we developed previously for the investigation of AS of user submitted genes, based on comparative analysis of available transcript and genome data from a variety of species. The ASPIC web resource provides graphical and tabular views of the splicing patterns of all full-length mRNA isoforms compatible with the detected splice sites of genes under investigation as well as relevant structural and functional annotation. The ASPIC web resource—available at —is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility.
Affiliation(s)
- Raffaella Rizzi
- DISCo, University of Milan Bicocca, via Bicocca degli Arcimboldi, 8, Milan, 20135, Italy
- Anna Anselmo
- Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, via Celoria 26, Milan 20133, Italy
- Paola Bonizzoni
- DISCo, University of Milan Bicocca, via Bicocca degli Arcimboldi, 8, Milan, 20135, Italy
- Graziano Pesole
- Dipartimento di Biochimica e Biologia Molecolare, University of Bari, via Orabona, 4, Bari 70126, Italy
- To whom correspondence should be addressed. Tel: +39 080 5443588; Fax: +39 080 5443317
33
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JGR, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R. GENCODE: producing a reference annotation for ENCODE. Genome Biol 2006; 7 Suppl 1:S4.1-9. [PMID: 16925838 PMCID: PMC1810553 DOI: 10.1186/gb-2006-7-s1-s4] [Citation(s) in RCA: 443] [Impact Index Per Article: 24.6] [Indexed: 12/29/2022]
Abstract
BACKGROUND The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. RESULTS The GENCODE gene features are divided into eight different categories, of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. CONCLUSION In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of the GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which reflects the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.
Affiliation(s)
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK.
34
Bollina D, Lee BTK, Tan TW, Ranganathan S. ASGS: an alternative splicing graph web service. Nucleic Acids Res 2006; 34:W444-7. [PMID: 16845045 PMCID: PMC1538904 DOI: 10.1093/nar/gkl268] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 02/14/2006] [Revised: 03/01/2006] [Accepted: 03/31/2006] [Indexed: 11/13/2022]
Abstract
Alternative transcript diversity manifests itself as a prime cause of complexity in higher eukaryotes. The Alternative Splicing Graph Server (ASGS) is a web service facilitating the systematic study of alternatively spliced genes of higher eukaryotes by generating splicing graphs for the compact visual representation of transcript diversity from a single gene. Taking a set of transcripts in General Feature Format as input, ASGS identifies distinct reference and variable exons, and generates a transcript splicing graph, an exon summary, a classification of splicing events and a single line graph to facilitate experimental analysis. This freely available web service can be accessed at http://asgs.biolinfo.org.
Affiliation(s)
- Durgaprasad Bollina
- Department of Chemistry and Biomolecular Sciences and Biotechnology Research Institute, Macquarie University, Sydney, NSW 2109, Australia
- Bernett T. K. Lee
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119260
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119260
- Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences and Biotechnology Research Institute, Macquarie University, Sydney, NSW 2109, Australia
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119260
35
Xing Y, Yu T, Wu YN, Roy M, Kim J, Lee C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Res 2006; 34:3150-60. [PMID: 16757580 PMCID: PMC1475746 DOI: 10.1093/nar/gkl396] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.4] [Received: 01/29/2006] [Revised: 04/13/2006] [Accepted: 05/10/2006] [Indexed: 11/13/2022]
Abstract
Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, which is a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript we introduce a probabilistic formulation of the isoform reconstruction problem, and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated data and expressed sequences from real human genes, we demonstrate that our EM algorithm can correctly handle various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms.
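The expectation-maximization idea in this abstract can be illustrated with a toy sketch. It is not the authors' formulation: it assumes the candidate isoforms (traversals of the splice graph) are already enumerated, that each observed fragment is simply marked as compatible or not with each isoform, and it ignores isoform length and fragment position. The function name `em_isoform_abundance` and its parameters are hypothetical.

```python
from typing import Dict, List, Set


def em_isoform_abundance(
    compat: List[Set[str]],   # one set per fragment: isoforms it is compatible with
    isoforms: List[str],      # candidate isoform names (assumed given)
    n_iter: int = 200,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Estimate relative isoform abundances by EM (toy model, see assumptions above)."""
    # Start from a uniform guess over the candidate isoforms.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(n_iter):
        # E-step: split each fragment fractionally among its compatible isoforms.
        counts = {iso: 0.0 for iso in isoforms}
        for frag in compat:
            z = sum(theta[i] for i in frag)
            if z == 0.0:
                continue  # fragment explained by no isoform with non-zero weight
            for i in frag:
                counts[i] += theta[i] / z
        # M-step: renormalize the expected counts (assumes at least one usable fragment).
        total = sum(counts.values())
        new_theta = {i: counts[i] / total for i in isoforms}
        delta = max(abs(new_theta[i] - theta[i]) for i in isoforms)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Six fragments support only isoform A, two only B, and two are ambiguous.
fragments = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 2
print(em_isoform_abundance(fragments, ["A", "B"]))
```

In this example the ambiguous fragments are shared in proportion to the current estimates, so the estimate for A settles near 0.75, the maximum likelihood value for this toy data set.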
Affiliation(s)
- Yi Xing
- Molecular Biology Institute, Center for Computational Biology, Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
- Tianwei Yu
- Department of Statistics, University of California, Los Angeles, USA
- Dental Research Institute, School of Dentistry, University of California, Los Angeles, USA
- Ying Nian Wu
- Department of Statistics, University of California, Los Angeles, USA
- Meenakshi Roy
- Molecular Biology Institute, Center for Computational Biology, Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
- Joseph Kim
- Molecular Biology Institute, Center for Computational Biology, Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
- Christopher Lee
- Molecular Biology Institute, Center for Computational Biology, Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
36
Le Texier V, Riethoven JJ, Kumanduri V, Gopalakrishnan C, Lopez F, Gautheret D, Thanaraj TA. AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics 2006; 7:169. [PMID: 16556303 PMCID: PMC1435940 DOI: 10.1186/1471-2105-7-169] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/14/2005] [Accepted: 03/23/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Currently there are efforts that collect data and annotation individually for each of these variants. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts along with consolidated annotation. We have previously developed computational pipelines that generate value-added data at genome scale on individual variant types; these include AltSplice on splicing and AltPAS on polyadenylation. We now extend these pipelines and integrate the resultant data sets to facilitate an integrated view of the contributions from splicing and polyadenylation in the formation of transcript variants. DESCRIPTION The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns; this pipeline is extended as AltTrans to delineate isoform transcript patterns, for each of which both the introns/exons and the 'terminating' polyA site are delineated; EST/mRNA sequences that qualify the transcript pattern confirm both the underlying splicing and polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of underlying splicing patterns. Resultant polyA sites from both AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation and the resultant alternate transcript patterns; the basal data are annotated for various biological features. The data (named integrated AltTrans data) generated for both human and mouse are made available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/. CONCLUSION The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
Affiliation(s)
- Vincent Le Texier
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jean-Jack Riethoven
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 18 Crispin Close, Haverhill, Suffolk, CB9 9PT, UK
- Vasudev Kumanduri
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chellappa Gopalakrishnan
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Fabrice Lopez
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Daniel Gautheret
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Thangavel Alphonse Thanaraj
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 4 Copperfields, Saffron Walden, Essex, CB11 4FG, UK
37
Abstract
Chromosome translocation and gene fusion are frequent events in the human genome and are often the cause of many types of tumor. ChimerDB is a database of fusion sequences encompassing bioinformatics analysis of mRNA and expressed sequence tag (EST) sequences in GenBank, manual collection of literature data and integration with other known databases such as OMIM. Our bioinformatics analysis identifies the fusion transcripts that have non-overlapping alignments at multiple genomic loci. Fusion events at exon-exon borders are selected to filter out the cloning artifacts in cDNA library preparation. The result is classified into two groups: genuine chromosome translocation, and fusion between neighboring genes owing to intergenic splicing. We also integrated manually collected literature and OMIM data for chromosome translocation as an aid to assess the validity of each fusion event. The database is available at http://genome.ewha.ac.kr/ChimerDB/ for human, mouse and rat genomes.
Affiliation(s)
- Seungyoon Nam
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
- Seokmin Shin
- School of Chemistry, Seoul National University, Seoul 151-747, Korea
- Sanghyuk Lee
- To whom correspondence should be addressed. Tel: +82 232772888; Fax: +82 232772384
38
Holste D, Huo G, Tung V, Burge CB. HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res 2006; 34:D56-62. [PMID: 16381932 PMCID: PMC1347411 DOI: 10.1093/nar/gkj048] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 08/15/2005] [Revised: 09/26/2005] [Accepted: 10/04/2005] [Indexed: 01/05/2023]
Abstract
RNA splicing is an essential step in gene expression, and is often variable, giving rise to multiple alternatively spliced mRNA and protein isoforms from a single gene locus. The design of effective databases to support experimental and computational investigations of alternative splicing (AS) is a significant challenge. In an effort to integrate accurate exon and splice site annotation with current knowledge about splicing regulatory elements and predicted AS events, and to link information about the splicing of orthologous genes in different species, we have developed the Hollywood system. This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags, and links features such as splice site sequence and strength, exonic splicing enhancers and silencers, conserved and non-conserved patterns of splicing, and cDNA library information for inferred alternative exons. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene. A streamlined graphical representation of gene splicing patterns is provided, and these patterns can alternatively be layered onto existing information in the UCSC Genome Browser. The database is accessible at http://hollywood.mit.edu.
Affiliation(s)
- Dirk Holste
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02319, USA.
39
Abstract
Alternative splicing and gene duplication are two major sources of proteomic function diversity. Here, we study the evolutionary trend of alternative splicing after gene duplication by analyzing the alternative splicing differences between duplicate genes. We observed that duplicate genes have fewer alternative splice (AS) forms than single-copy genes, and that a negative correlation exists between the mean number of AS forms and the gene family size. Interestingly, we found that the loss of alternative splicing in duplicate genes may occur shortly after the gene duplication. These results support the subfunctionization model of alternative splicing in the early stage after gene duplication. Further analysis of the alternative splicing distribution in human duplicate pairs showed the asymmetric evolution of alternative splicing after gene duplications; i.e., the AS forms between duplicates may differ dramatically. We therefore conclude that alternative splicing and gene duplication may not evolve independently. In the early stage after gene duplication, young duplicates may take over a certain amount of protein function diversity that previously was carried out by the alternative splicing mechanism. In the late stage, the gain and loss of alternative splicing seem to be independent between duplicates.
Affiliation(s)
- Zhixi Su
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
40
Kim N, Lim D, Lee S, Kim H. ASePCR: alternative splicing electronic RT-PCR in multiple tissues and organs. Nucleic Acids Res 2005; 33:W681-5. [PMID: 15980562 PMCID: PMC1160168 DOI: 10.1093/nar/gki407] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/14/2005] [Revised: 03/21/2005] [Accepted: 03/21/2005] [Indexed: 11/13/2022]
Abstract
RT-PCR is one of the most powerful and direct methods to detect transcript variants due to alternative splicing (AS) that increase transcript diversity significantly in vertebrates. ASePCR is an efficient web-based application that emulates RT-PCR in various tissues. It estimates the amplicon size for a given primer pair based on the transcript models identified by the reverse e-PCR program of the NCBI. The tissue specificity of each PCR band is deduced from the tissue information of expressed sequence tag (EST) sequences compatible with each transcript structure. The output page shows PCR bands like a gel electrophoresis in various tissues. Each band in the output picture represents a putative isoform that could happen in a tissue-specific manner. It also shows the EST alignment and tissue information in the genome browser. Furthermore, the user can compare the AS patterns of orthologous genes in other species. The ASePCR, available at http://genome.ewha.ac.kr/ASePCR/, supports the transcriptome models of the RefSeq, Ensembl, ECgene and AceView for human, mouse, rat and chicken genomes. It will be a valuable web resource to explore the transcriptome diversity associated with different tissues and organs in multiple species.
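The amplicon-size estimate described in this abstract can be sketched as an exact-match in-silico PCR on transcript sequences. This is only an illustration under simplifying assumptions: it is not the NCBI reverse e-PCR program, it allows no mismatches, and the function names and toy exon sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_size(transcript: str, fwd: str, rev: str) -> Optional[int]:
    """Predicted product length for a primer pair on one transcript model.

    The forward primer must match the transcript exactly, and the reverse
    primer must match the opposite strand downstream of it. Returns None
    when either primer fails to bind.
    """
    t = transcript.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    site = reverse_complement(rev)  # reverse primer as it appears on the sense strand
    end = t.find(site, start + len(fwd))
    if end < 0:
        return None
    return end + len(site) - start


# Toy gene with a cassette exon (e2) that one isoform skips.
e1, e2, e3 = "ATGGCCAAGT", "TTTCCCGGGA", "GGATCCTTAA"
models = {"inclusion": e1 + e2 + e3, "skipping": e1 + e3}
for name, seq in models.items():
    print(name, amplicon_size(seq, "ATGGCC", "TTAAGG"))
```

Each transcript model yields one predicted band, so the two isoforms of the toy gene differ in product length by exactly the length of the skipped exon.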
Affiliation(s)
- Namshin Kim
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Dajeong Lim
- School of Agricultural Biotechnology, Seoul National University, Seoul 151-742, Korea
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Sanghyuk Lee
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
- Heebal Kim
- To whom correspondence should be addressed. Tel: +82 28804803; Fax: +82 28732271
41
Kim N, Shin S, Lee S. ECgene: genome-based EST clustering and gene modeling for alternative splicing. Genome Res 2005; 15:566-76. [PMID: 15805497 PMCID: PMC1074371 DOI: 10.1101/gr.3030405] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.1] [Indexed: 12/28/2022]
Abstract
With the availability of the human genome map and fast algorithms for sequence alignment, genome-based EST clustering became a viable method for gene modeling. We developed a novel gene-modeling method, ECgene (Gene modeling by EST Clustering), which combines genome-based EST clustering and the transcript assembly procedure in a coherent and consistent fashion. Specifically, ECgene takes alternative splicing events into consideration. The position of splice sites (i.e., exon-intron boundaries) in the genome map is utilized as the critical information in the whole procedure. Sequences that share any splice sites are grouped together to define an EST cluster in a manner similar to that of the genome-based version of the UniGene algorithm. Transcript assembly is achieved using graph theory that represents the exon connectivity in each cluster as a directed acyclic graph (DAG). Distinct paths along exons correspond to possible gene models encompassing all alternative splicing events. EST sequences in each cluster are subclustered further according to the compatibility with gene structure of each splice variant, and they can be regarded as clone evidence for the corresponding isoform. The reliability of each isoform is assessed from the nature of cluster members and from the minimum number of clones required to reconstruct all exons in the transcript.
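The statement that distinct paths along exons in the DAG correspond to possible gene models can be sketched as a plain path enumeration. This is a minimal illustration, not the ECgene implementation: exon names and the `enumerate_isoforms` function are hypothetical, the edges stand for splice junctions assumed to have been observed in a cluster, and the graph is assumed to be acyclic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def enumerate_isoforms(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """List every source-to-sink path in an exon connectivity DAG."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    has_pred: Set[str] = set()
    for u, v in edges:
        succ[u].add(v)          # a set removes duplicate junction evidence
        nodes.update((u, v))
        has_pred.add(v)

    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        nxt = succ.get(node)
        if not nxt:             # no outgoing junction: terminal exon
            paths.append(path)
            return
        for child in sorted(nxt):
            walk(child, path)

    # Start from exons that have no incoming junction (first exons).
    for source in sorted(n for n in nodes if n not in has_pred):
        walk(source, [])
    return paths


# E2 is a cassette exon: junction E1-E3 skips it.
junctions = [("E1", "E2"), ("E2", "E3"), ("E1", "E3"), ("E3", "E4")]
for isoform in enumerate_isoforms(junctions):
    print("-".join(isoform))
```

The number of paths can grow exponentially with the number of independent alternative events, which is one reason a reliability assessment per isoform, as described in the abstract, is needed on top of the enumeration.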
Affiliation(s)
- Namshin Kim
- Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea