For: Zambelli F, Pavesi G, Gissi C, Horner DS, Pesole G. Assessment of orthologous splicing isoforms in human and mouse orthologous genes. BMC Genomics 2010;11:534. [PMID: 20920313 PMCID: PMC3091683 DOI: 10.1186/1471-2164-11-534] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 05/07/2010] [Accepted: 10/01/2010] [Indexed: 11/22/2022] Open

Cited by Other Article(s)

Ouedraogo WYDD, Ouangraoua A. Orthology and Paralogy Relationships at Transcript Level. J Comput Biol 2024;31:277-293. [PMID: 38621191 DOI: 10.1089/cmb.2023.0400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024] Open

Santos LGC, Parreira VDSC, da Silva EMG, Santos MDM, Fernandes ADF, Neves-Ferreira AGDC, Carvalho PC, Freitas FCDP, Passetti F. SpliceProt 2.0: A Sequence Repository of Human, Mouse, and Rat Proteoforms. Int J Mol Sci 2024;25:1183. [PMID: 38256255 PMCID: PMC10816255 DOI: 10.3390/ijms25021183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/15/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024] Open

Affiliation(s)

Letícia Graziela Costa Santos Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
Vinícius da Silva Coutinho Parreira Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
Esdras Matheus Gomes da Silva Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil Laboratory of Toxinology, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Av. Brazil 4036, Campus Maré, Rio de Janeiro 21040-361, RJ, Brazil
Marlon Dias Mariano Santos Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
Alexander da Franca Fernandes Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
Ana Gisele da Costa Neves-Ferreira Laboratory of Toxinology, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Av. Brazil 4036, Campus Maré, Rio de Janeiro 21040-361, RJ, Brazil
Paulo Costa Carvalho Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
Flávia Cristina de Paula Freitas Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil Departamento de Genética e Evolução, Universidade Federal de São Carlos (UFSCar), Rodovia Washington Luis, Km 235, São Carlos 13565-905, SP, Brazil
Fabio Passetti Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil

Ma J, Wu JY, Zhu L. Detection of orthologous exons and isoforms using EGIO. Bioinformatics 2022;38:4474-4480. [PMID: 35946527 PMCID: PMC9525004 DOI: 10.1093/bioinformatics/btac548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 06/15/2022] [Accepted: 08/05/2022] [Indexed: 12/24/2022] Open

Guillaudeux N, Belleannée C, Blanquart S. Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog. BMC Genomics 2022;23:216. [PMID: 35303798 PMCID: PMC8933948 DOI: 10.1186/s12864-022-08429-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 02/07/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In eukaryote transcriptomes, a significant amount of transcript diversity comes from genes' capacity to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. However, there is no formal definition of transcript orthology based on the splicing structure conservation. Likewise there is no public dataset benchmark providing groups of orthologous transcripts sharing a conserved splicing structure.

RESULTS

We introduced a formal definition of splicing structure orthology and we predicted transcript orthologs in human, mouse and dog. Applying a selective strategy, we analyzed 2,167 genes and their 18,109 known transcripts and identified a set of 253 gene orthologs that shared a conserved splicing structure in all three species. We predicted 6,861 transcript CDSs (coding sequence), mainly for dog, an emergent model species. Each predicted transcript was an ortholog of a known transcript: both share the same CDS splicing structure. Evidence for the existence of the predicted CDSs was found in external data.

CONCLUSIONS

We generated a dataset of 253 gene triplets, structurally conserved and sharing all their CDSs in human, mouse and dog, which correspond to 879 triplets of spliced CDS orthologs. We have released the dataset both as an SQL database and as tabulated files. The data consist of the 879 CDS orthology groups with their detailed splicing structures, and the predicted CDSs, associated with their experimental evidence. The 6,861 predicted CDSs are provided in GTF files. Our data may help to compare highly conserved genes across three species, to support comparative transcriptomics at the isoform level, or to benchmark splice aligners and methods focusing on the identification of splicing orthologs. The data are available at https://data-access.cesgo.org/index.php/s/V97GXxOS66NqTkZ .

Reinhardt F, Stadler PF. ExceS-A: an exon-centric split aligner. J Integr Bioinform 2022;19:jib-2021-0040. [PMID: 35254744 PMCID: PMC9069663 DOI: 10.1515/jib-2021-0040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 01/12/2022] [Indexed: 11/25/2022] Open

Jammali S, Djossou A, Ouédraogo WYDD, Nevers Y, Chegrane I, Ouangraoua A. From pairwise to multiple spliced alignment. Bioinformatics Advances 2022;2:vbab044. [PMID: 36699392 PMCID: PMC9710695 DOI: 10.1093/bioadv/vbab044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 11/25/2021] [Indexed: 01/28/2023]

The MAGOH paralogs - MAGOH, MAGOHB and their multiple isoforms. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Yildirim A, Mozaffari-Jovin S, Wallisch AK, Schäfer J, Ludwig SEJ, Urlaub H, Lührmann R, Wolfrum U. SANS (USH1G) regulates pre-mRNA splicing by mediating the intra-nuclear transfer of tri-snRNP complexes. Nucleic Acids Res 2021;49:5845-5866. [PMID: 34023904 PMCID: PMC8191790 DOI: 10.1093/nar/gkab386] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 04/28/2021] [Indexed: 02/06/2023] Open

Chakraborty A, Ay F, Davuluri RV. ExTraMapper: Exon- and Transcript-level mappings for orthologous gene pairs. Bioinformatics 2021;37:3412-3420. [PMID: 34014317 PMCID: PMC8545320 DOI: 10.1093/bioinformatics/btab393] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/15/2020] [Revised: 04/27/2021] [Accepted: 05/19/2021] [Indexed: 12/13/2022] Open

Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020;20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022] Open

Association Study of Puberty-Related Candidate Genes in Chinese Female Population. Int J Genomics 2020;2020:1426761. [PMID: 32566640 PMCID: PMC7285286 DOI: 10.1155/2020/1426761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/08/2019] [Revised: 03/18/2020] [Accepted: 04/27/2020] [Indexed: 01/05/2023] Open

Kuitche E, Jammali S, Ouangraoua A. SimSpliceEvol: alternative splicing-aware simulation of biological sequence evolution. BMC Bioinformatics 2019;20:640. [PMID: 31842741 PMCID: PMC6916212 DOI: 10.1186/s12859-019-3207-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023] Open

Jammali S, Aguilar JD, Kuitche E, Ouangraoua A. SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups. BMC Bioinformatics 2019;20:133. [PMID: 30925859 PMCID: PMC6439985 DOI: 10.1186/s12859-019-2647-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022] Open

Abstract

BACKGROUND

The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS with similar sequences and conserved splicing structures from orthologous genes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising, yet unexplored approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet, this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then inferring transcript splicing orthology groups in a gene family based on spliced alignments.
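As an illustration of the "exon extremity information in a CDS" mentioned above, the following minimal sketch converts genomic exon intervals into exon start and end positions within the spliced CDS. It is not taken from SplicedFamAlign: the function name, the 1-based inclusive coordinate convention, and the forward-strand assumption are all hypothetical choices made for this example.

```python
from itertools import accumulate


def cds_exon_extremities(genomic_exons):
    """Map genomic exon intervals to exon extremities within the spliced CDS.

    Assumptions (illustrative only): intervals are (start, end) pairs,
    1-based and inclusive, on the forward strand, and do not overlap.
    """
    # Length of each exon, in genomic order.
    lengths = [end - start + 1 for start, end in sorted(genomic_exons)]
    # Cumulative lengths give the last CDS position covered by each exon.
    ends = list(accumulate(lengths))
    # Each exon starts one position after the previous exon ends.
    starts = [1] + [e + 1 for e in ends[:-1]]
    return list(zip(starts, ends))


# Three coding exons of lengths 60, 45 and 90 nucleotides.
structure = cds_exon_extremities([(101, 160), (301, 345), (501, 590)])
print(structure)
```

Two CDS from orthologous genes could then be compared on such structures (together with their sequence similarity), which is the kind of information the abstract says existing spliced aligners do not exploit.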

RESULTS

The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high for various levels of sequence similarity between input sequences, thanks to accounting for the splicing structure of the input sequences. Notably, unlike all current spliced alignment methods, which are meant for cDNA-to-genome alignments and can be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments.

CONCLUSION

We show the usefulness of SFA for the comparison of genes and transcripts within a gene family for the purpose of analyzing splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign .

Chen X, Wang S, Zhou Y, Han Y, Li S, Xu Q, Xu L, Zhu Z, Deng Y, Yu L, Song L, Chen AP, Song J, Takahashi E, He G, He L, Li W, Chen CD. Phf8 histone demethylase deficiency causes cognitive impairments through the mTOR pathway. Nat Commun 2018;9:114. [PMID: 29317619 PMCID: PMC5760733 DOI: 10.1038/s41467-017-02531-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 09/21/2016] [Accepted: 12/06/2017] [Indexed: 12/16/2022] Open

Affiliation(s)

Xuemei Chen Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China; State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; Department of Anesthesiology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
Shuai Wang Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
Ying Zhou Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
Yanfei Han Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China; Discipline of Neuroscience and Department of Anatomy and Physiology, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200025, China
Shengtian Li Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
Qing Xu State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Longyong Xu State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Ziqi Zhu State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Youming Deng State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Lu Yu State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Lulu Song Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
Adele Pin Chen State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Juan Song Department of Pharmacology and Neuroscience Center, University of North Carolina School of Medicine, Chapel Hill, NC, 27514, USA
Eiki Takahashi Research Resources Center, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
Guang He Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
Lin He Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
Weidong Li Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
Charlie Degui Chen State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.

Kuitche E, Lafond M, Ouangraoua A. Reconstructing protein and gene phylogenies using reconciliation and soft-clustering. J Bioinform Comput Biol 2017;15:1740007. [DOI: 10.1142/s0219720017400078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]

Breschi A, Gingeras TR, Guigó R. Comparative transcriptomics in human and mouse. Nat Rev Genet 2017;18:425-440. [PMID: 28479595 PMCID: PMC6413734 DOI: 10.1038/nrg.2017.19] [Citation(s) in RCA: 150] [Impact Index Per Article: 21.4] [Indexed: 12/12/2022]

Jammali S, Kuitche E, Rachati A, Bélanger F, Scott M, Ouangraoua A. Aligning coding sequences with frameshift extension penalties. Algorithms Mol Biol 2017;12:10. [PMID: 28373895 PMCID: PMC5374649 DOI: 10.1186/s13015-017-0101-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/11/2016] [Accepted: 03/18/2017] [Indexed: 11/15/2022] Open

Abstract

BACKGROUND

Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from the same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. The former cannot account for frameshift translations and, up to now, the latter has accounted exclusively for frameshift translation initiation, not considering the length of the translation disruption caused by a frameshift.

RESULTS

We introduce a new scoring scheme with an algorithm for the pairwise alignment of CDS accounting for frameshift translation initiation and length, while simultaneously considering nucleotide and amino acid sequences. The main specificity of the scoring scheme is the introduction of a penalty cost accounting for frameshift extension length to compute an adequate similarity score for a CDS alignment. The second specificity of the model is that the search space of the problem solved is the set of all feasible alignments between two CDS. Previous approaches have considered restricted search space or additional constraints on the decomposition of an alignment into length-3 sub-alignments. The algorithm described in this paper has the same asymptotic time complexity as the classical Needleman-Wunsch algorithm.
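For reference, the classical Needleman-Wunsch recurrence that the abstract uses as its complexity baseline can be sketched as follows. This is only the textbook global alignment score with a linear gap penalty; it is not the FsePSA algorithm, and it models neither codon structure nor frameshift penalties. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Textbook dynamic programme with a linear gap penalty; runs in
    O(len(a) * len(b)) time and space.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

The quadratic table fill is what "the same asymptotic time complexity as the classical Needleman-Wunsch algorithm" refers to: the frameshift-aware scoring described above changes what is stored per cell, not the order of growth.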

CONCLUSIONS

We compare the method to other CDS alignment methods based on an application to the comparison of pairs of CDS from homologous human, mouse and cow genes of ten mammalian gene families from the Ensembl-Compara database. The results show that our method is particularly robust to parameter changes as compared to existing methods. It also appears to be a good compromise, performing well both in the presence and absence of frameshift translations. An implementation of the method is available at https://github.com/UdeS-CoBIUS/FsePSA.

Blanquart S, Varré JS, Guertin P, Perrin A, Bergeron A, Swenson KM. Assisted transcriptome reconstruction and splicing orthology. BMC Genomics 2016;17:786. [PMID: 28185551 PMCID: PMC5123294 DOI: 10.1186/s12864-016-3103-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023] Open

Abstract

Background

Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21 % of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve — as has been strongly suggested in the literature — it is important to evaluate assisted transcriptome prediction; this is the question of how alternative transcription in one species performs as a predictor of protein isoforms in another relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs.

Results

We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and alignments at the protein level. Using our definition, we compared 24 382 human transcripts and 17 909 mouse transcripts from the highly curated CCDS database, and identified 11 122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62 % of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74 %. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade.

Conclusions

Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.

Chen J, Hackett CS, Zhang S, Song YK, Bell RJA, Molinaro AM, Quigley DA, Balmain A, Song JS, Costello JF, Gustafson WC, Van Dyke T, Kwok PY, Khan J, Weiss WA. The genetics of splicing in neuroblastoma. Cancer Discov 2015;5:380-95. [PMID: 25637275 PMCID: PMC4390477 DOI: 10.1158/2159-8290.cd-14-0892] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/12/2014] [Accepted: 01/26/2015] [Indexed: 02/06/2023]

Affiliation(s)

Justin Chen Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, California. Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California
Christopher S Hackett Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California
Shile Zhang Program in Bioinformatics, Boston University, Boston, Massachusetts. Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute, Bethesda, Maryland
Young K Song Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute, Bethesda, Maryland
Robert J A Bell Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, California. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California
Annette M Molinaro Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California
David A Quigley Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California. Institute for Cancer Research, Oslo, Norway
Allan Balmain Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California
Jun S Song Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California. Department of Bioengineering, University of Illinois, Urbana-Champaign, Urbana, Illinois. Department of Physics, University of Illinois, Urbana-Champaign, Urbana, Illinois
Joseph F Costello Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California
W Clay Gustafson Department of Pediatrics, University of California, San Francisco, San Francisco, California
Terry Van Dyke Mouse Cancer Genetics Program, Center for Advanced Preclinical Research, National Cancer Institute, Frederick, Maryland
Pui-Yan Kwok Institute for Human Genetics, University of California, San Francisco, San Francisco, California. Department of Dermatology, University of California, San Francisco, San Francisco, California. Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California
Javed Khan Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute, Bethesda, Maryland
William A Weiss Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California. Department of Pediatrics, University of California, San Francisco, San Francisco, California.

Whitney IE, Kautzman AG, Reese BE. Alternative splicing of the LIM-homeodomain transcription factor Isl1 in the mouse retina. Mol Cell Neurosci 2015;65:102-13. [PMID: 25752730 DOI: 10.1016/j.mcn.2015.03.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/27/2014] [Revised: 02/12/2015] [Accepted: 03/05/2015] [Indexed: 11/25/2022] Open

Abstract

Islet-1 (Isl1) is a LIM-homeodomain (LIM-HD) transcription factor that functions in a combinatorial manner with other LIM-HD proteins to direct the differentiation of distinct cell types within the central nervous system and many other tissues. A study of pancreatic cell lines showed that Isl1 is alternatively spliced generating a second isoform, Isl1β, which is missing 23 amino acids within the C-terminal region. This study examines the expression of the canonical and alternative Isl1 transcripts across other tissues, in particular, within the retina, where Isl1 is required for the differentiation of multiple neuronal cell types. The alternative splicing of Isl1 is shown to occur in multiple tissues, but the relative abundance of Isl1α and Isl1β expression varies greatly across them. In most tissues, Isl1α is the more abundant transcript, but in others the transcripts are expressed equally, or the alternative splice variant is dominant. Within the retina, differential expression of the two Isl1 transcripts increases as a function of development, with dynamic changes in expression peaking at E16.5 and again at P10. At the cellular level, individual retinal ganglion cells vary in their expression, with a subset of small-to-medium sized cells expressing only the alternative isoform. The functional significance of the difference in protein sequence between the two Isl1 isoforms was also assessed using a luciferase assay, demonstrating that the alternative isoform forms a less effective transcriptional complex for activating gene expression. These results demonstrate the differential presence of the canonical and alternative isoforms of Isl1 amongst retinal ganglion cell classes. As Isl1 participates in the differentiation of multiple cell types within the CNS, the present results support a role for alternative splicing in the establishment of cellular diversity in the developing nervous system.

Gu J, Lu Y, Qiao L, Ran D, Li N, Cao H, Gao Y, Zheng Q. Mouse p63 variants and chondrogenesis. Int J Clin Exp Pathol 2013;6:2872-2879. [PMID: 24294373 PMCID: PMC3843267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 10/09/2013] [Accepted: 11/08/2013] [Indexed: 06/02/2023]

Spangenberg L, Correa A, Dallagiovanna B, Naya H. Role of alternative polyadenylation during adipogenic differentiation: an in silico approach. PLoS One 2013;8:e75578. [PMID: 24143171 PMCID: PMC3797115 DOI: 10.1371/journal.pone.0075578] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2013] [Accepted: 08/14/2013] [Indexed: 01/22/2023] Open

Fong JH, Murphy TD, Pruitt KD. Comparison of RefSeq protein-coding regions in human and vertebrate genomes. BMC Genomics 2013;14:654. [PMID: 24063302 PMCID: PMC3882889 DOI: 10.1186/1471-2164-14-654] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2013] [Accepted: 08/22/2013] [Indexed: 12/25/2022] Open

Abstract

BACKGROUND

Advances in high-throughput sequencing technology have yielded a large number of publicly available vertebrate genomes, many of which are selected for inclusion in NCBI's RefSeq project and subsequently processed by NCBI's eukaryotic annotation pipeline. Genome annotation results are affected by differences in available support evidence and may be impacted by annotation pipeline software changes over time. The RefSeq project has not previously assessed annotation trends across organisms or over time. To address this deficiency, we have developed a comparative protocol which integrates analysis of annotated protein-coding regions across a data set of vertebrate orthologs in genomic sequence coordinates, protein sequences, and protein features.

RESULTS

We assessed an ortholog dataset that includes 34 annotated vertebrate RefSeq genomes including human. We confirm that RefSeq protein-coding gene annotations in mammals exhibit considerable similarity. Over 50% of the orthologous protein-coding genes in 20 organisms are supported at the level of splicing conservation with at least three selected reference genomes. Approximately 7,500 ortholog sets include at least half of the analyzed organisms, show highly similar sequence and conserved splicing, and may serve as a minimal set of mammalian "core proteins" for initial assessment of new mammalian genomes. Additionally, 80% of the proteins analyzed pass a suite of tests to detect proteins that lack splicing conservation and have unusual sequence or domain annotation. We use these tests to define an annotation quality metric that is based directly on the annotated proteins thus operates independently of other quality metrics such as availability of transcripts or assembly quality measures. Results are available on the RefSeq FTP site [http://ftp.ncbi.nlm.nih.gov/refseq/supplemental/ProtCore/SM1.txt].

CONCLUSIONS

Our multi-factored analysis demonstrates a high level of consistency in RefSeq protein representation among vertebrates. We find that the majority of the RefSeq vertebrate proteins for which we have calculated orthology are good as measured by these metrics. The process flow described provides specific information on the scope and degree of conservation for the analyzed protein sequences and annotations and will be used to enrich the quality of RefSeq records by identifying targets for further improvement in the computational annotation pipeline, and by flagging specific genes for manual curation.

Villanueva-Cañas JL, Laurie S, Albà MM. Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome Biol Evol 2013;5:457-67. [PMID: 23377868 PMCID: PMC3590775 DOI: 10.1093/gbe/evt017] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Abstract

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank_+F. Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
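
The selection idea described in this abstract (one isoform per gene, chosen so that the lengths across the family are as similar as possible) can be illustrated with a minimal sketch. This is not the published PALO implementation: the function name, the input layout, and the scoring rule (smallest difference between the longest and shortest chosen isoform, searched by brute force) are assumptions made only for illustration.

```python
from itertools import product


def pick_similar_length_isoforms(family):
    """Pick one isoform per gene so that chosen protein lengths are as close as possible.

    `family` maps a gene (or species) label to {isoform_id: protein_length}.
    Scoring by (max length - min length) is an illustrative assumption,
    not necessarily the criterion used by PALO itself.
    """
    genes = sorted(family)
    # One list of (isoform_id, length) candidates per gene, in a stable order.
    options = [sorted(family[g].items()) for g in genes]

    def spread(combo):
        lengths = [length for _, length in combo]
        return max(lengths) - min(lengths)

    # Brute force over every combination; fine for small families only.
    best = min(product(*options), key=spread)
    return {g: iso for g, (iso, _) in zip(genes, best)}


# Hypothetical family: lengths are made up for the example.
example = {
    "human": {"h1": 500, "h2": 320},
    "mouse": {"m1": 310},
    "rat": {"r1": 480, "r2": 325},
}
chosen = pick_similar_length_isoforms(example)
```

With this made-up input, taking the longest isoform per gene would pair 500, 310, and 480 residues, whereas the length-matched choice pairs 320, 310, and 325. The exhaustive search grows multiplicatively with the number of isoforms per gene, so a real tool would need a heuristic for large families.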

Koumandou VL, Scorilas A. Evolution of the plasma and tissue kallikreins, and their alternative splicing isoforms. PLoS One 2013;8:e68074. [PMID: 23874499 PMCID: PMC3707919 DOI: 10.1371/journal.pone.0068074] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2012] [Accepted: 05/25/2013] [Indexed: 12/14/2022] Open

Abstract

Kallikreins are secreted serine proteases with important roles in human physiology. Human plasma kallikrein, encoded by the KLKB1 gene on locus 4q34-35, functions in the blood coagulation pathway, and in regulating blood pressure. The human tissue kallikrein and kallikrein-related peptidases (KLKs) have diverse expression patterns and physiological roles, including cancer-related processes such as cell growth regulation, angiogenesis, invasion, and metastasis. Prostate-specific antigen (PSA), the product of the KLK3 gene, is the most widely used biomarker in clinical practice today. A total of 15 KLKs are encoded by the largest contiguous cluster of protease genes in the human genome (19q13.3-13.4), which makes them ideal for evolutionary analysis of gene duplication events. Previous studies on the evolution of KLKs have traced mammalian homologs as well as a probable early origin of the family in aves, amphibia and reptilia. The aim of this study was to address the evolutionary and functional relationships between tissue KLKs and plasma kallikrein, and to examine the evolution of alternative splicing isoforms. Sequences of plasma and tissue kallikreins and their alternative transcripts were collected from the NCBI and Ensembl databases, and comprehensive phylogenetic analysis was performed by Bayesian as well as maximum likelihood methods. Plasma and tissue kallikreins exhibit high sequence similarity in the trypsin domain (>50%). Phylogenetic analysis indicates an early divergence of KLKB1, which groups closely with plasminogen, chymotrypsin, and complement factor D (CFD), in a monophyletic group distinct from trypsin and the tissue KLKs. Reconstruction of the earliest events leading to the diversification of the tissue KLKs is not well resolved, indicating rapid expansion in mammals. Alternative transcripts of each KLK gene show species-specific divergence, while examination of sequence conservation indicates that many annotated human KLK isoforms are missing the catalytic triad that is crucial for protease activity.

Fu GCL, Lin WC. Identification of gene-oriented exon orthology between human and mouse. BMC Genomics 2012;13 Suppl 1:S10. [PMID: 22369432 PMCID: PMC3303729 DOI: 10.1186/1471-2164-13-s1-s10] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open

Abstract

BACKGROUND

Gene orthology has been well studied in evolutionary biology and is considered important for functional genome annotation. With the accumulation of transcriptomic data, alternative splicing is now taken into account when assigning gene orthologs, and it has been suggested that orthology be considered at the transcript level as well. Whether for gene or transcript orthology, exons are the basic units that represent the whole gene structure; however, no study has been reported on how to build exon-level orthology at whole-genome scale. Therefore, it is essential to establish a gene-oriented exon orthology dataset.

RESULTS

Using a customized pipeline, we first build exon orthologous relationships from assigned gene ortholog pairs in two well-annotated genomes: human and mouse. More than 92% of non-overlapping exons have at least one ortholog between human and mouse, and only a small portion of them have more than one ortholog. Exons located in the coding region are more conserved in terms of finding their ortholog counterparts. Within the untranslated regions, the 5' UTR seems to have more diversity than the 3' UTR according to exon orthology designations. Interestingly, most exons located in the coding region are also conserved in length, but this conservation drops sharply in untranslated regions. In addition, we allowed multiple assignments in exon orthologs, and a subset of exons with possible fusion/split events was defined after a thorough analysis procedure.

CONCLUSIONS

Identification of orthologs at the exon level is essential to provide a detailed way to interrogate gene orthology and splicing. It could be used to extend genome annotation as well. Besides examining one-to-one orthologous relationships, we handle one-to-many exon pairs to represent complicated exon generation behavior. Our results can be further applied in many research fields studying intron-exon structure and alternative/constitutive exons in functional genomics.

Prosdocimi F, Linard B, Pontarotti P, Poch O, Thompson JD. Controversies in modern evolutionary biology: the imperative for error detection and quality control. BMC Genomics 2012;13:5. [PMID: 22217008 PMCID: PMC3311146 DOI: 10.1186/1471-2164-13-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2011] [Accepted: 01/04/2012] [Indexed: 12/03/2022] Open

Gharib WH, Robinson-Rechavi M. When orthologs diverge between human and mouse. Brief Bioinform 2011;12:436-41. [PMID: 21677033 PMCID: PMC3178054 DOI: 10.1093/bib/bbr031] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open