1
Ouedraogo WYDD, Ouangraoua A. Orthology and Paralogy Relationships at Transcript Level. J Comput Biol 2024; 31:277-293. [PMID: 38621191 DOI: 10.1089/cmb.2023.0400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
Eukaryotic genes undergo a mechanism called alternative processing, resulting in transcriptome diversity by allowing the production of multiple distinct transcripts from a gene. More than half of human genes are affected, and the resulting transcripts are highly conserved among orthologous genes of distinct species. In this work, we present the definition of orthology and paralogy between transcripts of homologous genes, together with an algorithm to compute clusters of conserved orthologous and paralogous transcripts. Gene-level homology relationships are utilized to define various types of homology relationships between transcripts originating from the same ancestral transcript. A Reciprocal Best Hits approach is employed to infer clusters of isoorthologous and recent paralogous transcripts. We applied this method to transcripts from simulated gene families as well as real gene families from the Ensembl-Compara database. The results are consistent with those from previous studies that compared orthologous gene transcripts. Furthermore, our findings provide evidence that searching for conserved transcripts between homologous genes, beyond the scope of orthologous genes, is likely to yield valuable information.
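The Reciprocal Best Hits idea named in this abstract can be illustrated with a minimal sketch. It assumes that pairwise similarity scores between transcripts of different genes are already available. The function name `reciprocal_best_hits`, the `scores` dictionary and the `gene_of` mapping are hypothetical names chosen for illustration; this is not the authors' implementation, which additionally uses gene-level homology relationships and builds clusters.

```python
from typing import Dict, FrozenSet, Set, Tuple


def reciprocal_best_hits(
    scores: Dict[Tuple[str, str], float],
    gene_of: Dict[str, str],
) -> Set[FrozenSet[str]]:
    """Return transcript pairs that are each other's best hit across two genes.

    scores maps an unordered transcript pair (listed once) to a similarity
    score; gene_of maps each transcript to its gene. Both are assumed inputs.
    """
    # (transcript, other gene) -> (best score, best partner in that gene)
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for (a, b), s in scores.items():
        for x, y in ((a, b), (b, a)):
            if gene_of[x] == gene_of[y]:
                continue  # only compare transcripts of different genes
            key = (x, gene_of[y])
            if key not in best or s > best[key][0]:
                best[key] = (s, y)

    pairs: Set[FrozenSet[str]] = set()
    for (x, _gene), (_score, y) in best.items():
        back = best.get((y, gene_of[x]))
        if back is not None and back[1] == x:
            pairs.add(frozenset((x, y)))
    return pairs


# Toy data: two transcripts in gene H, two in gene M (made-up scores).
toy_gene_of = {"h1": "H", "h2": "H", "m1": "M", "m2": "M"}
toy_scores = {
    ("h1", "m1"): 0.9,
    ("h1", "m2"): 0.4,
    ("h2", "m1"): 0.7,
    ("h2", "m2"): 0.3,
}
toy_pairs = reciprocal_best_hits(toy_scores, toy_gene_of)
```

In the toy data only `h1` and `m1` pick each other, so they form the single reciprocal pair; `h2` and `m2` each prefer a partner that prefers someone else.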
Affiliation(s)
- Aida Ouangraoua
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
2
Santos LGC, Parreira VDSC, da Silva EMG, Santos MDM, Fernandes ADF, Neves-Ferreira AGDC, Carvalho PC, Freitas FCDP, Passetti F. SpliceProt 2.0: A Sequence Repository of Human, Mouse, and Rat Proteoforms. Int J Mol Sci 2024; 25:1183. [PMID: 38256255 PMCID: PMC10816255 DOI: 10.3390/ijms25021183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/15/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024]
Abstract
SpliceProt 2.0 is a public proteogenomics database that aims to list the sequence of known proteins and potential new proteoforms in human, mouse, and rat proteomes. This updated repository provides an even broader range of computationally translated proteins and serves, for example, to aid with proteomic validation of splice variants absent from the reference UniProtKB/SwissProt database. We demonstrate the value of SpliceProt 2.0 to predict orthologous proteins between humans and murines based on transcript reconstruction, sequence annotation and detection at the transcriptome and proteome levels. In this release, the annotation data used in the reconstruction of transcripts based on the methodology of ternary matrices were acquired from new databases such as Ensembl, UniProt, and APPRIS. Another innovation implemented in the pipeline is the exclusion of transcripts predicted to be susceptible to degradation through the NMD pathway. Taken together, our repository and its applications represent a valuable resource for the proteogenomics community.
Affiliation(s)
- Letícia Graziela Costa Santos
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Vinícius da Silva Coutinho Parreira
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Esdras Matheus Gomes da Silva
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Av. Brazil 4036, Campus Maré, Rio de Janeiro 21040-361, RJ, Brazil
- Marlon Dias Mariano Santos
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Alexander da Franca Fernandes
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Ana Gisele da Costa Neves-Ferreira
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Av. Brazil 4036, Campus Maré, Rio de Janeiro 21040-361, RJ, Brazil
- Paulo Costa Carvalho
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Flávia Cristina de Paula Freitas
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Departamento de Genética e Evolução, Universidade Federal de São Carlos (UFSCar), Rodovia Washington Luis, Km 235, São Carlos 13565-905, SP, Brazil
- Fabio Passetti
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
3
Ma J, Wu JY, Zhu L. Detection of orthologous exons and isoforms using EGIO. Bioinformatics 2022; 38:4474-4480. [PMID: 35946527 PMCID: PMC9525004 DOI: 10.1093/bioinformatics/btac548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 06/15/2022] [Accepted: 08/05/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Alternative splicing is an important mechanism to generate transcriptomic and phenotypic diversity. Existing methods have limited power to detect orthologous isoforms. RESULTS We develop a new method, EGIO, to detect orthologous exons and orthologous isoforms from two species. EGIO uses unique exonic regions to construct exon groups, aligning exons with a dynamic programming strategy in the process. EGIO could cover all the coding exons within orthologous genes. A comparison between EGIO and ExTraMapper shows that EGIO could detect more orthologous isoforms with conserved sequence and exon structures. We apply EGIO to compare human and chimpanzee protein-coding isoforms expressed in the frontal cortex and identify 6912 genes that express human unique isoforms. Unexpectedly, more human unique isoforms are detected than those conserved between humans and chimpanzees. AVAILABILITY AND IMPLEMENTATION Source code and test data of EGIO are available at https://github.com/wu-lab-egio/EGIO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
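To make the dynamic programming step concrete, here is a generic sketch of a global (Needleman-Wunsch-style) alignment over two ordered exon lists. It is an illustration under stated assumptions, not EGIO's actual algorithm: the function `align_exons`, the toy similarity function `identity_sim` and the gap penalty are all hypothetical choices.

```python
from typing import Callable, List, Optional, Tuple

Alignment = List[Tuple[Optional[int], Optional[int]]]


def identity_sim(x: str, y: str) -> float:
    """Toy exon similarity (assumption): +2 for identical sequences, -2 otherwise."""
    return 2.0 if x == y else -2.0


def align_exons(
    a: List[str],
    b: List[str],
    sim: Callable[[str, str], float] = identity_sim,
    gap: float = -1.0,
) -> Alignment:
    """Globally align two ordered exon lists; None marks an unmatched exon."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # pair exons
                score[i - 1][j] + gap,  # exon of a left unmatched
                score[i][j - 1] + gap,  # exon of b left unmatched
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out: Alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]):
            out.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out.append((i - 1, None))
            i -= 1
        else:
            out.append((None, j - 1))
            j -= 1
    out.reverse()
    return out


# Toy example: the middle exon of the first transcript has no counterpart.
toy_alignment = align_exons(["ATG", "CCC", "GGG"], ["ATG", "GGG"])
```

Each tuple in the result pairs an exon index from the first list with one from the second. A real tool would replace `identity_sim` with a sequence-similarity score computed between exon sequences.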
Affiliation(s)
- Jinfa Ma
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jane Y Wu
- To whom correspondence should be addressed.
- Li Zhu
- To whom correspondence should be addressed.
4
Guillaudeux N, Belleannée C, Blanquart S. Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog. BMC Genomics 2022; 23:216. [PMID: 35303798 PMCID: PMC8933948 DOI: 10.1186/s12864-022-08429-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 02/07/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND In eukaryote transcriptomes, a significant amount of transcript diversity comes from genes' capacity to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. However, there is no formal definition of transcript orthology based on the splicing structure conservation. Likewise there is no public dataset benchmark providing groups of orthologous transcripts sharing a conserved splicing structure. RESULTS We introduced a formal definition of splicing structure orthology and we predicted transcript orthologs in human, mouse and dog. Applying a selective strategy, we analyzed 2,167 genes and their 18,109 known transcripts and identified a set of 253 gene orthologs that shared a conserved splicing structure in all three species. We predicted 6,861 transcript CDSs (coding sequence), mainly for dog, an emergent model species. Each predicted transcript was an ortholog of a known transcript: both share the same CDS splicing structure. Evidence for the existence of the predicted CDSs was found in external data. CONCLUSIONS We generated a dataset of 253 gene triplets, structurally conserved and sharing all their CDSs in human, mouse and dog, which correspond to 879 triplets of spliced CDS orthologs. We have released the dataset both as an SQL database and as tabulated files. The data consists of the 879 CDS orthology groups with their detailed splicing structures, and the predicted CDSs, associated with their experimental evidence. The 6,861 predicted CDSs are provided in GTF files. Our data may contribute to compare highly conserved genes across three species, for comparative transcriptomics at the isoform level, or for benchmarking splice aligners and methods focusing on the identification of splicing orthologs. The data is available at https://data-access.cesgo.org/index.php/s/V97GXxOS66NqTkZ .
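The notion that orthologous transcripts "share the same CDS splicing structure" can be sketched as grouping CDSs by a structural signature. The sketch below is a simplification under assumptions, not the authors' formal definition: it assumes each CDS is given as an ordered list of exon identifiers and that a precomputed mapping assigns every exon to an orthologous exon group. The names `splicing_orthology_groups`, `conserved_in_all`, `cds_exons` and `exon_group` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Signature = Tuple[str, ...]
Groups = Dict[Signature, Dict[str, List[str]]]


def splicing_orthology_groups(
    cds_exons: Dict[Tuple[str, str], List[str]],
    exon_group: Dict[str, str],
) -> Groups:
    """Group CDSs whose ordered exons map to the same orthologous exon groups.

    cds_exons maps (species, cds_id) to an ordered exon list; exon_group maps
    an exon id to its orthologous exon group id. Both are assumed inputs.
    """
    groups: Groups = defaultdict(dict)
    for (species, cds_id), exons in cds_exons.items():
        signature = tuple(exon_group[e] for e in exons)
        groups[signature].setdefault(species, []).append(cds_id)
    return dict(groups)


def conserved_in_all(groups: Groups, species: Iterable[str]) -> Groups:
    """Keep only the groups that contain at least one CDS from every species."""
    wanted = list(species)
    return {
        sig: members
        for sig, members in groups.items()
        if all(s in members for s in wanted)
    }


# Toy data: the dog CDS skips the middle exon, so it falls in its own group.
toy_exon_group = {
    "hE1": "G1", "hE2": "G2", "hE3": "G3",
    "mE1": "G1", "mE2": "G2", "mE3": "G3",
    "dE1": "G1", "dE3": "G3",
}
toy_cds = {
    ("human", "h_t1"): ["hE1", "hE2", "hE3"],
    ("mouse", "m_t1"): ["mE1", "mE2", "mE3"],
    ("dog", "d_t1"): ["dE1", "dE3"],
}
toy_groups = splicing_orthology_groups(toy_cds, toy_exon_group)
toy_pairs = conserved_in_all(toy_groups, ["human", "mouse"])
toy_triplets = conserved_in_all(toy_groups, ["human", "mouse", "dog"])
```

In the toy data the human and mouse CDSs share a signature while the dog CDS does not, so no triplet is found. Under this simplified view, a missing member of a group is the kind of gap where a transcript could be predicted.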
5
Reinhardt F, Stadler PF. ExceS-A: an exon-centric split aligner. J Integr Bioinform 2022; 19:jib-2021-0040. [PMID: 35254744 PMCID: PMC9069663 DOI: 10.1515/jib-2021-0040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 01/12/2022] [Indexed: 11/25/2022]
Abstract
Spliced alignments are a key step in the construction of high-quality homology-based annotations of protein sequences. The exon/intron structure, which is computed as part of spliced alignment procedures, often conveys important information for distinguishing paralogous members of gene families. Here we present an exon-centric pipeline for spliced alignment that is intended in particular for applications that involve exon-by-exon comparisons of coding sequences. We show that the simple, blat-based approach has advantages over established tools in particular for genes with very large introns and applications to fragmented genome assemblies.
Affiliation(s)
- Franziska Reinhardt
- Bioinformatics Group, Institute of Computer Science, Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Institute of Computer Science, Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany; Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany; Institute of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
6
Jammali S, Djossou A, Ouédraogo WYDD, Nevers Y, Chegrane I, Ouangraoua A. From pairwise to multiple spliced alignment. BIOINFORMATICS ADVANCES 2022; 2:vbab044. [PMID: 36699392 PMCID: PMC9710695 DOI: 10.1093/bioadv/vbab044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 11/25/2021] [Indexed: 01/28/2023]
Abstract
Motivation Alternative splicing is a ubiquitous process in eukaryotes that allows distinct transcripts to be produced from the same gene. Yet, the study of transcript evolution within a gene family is still in its infancy. One prerequisite for this study is the availability of methods to compare sets of transcripts while accounting for their splicing structure. In this context, we generalize the concept of pairwise spliced alignments (PSpAs) to multiple spliced alignments (MSpAs). MSpAs have several important purposes in addition to empowering the study of the evolution of transcripts. For instance, it is a key to improving the prediction of gene models, which is important to solve the growing problem of genome annotation. Despite its essentialness, a formal definition of the concept and methods to compute MSpAs are still lacking. Results We introduce the MSpA problem and the SplicedFamAlignMulti (SFAM) method, to compute the MSpA of a gene family. Like most multiple sequence alignment (MSA) methods that are generally greedy heuristic methods assembling pairwise alignments, SFAM combines all PSpAs of coding DNA sequences and gene sequences of a gene family into an MSpA. It produces a single structure that represents the superstructure and models of the gene family. Using real vertebrate and simulated gene family data, we illustrate the utility of SFAM for computing accurate gene family superstructures, MSAs, inferring splicing orthologous groups and improving gene-model annotations. Availability and implementation The supporting data and implementation of SFAM are freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlignMulti. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Safa Jammali
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada; Département de Biochimie et de Génomique Fonctionnelle, Faculté de Médecine et des Sciences de la santé, Université de Sherbrooke, 3001, 12e avenue Nord, Sherbrooke (Québec) J1H 5N4, Canada
- Abigaïl Djossou
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Wend-Yam D D Ouédraogo
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Yannis Nevers
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne 1015, Switzerland
- Ibrahim Chegrane
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Aïda Ouangraoua
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada; To whom correspondence should be addressed.
7
The MAGOH paralogs - MAGOH, MAGOHB and their multiple isoforms. GENE REPORTS 2021. [DOI: 10.1016/j.genrep.2021.101214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
8
Yildirim A, Mozaffari-Jovin S, Wallisch AK, Schäfer J, Ludwig SEJ, Urlaub H, Lührmann R, Wolfrum U. SANS (USH1G) regulates pre-mRNA splicing by mediating the intra-nuclear transfer of tri-snRNP complexes. Nucleic Acids Res 2021; 49:5845-5866. [PMID: 34023904 PMCID: PMC8191790 DOI: 10.1093/nar/gkab386] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 04/28/2021] [Indexed: 02/06/2023]
Abstract
Splicing is catalyzed by the spliceosome, a compositionally dynamic complex assembled stepwise on pre-mRNA. We reveal links between splicing machinery components and the intrinsically disordered ciliopathy protein SANS. Pathogenic mutations in SANS/USH1G lead to Usher syndrome, the most common cause of deaf-blindness. Previously, SANS was shown to function only in the cytosol and primary cilia. Here, we have uncovered molecular links between SANS and pre-mRNA splicing catalyzed by the spliceosome in the nucleus. We show that SANS is found in Cajal bodies and nuclear speckles, where it interacts with components of spliceosomal sub-complexes such as SF3B1 and the large splicing cofactor SON but also with PRPFs and snRNAs related to the tri-snRNP complex. SANS is required for the transfer of tri-snRNPs between Cajal bodies and nuclear speckles for spliceosome assembly and may also participate in snRNP recycling back to Cajal bodies. SANS depletion alters the kinetics of spliceosome assembly, leading to accumulation of complex A. SANS deficiency and USH1G pathogenic mutations affect splicing of genes related to cell proliferation and human Usher syndrome. Thus, we provide the first evidence that splicing dysregulation may participate in the pathophysiology of Usher syndrome.
Affiliation(s)
- Adem Yildirim
- Molecular Cell Biology, Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Germany
- Sina Mozaffari-Jovin
- Department of Cellular Biochemistry, Max-Planck-Institute for Biophysical Chemistry, Goettingen, Germany; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Bioanalytical Mass Spectrometry, Max-Planck-Institute for Biophysical Chemistry, Goettingen, Germany
- Ann-Kathrin Wallisch
- Molecular Cell Biology, Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Germany
- Jessica Schäfer
- Molecular Cell Biology, Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Germany
- Sebastian E J Ludwig
- Department of Cellular Biochemistry, Max-Planck-Institute for Biophysical Chemistry, Goettingen, Germany
- Henning Urlaub
- Bioanalytical Mass Spectrometry, Max-Planck-Institute for Biophysical Chemistry, Goettingen, Germany; Bioanalytics, Department of Clinical Chemistry, University Medical Center Goettingen, Germany
- Reinhard Lührmann
- Department of Cellular Biochemistry, Max-Planck-Institute for Biophysical Chemistry, Goettingen, Germany
- Uwe Wolfrum
- Molecular Cell Biology, Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Germany
9
Chakraborty A, Ay F, Davuluri RV. ExTraMapper: Exon- and Transcript-level mappings for orthologous gene pairs. Bioinformatics 2021; 37:3412-3420. [PMID: 34014317 PMCID: PMC8545320 DOI: 10.1093/bioinformatics/btab393] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/15/2020] [Revised: 04/27/2021] [Accepted: 05/19/2021] [Indexed: 12/13/2022]
Abstract
MOTIVATION Access to large-scale genomics and transcriptomics data from various tissues and cell lines allowed the discovery of wide-spread alternative splicing events and alternative promoter usage in mammalians. Between human and mouse, gene-level orthology is currently present for nearly 16k protein-coding genes spanning a diverse repertoire of over 200k total transcript isoforms. RESULTS Here, we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping at the exon and then transcript level. ExTraMapper identifies more than 350k exon mappings, as well as 30k transcript mappings between human and mouse using only sequence and gene annotation information. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings compared to previous methods. Further, it identifies exon fusions, splits, and losses due to splice site mutations, and finds mappings between microexons that are previously missed. By reanalysis of RNA-seq data from 13 matched human and mouse tissues, we show that ExTraMapper improves the correlation of transcript-specific expression levels suggesting a more accurate mapping of human and mouse transcripts. We also applied the method to detect conserved exon and transcript pairs between human and rhesus macaque genomes to highlight the point that ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs. AVAILABILITY The source code and the results are available at https://github.com/ay-lab/ExTraMapper and http://ay-lab-tools.lji.org/extramapper. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ferhat Ay
- La Jolla Institute for Immunology, La Jolla, CA, 92037, USA; Department of Pediatrics, UC San Diego - School of Medicine, La Jolla, 92093, CA, USA
- Ramana V Davuluri
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, 11794, USA
10
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022]
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA; Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
11
Association Study of Puberty-Related Candidate Genes in Chinese Female Population. Int J Genomics 2020; 2020:1426761. [PMID: 32566640 PMCID: PMC7285286 DOI: 10.1155/2020/1426761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/08/2019] [Revised: 03/18/2020] [Accepted: 04/27/2020] [Indexed: 01/05/2023]
Abstract
Puberty is a transition period where a child transforms to an adult. Puberty can be affected by various genetic factors and environmental influences. In mammals, the regulation of puberty is enhanced by the hypothalamic-pituitary-gonadal axis (HPG axis). A number of genes such as GnRH, Kiss1, and GPR54 have been reported as key regulators of puberty onset. In this study, we have conducted an association study of puberty-related candidate genes in Chinese female population. Gene variations reported to be related with some traits in a population may not exist in others due to different genetic and ethnic backgrounds, hence the need for this kind of study. The genotyping of SNPs was based on multiplex PCR and the next-generation sequencing (NGS) platform of Illumina. We finally performed association study using PLINK software. Our results confirmed that SNPs rs34787247 in LIN28, rs74795793 and rs9347389 in OCT-1, and rs379202 and rs10491080 in ZEB1 genes showed a significant association with puberty. With the result, it is reasonable to conclude that these genes affect the process of puberty in Shanghai Chinese female population, yet the mechanism remains to be investigated by further study.
12
Kuitche E, Jammali S, Ouangraoua A. SimSpliceEvol: alternative splicing-aware simulation of biological sequence evolution. BMC Bioinformatics 2019; 20:640. [PMID: 31842741 PMCID: PMC6916212 DOI: 10.1186/s12859-019-3207-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
Abstract
Background It is now well established that eukaryotic coding genes have the ability to produce more than one type of transcript thanks to the mechanisms of alternative splicing and alternative transcription. Because of the lack of gold standard real data on alternative splicing, simulated data constitute a good option for evaluating the accuracy and the efficiency of methods developed for splice-aware sequence analysis. However, existing sequence evolution simulation methods do not model alternative splicing, and so they cannot be used to test spliced sequence analysis methods. Results We propose a new method called SimSpliceEvol for simulating the evolution of sets of alternative transcripts along the branches of an input gene tree. In addition to traditional sequence evolution events, the simulation also includes gene exon-intron structure evolution events and alternative splicing events that modify the sets of transcripts produced from genes. SimSpliceEvol was implemented in Python. The source code is freely available at https://github.com/UdeS-CoBIUS/SimSpliceEvol. Conclusions Data generated using SimSpliceEvol are useful for testing spliced RNA sequence analysis methods such as methods for spliced alignment of cDNA and genomic sequences, multiple cDNA alignment, orthologous exons identification, splicing orthology inference, and transcript phylogeny inference, all of which require knowing the real evolutionary relationships between the sequences.
Affiliation(s)
- Esaie Kuitche
- Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l'Université, Quebec, J1K2R1, Canada.
- Safa Jammali
- Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l'Université, Quebec, J1K2R1, Canada; Department of Biochemistry, University of Sherbrooke, 3001 12e avenue Nord, Quebec, J1H5N4, Canada
- Aïda Ouangraoua
- Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l'Université, Quebec, J1K2R1, Canada
13
Jammali S, Aguilar JD, Kuitche E, Ouangraoua A. SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups. BMC Bioinformatics 2019; 20:133. [PMID: 30925859 PMCID: PMC6439985 DOI: 10.1186/s12859-019-2647-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022]
Abstract
BACKGROUND The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS with similar sequences and conserved splicing structures from orthologous genes. Spliced alignment that consists in aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising, yet unexplored approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet, this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then the inference of transcript splicing orthology groups in a gene family based on spliced alignments. RESULTS The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high for various levels of sequence similarity between input sequences, thanks to accounting for the splicing structure of the input sequences. 
It is important to notice that unlike all current spliced alignment methods that are meant for cDNA-to-genome alignments and can be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments. CONCLUSION We show the usefulness of SFA for the comparison of genes and transcripts within a gene family for the purpose of analyzing splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign .
Affiliation(s)
- Safa Jammali
- Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Department of Biochemistry, Faculty of Medicine and Health Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| | - Jean-David Aguilar
- Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Department of Biochemistry, Faculty of Medicine and Health Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| | - Esaie Kuitche
- Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| | - Aïda Ouangraoua
- Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| |
|
14
|
Chen X, Wang S, Zhou Y, Han Y, Li S, Xu Q, Xu L, Zhu Z, Deng Y, Yu L, Song L, Chen AP, Song J, Takahashi E, He G, He L, Li W, Chen CD. Phf8 histone demethylase deficiency causes cognitive impairments through the mTOR pathway. Nat Commun 2018; 9:114. [PMID: 29317619 PMCID: PMC5760733 DOI: 10.1038/s41467-017-02531-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 09/21/2016] [Accepted: 12/06/2017] [Indexed: 12/16/2022]
Abstract
Epigenomic abnormalities caused by genetic mutation in epigenetic regulators can result in neurodevelopmental disorders, deficiency in neural plasticity and mental retardation. As a histone demethylase, plant homeodomain finger protein 8 (Phf8) is a candidate gene for syndromal and non-specific forms of X-chromosome-linked intellectual disability (XLID). Here we report that Phf8 knockout mice displayed impaired learning and memory, and impaired hippocampal long-term potentiation (LTP) without gross morphological defects. We also show that the mTOR signaling pathway is hyperactive in the hippocampus of Phf8 knockout mice. Mechanistically, we show that demethylation of H4K20me1 by Phf8 results in transcriptional suppression of RSK1 and homeostasis of mTOR signaling. Pharmacological suppression of mTOR signaling with rapamycin in Phf8 knockout mice recovers the weakened LTP and cognitive deficits. Together, our results indicate that loss of Phf8 in animals causes deficient learning and memory by epigenetic disruption of mTOR signaling, and provide a potential therapeutic drug target to treat XLID. Mutations in the PHF8 gene are genetically associated with X-linked mental retardation. Here, Chen et al. show that Phf8 KO mice have cognitive and synaptic plasticity impairments, and that pharmacological inhibition of mTOR signaling can partially alleviate such defects.
Affiliation(s)
- Xuemei Chen
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China; State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; Department of Anesthesiology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
| | - Shuai Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Ying Zhou
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Yanfei Han
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China; Discipline of Neuroscience and Department of Anatomy and Physiology, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200025, China
| | - Shengtian Li
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Qing Xu
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Longyong Xu
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Ziqi Zhu
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Youming Deng
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Lu Yu
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Lulu Song
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Adele Pin Chen
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Juan Song
- Department of Pharmacology and Neuroscience Center, University of North Carolina School of Medicine, Chapel Hill, NC, 27514, USA
| | - Eiki Takahashi
- Research Resources Center, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
| | - Guang He
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Lin He
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Weidong Li
- Bio-X Institutes, Key Laboratory for the Genetics of Development and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
| | - Charlie Degui Chen
- State Key Laboratory of Molecular Biology, Shanghai Key laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
| |
|
15
|
Kuitche E, Lafond M, Ouangraoua A. Reconstructing protein and gene phylogenies using reconciliation and soft-clustering. J Bioinform Comput Biol 2017; 15:1740007. [DOI: 10.1142/s0219720017400078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
The architecture of eukaryotic coding genes allows the production of several different protein isoforms by genes. Current gene phylogeny reconstruction methods make use of a single protein product per gene, ignoring information on alternative protein isoforms. These methods often lead to inaccurate gene tree reconstructions that need to be corrected before phylogenetic analyses. Here, we propose a new approach for the reconstruction of gene trees and protein trees accounting for alternative protein isoforms. We extend the concept of reconciliation to protein trees, and we define a new reconciliation problem called MinDRGT that consists of finding a gene tree that minimizes a double reconciliation cost with a given protein tree and a given species tree. We define a second problem called MinDRPGT that consists of finding a protein supertree and a gene tree minimizing a double reconciliation cost, given a species tree and a set of protein subtrees. We propose a shift from the traditional view of protein ortholog groups as hard-clusters to soft-clusters and we study the MinDRPGT problem under this assumption. We provide algorithmic exact and heuristic solutions for versions of the problems, and we present the results of applications on protein and gene trees from the Ensembl database. The implementations of the methods are available at https://github.com/UdeS-CoBIUS/Protein2GeneTree and https://github.com/UdeS-CoBIUS/SuperProteinTree .
Affiliation(s)
- Esaie Kuitche
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, QC J1K2R1, Canada
| | - Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N6N5, Canada
| | - Aïda Ouangraoua
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, QC J1K2R1, Canada
| |
|
16
|
Abstract
Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level. Insights have been gained into the degree of conservation between human and mouse at the level of not only gene expression but also epigenetics and inter-individual variation. However, a number of limitations exist, including incomplete transcriptome characterization and difficulties in identifying orthologous phenotypes and cell types, which are beginning to be addressed by emerging technologies. Ultimately, these comparisons will help to identify the conditions under which the mouse is a suitable model of human physiology and disease, and optimize the use of animal models.
Affiliation(s)
- Alessandra Breschi
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Thomas R Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| |
|
17
|
Jammali S, Kuitche E, Rachati A, Bélanger F, Scott M, Ouangraoua A. Aligning coding sequences with frameshift extension penalties. Algorithms Mol Biol 2017; 12:10. [PMID: 28373895 PMCID: PMC5374649 DOI: 10.1186/s13015-017-0101-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/11/2016] [Accepted: 03/18/2017] [Indexed: 11/15/2022]
Abstract
BACKGROUND Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from the same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. The former cannot account for frameshift translations and, up to now, the latter accounts exclusively for frameshift translation initiation, without considering the length of the translation disruption caused by a frameshift. RESULTS We introduce a new scoring scheme with an algorithm for the pairwise alignment of CDS accounting for frameshift translation initiation and length, while simultaneously considering nucleotide and amino acid sequences. The main specificity of the scoring scheme is the introduction of a penalty cost accounting for frameshift extension length to compute an adequate similarity score for a CDS alignment. The second specificity of the model is that the search space of the problem solved is the set of all feasible alignments between two CDS. Previous approaches have considered a restricted search space or additional constraints on the decomposition of an alignment into length-3 sub-alignments. The algorithm described in this paper has the same asymptotic time complexity as the classical Needleman-Wunsch algorithm. CONCLUSIONS We compare the method to other CDS alignment methods based on an application to the comparison of pairs of CDS from homologous human, mouse and cow genes of ten mammalian gene families from the Ensembl-Compara database. The results show that our method is particularly robust to parameter changes as compared to existing methods. It also appears to be a good compromise, performing well both in the presence and absence of frameshift translations. An implementation of the method is available at https://github.com/UdeS-CoBIUS/FsePSA.
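For orientation on the complexity claim in the abstract above, the following is a minimal sketch of the classical Needleman-Wunsch global alignment score that the abstract uses as its baseline. It is not the FsePSA scoring scheme: it works on plain nucleotide strings with a linear gap penalty and has no codon or frameshift handling. The function name and the `match`, `mismatch` and `gap` values are illustrative assumptions, not taken from the paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Classical dynamic programme with a linear gap penalty.
    The scoring values are illustrative assumptions only.
    Time is O(len(a) * len(b)); memory is O(len(b)) because
    only the previous row of the table is kept.
    """
    n, m = len(a), len(b)
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: aligning a[:i] against the empty prefix costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            # Substitution (match or mismatch) on the diagonal.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b (from above) or gap in a (from the left).
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(needleman_wunsch_score("ATGGCC", "ATGCC"))
```

The quadratic table fill is the part the abstract refers to: a scheme that adds frameshift initiation and extension terms while keeping a constant amount of work per cell stays within the same asymptotic bound.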
Affiliation(s)
- Safa Jammali
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada
| | - Esaie Kuitche
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada
| | - Ayoub Rachati
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada
| | - François Bélanger
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada
| | - Michelle Scott
- Département de biochimie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC J1E4K8 Canada
| | - Aïda Ouangraoua
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada
| |
|
18
|
Blanquart S, Varré JS, Guertin P, Perrin A, Bergeron A, Swenson KM. Assisted transcriptome reconstruction and splicing orthology. BMC Genomics 2016; 17:786. [PMID: 28185551 PMCID: PMC5123294 DOI: 10.1186/s12864-016-3103-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023]
Abstract
Background Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21% of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve, as has been strongly suggested in the literature, it is important to evaluate assisted transcriptome prediction; this is the question of how alternative transcription in one species performs as a predictor of protein isoforms in another relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs. Results We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and alignments at the protein level. Using our definition, we compared 24,382 human transcripts and 17,909 mouse transcripts from the highly curated CCDS database, and identified 11,122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62% of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74%. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade. Conclusions Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.
Affiliation(s)
| | - Jean-Stéphane Varré
- Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL, Lille, France
| | - Paul Guertin
- LaCIM, Université du Québec à Montréal, Montréal, Canada; Département de mathématiques, Collège André-Grasset, Montréal, Canada
| | - Amandine Perrin
- Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL, Lille, France; Institut Pasteur, Microbial Evolutionary Genomics, CNRS, UMR3525, and Hub Bioinformatique et Biostatistique, C3BI, USR 3756 IP CNRS, Paris, France
| | - Anne Bergeron
- LaCIM, Université du Québec à Montréal, Montréal, Canada
| | - Krister M Swenson
- LIRMM, CNRS - Université de Montpellier, 161 rue Ada, Montpellier, 34392, France; IBC Institut de Biologie Computationnelle, Montpellier, France
| |
|
19
|
Chen J, Hackett CS, Zhang S, Song YK, Bell RJA, Molinaro AM, Quigley DA, Balmain A, Song JS, Costello JF, Gustafson WC, Van Dyke T, Kwok PY, Khan J, Weiss WA. The genetics of splicing in neuroblastoma. Cancer Discov 2015; 5:380-95. [PMID: 25637275 PMCID: PMC4390477 DOI: 10.1158/2159-8290.cd-14-0892] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/12/2014] [Accepted: 01/26/2015] [Indexed: 02/06/2023]
Abstract
Regulation of mRNA splicing, a critical and tightly regulated cellular function, underlies the majority of proteomic diversity and is frequently disrupted in disease. Using an integrative genomics approach, we combined both genomic data and exon-level transcriptome data in two somatic tissues (cerebella and peripheral ganglia) from a transgenic mouse model of neuroblastoma, a tumor that arises from the peripheral neural crest. Here, we describe splicing quantitative trait loci associated with differential splicing across the genome that we use to identify genes with previously unknown functions within the splicing pathway and to define de novo intronic splicing motifs that influence splicing from hundreds of bases away. Our results show that these splicing motifs represent sites for functional recurrent mutations and highlight novel candidate genes in human cancers, including childhood neuroblastoma. SIGNIFICANCE Somatic mutations with predictable downstream effects are largely relegated to coding regions, which comprise less than 2% of the human genome. Using an unbiased in vivo analysis of a mouse model of neuroblastoma, we have identified intronic splicing motifs that translate into sites for recurrent somatic mutations in human cancers.
Affiliation(s)
- Justin Chen
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, California. Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California
| | - Christopher S Hackett
- Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California
| | - Shile Zhang
- Program in Bioinformatics, Boston University, Boston, Massachusetts. Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute, Bethesda, Maryland
| | - Young K Song
- Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute, Bethesda, Maryland
| | - Robert J A Bell
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, California. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California
| | - Annette M Molinaro
- Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California
| | - David A Quigley
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California. Institute for Cancer Research, Oslo, Norway
| | - Allan Balmain
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California
| | - Jun S Song
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California. Department of Bioengineering, University of Illinois, Urbana-Champaign, Urbana, Illinois. Department of Physics, University of Illinois, Urbana-Champaign, Urbana, Illinois
| | - Joseph F Costello
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California
| | - W Clay Gustafson
- Department of Pediatrics, University of California, San Francisco, San Francisco, California
| | - Terry Van Dyke
- Mouse Cancer Genetics Program, Center for Advanced Preclinical Research, National Cancer Institute, Frederick, Maryland
| | - Pui-Yan Kwok
- Institute for Human Genetics, University of California, San Francisco, San Francisco, California. Department of Dermatology, University of California, San Francisco, San Francisco, California. Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California
| | - Javed Khan
- Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute, Bethesda, Maryland
| | - William A Weiss
- Department of Neurology, University of California, San Francisco, San Francisco, California. Department of Neurosurgery, University of California, San Francisco, San Francisco, California. Department of Pediatrics, University of California, San Francisco, San Francisco, California.
| |
|
20
|
Whitney IE, Kautzman AG, Reese BE. Alternative splicing of the LIM-homeodomain transcription factor Isl1 in the mouse retina. Mol Cell Neurosci 2015; 65:102-13. [PMID: 25752730 DOI: 10.1016/j.mcn.2015.03.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/27/2014] [Revised: 02/12/2015] [Accepted: 03/05/2015] [Indexed: 11/25/2022]
Abstract
Islet-1 (Isl1) is a LIM-homeodomain (LIM-HD) transcription factor that functions in a combinatorial manner with other LIM-HD proteins to direct the differentiation of distinct cell types within the central nervous system and many other tissues. A study of pancreatic cell lines showed that Isl1 is alternatively spliced generating a second isoform, Isl1β, which is missing 23 amino acids within the C-terminal region. This study examines the expression of the canonical and alternative Isl1 transcripts across other tissues, in particular, within the retina, where Isl1 is required for the differentiation of multiple neuronal cell types. The alternative splicing of Isl1 is shown to occur in multiple tissues, but the relative abundance of Isl1α and Isl1β expression varies greatly across them. In most tissues, Isl1α is the more abundant transcript, but in others the transcripts are expressed equally, or the alternative splice variant is dominant. Within the retina, differential expression of the two Isl1 transcripts increases as a function of development, with dynamic changes in expression peaking at E16.5 and again at P10. At the cellular level, individual retinal ganglion cells vary in their expression, with a subset of small-to-medium sized cells expressing only the alternative isoform. The functional significance of the difference in protein sequence between the two Isl1 isoforms was also assessed using a luciferase assay, demonstrating that the alternative isoform forms a less effective transcriptional complex for activating gene expression. These results demonstrate the differential presence of the canonical and alternative isoforms of Isl1 amongst retinal ganglion cell classes. As Isl1 participates in the differentiation of multiple cell types within the CNS, the present results support a role for alternative splicing in the establishment of cellular diversity in the developing nervous system.
Affiliation(s)
- Irene E Whitney
- Neuroscience Research Institute, University of California at Santa Barbara, Santa Barbara, CA 93106-5060, United States; Department of Molecular, Cellular and Developmental Biology, University of California at Santa Barbara, Santa Barbara, CA 93106-9625, United States.
| | - Amanda G Kautzman
- Neuroscience Research Institute, University of California at Santa Barbara, Santa Barbara, CA 93106-5060, United States; Department of Psychological & Brain Sciences, University of California at Santa Barbara, Santa Barbara, CA 93106-9660, United States.
| | - Benjamin E Reese
- Neuroscience Research Institute, University of California at Santa Barbara, Santa Barbara, CA 93106-5060, United States; Department of Psychological & Brain Sciences, University of California at Santa Barbara, Santa Barbara, CA 93106-9660, United States.
| |
|
21
|
Gu J, Lu Y, Qiao L, Ran D, Li N, Cao H, Gao Y, Zheng Q. Mouse p63 variants and chondrogenesis. International Journal of Clinical and Experimental Pathology 2013; 6:2872-2879. [PMID: 24294373 PMCID: PMC3843267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2013] [Accepted: 11/08/2013] [Indexed: 06/02/2023]
Abstract
As a critical member of the p53 family of transcription factors, p63 has been implicated in development rather than in tumor formation, because p63 is seldom mutated in human cancers, while p63 null mice exhibit severe developmental abnormalities without increased cancer susceptibility. Notably, besides the major epithelial and cardiac defects, p63-deficient mice show severe limb and craniofacial abnormalities. In addition, humans with p63 mutations also show severe limb and digit defects, suggesting a putative role of p63 in skeletal development. There are eight p63 variants, which encode the TAp63 and ΔNp63 isoforms from alternative promoters. How these isoforms function during skeletal development is currently largely unknown. Our recent transgenic studies suggest a role for TAp63α, but not ΔNp63α, during embryonic long bone development. However, the moderate skeletal phenotypes in the TAp63α transgenic mice suggest that additional p63 isoform(s) are required for the limb defects seen in p63 null mice. Here, we report an analysis of mouse p63 variants in MCT and ATDC5 cells, two cell models that undergo hypertrophic differentiation and mimic the process of endochondral bone formation upon growth arrest or induction. We detected increased levels of p63 variants in hypertrophic MCT cells by regular RT-PCR analysis. On further analysis by qRT-PCR, we detected a significantly upregulated level of the γ variant (p<0.05), but not of the α or β variants (p>0.05), in hypertrophic MCT cells compared with proliferative MCT cells. Moreover, we detected upregulated TAp63γ in ATDC5 cells undergoing hypertrophic differentiation. Our results suggest that TAp63γ plays a positive role during endochondral bone formation.
Affiliation(s)
- Junxia Gu
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
| | - Yaojuan Lu
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
- Department of Anatomy and Cell Biology, Rush University Medical Center, Chicago, IL 60612, USA
| | - Longwei Qiao
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
| | - Deyuan Ran
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
| | - Na Li
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
| | - Hong Cao
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
| | - Yan Gao
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
| | - Qiping Zheng
- Department of Hematology and Hematological Laboratory Science, School of Medical Science and Laboratory Medicine, Jiangsu University, Zhenjiang 212013, China
- Department of Anatomy and Cell Biology, Rush University Medical Center, Chicago, IL 60612, USA
| |
|
22
|
Spangenberg L, Correa A, Dallagiovanna B, Naya H. Role of alternative polyadenylation during adipogenic differentiation: an in silico approach. PLoS One 2013; 8:e75578. [PMID: 24143171 PMCID: PMC3797115 DOI: 10.1371/journal.pone.0075578] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/02/2013] [Accepted: 08/14/2013] [Indexed: 01/22/2023]
Abstract
Post-transcriptional regulation of stem cell differentiation is far from being completely understood. Changes in protein levels are not fully correlated with corresponding changes in mRNAs; the observed differences might be partially explained by post-transcriptional regulation mechanisms, such as alternative polyadenylation. This would involve changes in protein binding, transcript usage, miRNAs and other non-coding RNAs. In the present work we analyzed the distribution of alternative transcripts during adipogenic differentiation and the potential role of miRNAs in post-transcriptional regulation. Our in silico analysis suggests a modest, consistent, bias in 3'UTR lengths during differentiation enabling a fine-tuned transcript regulation via small non-coding RNAs. Including these effects in the analyses partially accounts for the observed discrepancies in relative abundance of protein and mRNA.
Affiliation(s)
- Lucía Spangenberg
- Bioinformatics Unit, Institut Pasteur Montevideo, Montevideo, Uruguay
- Alejandro Correa
- Instituto Carlos Chagas, Fiocruz-Paraná, Curitiba, Paraná, Brazil
- Hugo Naya
- Bioinformatics Unit, Institut Pasteur Montevideo, Montevideo, Uruguay
- Departamento de Producción Animal y Pasturas, Facultad de Agronomía, Universidad de la República
23
Fong JH, Murphy TD, Pruitt KD. Comparison of RefSeq protein-coding regions in human and vertebrate genomes. BMC Genomics 2013; 14:654. [PMID: 24063302 PMCID: PMC3882889 DOI: 10.1186/1471-2164-14-654] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/12/2013] [Accepted: 08/22/2013] [Indexed: 12/25/2022]
Abstract
BACKGROUND Advances in high-throughput sequencing technology have yielded a large number of publicly available vertebrate genomes, many of which are selected for inclusion in NCBI's RefSeq project and subsequently processed by NCBI's eukaryotic annotation pipeline. Genome annotation results are affected by differences in available support evidence and may be impacted by annotation pipeline software changes over time. The RefSeq project has not previously assessed annotation trends across organisms or over time. To address this deficiency, we have developed a comparative protocol which integrates analysis of annotated protein-coding regions across a data set of vertebrate orthologs in genomic sequence coordinates, protein sequences, and protein features.
RESULTS We assessed an ortholog dataset that includes 34 annotated vertebrate RefSeq genomes including human. We confirm that RefSeq protein-coding gene annotations in mammals exhibit considerable similarity. Over 50% of the orthologous protein-coding genes in 20 organisms are supported at the level of splicing conservation with at least three selected reference genomes. Approximately 7,500 ortholog sets include at least half of the analyzed organisms, show highly similar sequence and conserved splicing, and may serve as a minimal set of mammalian "core proteins" for initial assessment of new mammalian genomes. Additionally, 80% of the proteins analyzed pass a suite of tests to detect proteins that lack splicing conservation and have unusual sequence or domain annotation. We use these tests to define an annotation quality metric that is based directly on the annotated proteins and thus operates independently of other quality metrics such as availability of transcripts or assembly quality measures. Results are available on the RefSeq FTP site [http://ftp.ncbi.nlm.nih.gov/refseq/supplemental/ProtCore/SM1.txt].
CONCLUSIONS Our multi-factored analysis demonstrates a high level of consistency in RefSeq protein representation among vertebrates. We find that the majority of the RefSeq vertebrate proteins for which we have calculated orthology are good as measured by these metrics. The process flow described provides specific information on the scope and degree of conservation for the analyzed protein sequences and annotations and will be used to enrich the quality of RefSeq records by identifying targets for further improvement in the computational annotation pipeline, and by flagging specific genes for manual curation.
Affiliation(s)
- Jessica H Fong
- National Center for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Current address: 6425 Penn Ave. Suite 700, Pittsburgh, PA 15206, USA
- Terence D Murphy
- National Center for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kim D Pruitt
- National Center for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
24
Villanueva-Cañas JL, Laurie S, Albà MM. Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome Biol Evol 2013; 5:457-67. [PMID: 23377868 PMCID: PMC3590775 DOI: 10.1093/gbe/evt017] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/26/2022]
Abstract
Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank+F. Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. 
We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
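The selection step described in this abstract (for each gene family, pick the combination of protein isoforms most similar in length) can be illustrated with a small sketch. This is not the PALO implementation: the function name, the input layout, and the scoring rule (minimizing the gap between the longest and shortest chosen isoform) are assumptions made here for illustration, and the published heuristic may score combinations differently. The search is exhaustive, so it only suits small families, and it assumes every gene has at least one isoform.

```python
from itertools import product


def select_similar_length_isoforms(family):
    """Pick one isoform per gene so that the chosen lengths are as close as possible.

    `family` maps gene name -> {isoform id -> protein length}. This layout and
    the max-minus-min score are hypothetical choices, not taken from PALO.
    """
    genes = sorted(family)
    best_choice, best_spread = None, None
    # Enumerate every way of choosing one isoform per gene.
    for combo in product(*(sorted(family[g].items()) for g in genes)):
        lengths = [length for _, length in combo]
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best_choice = {g: iso for g, (iso, _) in zip(genes, combo)}
            best_spread = spread
    return best_choice, best_spread


# Toy gene family with made-up isoform lengths (amino acids).
family = {
    "human_gene": {"h1": 520, "h2": 410},
    "mouse_gene": {"m1": 405, "m2": 300},
    "rat_gene": {"r1": 415},
}
choice, spread = select_similar_length_isoforms(family)
print(choice, spread)
```

In this toy family the longest human isoform (h1) is not selected, because pairing it with any mouse isoform leaves a length gap of more than 100 residues; that contrast with a longest-isoform rule is the point the abstract makes.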
Affiliation(s)
- José Luis Villanueva-Cañas
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
25
Koumandou VL, Scorilas A. Evolution of the plasma and tissue kallikreins, and their alternative splicing isoforms. PLoS One 2013; 8:e68074. [PMID: 23874499 PMCID: PMC3707919 DOI: 10.1371/journal.pone.0068074] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 12/11/2012] [Accepted: 05/25/2013] [Indexed: 12/14/2022]
Abstract
Kallikreins are secreted serine proteases with important roles in human physiology. Human plasma kallikrein, encoded by the KLKB1 gene on locus 4q34-35, functions in the blood coagulation pathway, and in regulating blood pressure. The human tissue kallikrein and kallikrein-related peptidases (KLKs) have diverse expression patterns and physiological roles, including cancer-related processes such as cell growth regulation, angiogenesis, invasion, and metastasis. Prostate-specific antigen (PSA), the product of the KLK3 gene, is the most widely used biomarker in clinical practice today. A total of 15 KLKs are encoded by the largest contiguous cluster of protease genes in the human genome (19q13.3-13.4), which makes them ideal for evolutionary analysis of gene duplication events. Previous studies on the evolution of KLKs have traced mammalian homologs as well as a probable early origin of the family in aves, amphibia and reptilia. The aim of this study was to address the evolutionary and functional relationships between tissue KLKs and plasma kallikrein, and to examine the evolution of alternative splicing isoforms. Sequences of plasma and tissue kallikreins and their alternative transcripts were collected from the NCBI and Ensembl databases, and comprehensive phylogenetic analysis was performed by Bayesian as well as maximum likelihood methods. Plasma and tissue kallikreins exhibit high sequence similarity in the trypsin domain (>50%). Phylogenetic analysis indicates an early divergence of KLKB1, which groups closely with plasminogen, chymotrypsin, and complement factor D (CFD), in a monophyletic group distinct from trypsin and the tissue KLKs. Reconstruction of the earliest events leading to the diversification of the tissue KLKs is not well resolved, indicating rapid expansion in mammals. 
Alternative transcripts of each KLK gene show species-specific divergence, while examination of sequence conservation indicates that many annotated human KLK isoforms are missing the catalytic triad that is crucial for protease activity.
Affiliation(s)
- Andreas Scorilas
- Department of Biochemistry and Molecular Biology, University of Athens, Athens, Greece
26
Abstract
BACKGROUND Gene orthology has been well studied in evolutionary biology and is thought to have important implications for functional genome annotation. With the accumulation of transcriptomic data, alternative splicing is taken into account in the assignment of gene orthologs, and it has been suggested that orthology be further considered at the transcript level. Whether for gene or transcript orthology, exons are the basic units that represent the whole gene structure; however, no study has reported how to build exon-level orthology at a whole-genome scale. Therefore, it is essential to establish a gene-oriented exon orthology dataset.
RESULTS Using a customized pipeline, we first build exon orthologous relationships from assigned gene ortholog pairs in two well-annotated genomes: human and mouse. More than 92% of non-overlapping exons have at least one ortholog between human and mouse, and only a small portion of them have more than one ortholog. The exons located in the coding region are more conserved in terms of finding their ortholog counterparts. Within the untranslated region, the 5' UTR seems to have more diversity than the 3' UTR according to exon orthology designations. Interestingly, most exons located in the coding region are also conserved in length, but this conservation drops dramatically in untranslated regions. In addition, we allowed multiple assignments in exon orthologs, and a subset of exons with possible fusion/split events was defined after a thorough analysis procedure.
CONCLUSIONS Identification of orthologs at the exon level is essential to provide a detailed way to interrogate gene orthology and splicing analysis. It could be used to extend the genome annotation as well. Besides examining the one-to-one orthologous relationship, we manage the one-to-multi exon pairs to represent complicated exon generation behavior. Our results can be further applied in many research fields studying intron-exon structure and alternative/constitutive exons in functional genomic areas.
Affiliation(s)
- Gloria C-L Fu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
27
Prosdocimi F, Linard B, Pontarotti P, Poch O, Thompson JD. Controversies in modern evolutionary biology: the imperative for error detection and quality control. BMC Genomics 2012; 13:5. [PMID: 22217008 PMCID: PMC3311146 DOI: 10.1186/1471-2164-13-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 06/30/2011] [Accepted: 01/04/2012] [Indexed: 12/03/2022]
Abstract
BACKGROUND The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies. It has been suggested that part of the conflict may be due to errors in the initial sequences. Most gene sequences are predicted by bioinformatics programs and a number of quality issues have been raised, concerning DNA sequencing errors or badly predicted coding regions, particularly in eukaryotes.
RESULTS We investigated the impact of these errors on evolutionary studies and specifically on the identification of important genetic events. We focused on the detection of asymmetric evolution after duplication, which has been the subject of controversy recently. Using the human genome as a reference, we established a reliable set of 688 duplicated genes in 13 complete vertebrate genomes, where significantly different evolutionary rates are observed. We estimated the rates at which protein sequence errors occur and are accumulated in the higher-level analyses. We showed that the majority of the detected events (57%) are in fact artifacts due to the putative erroneous sequences and that these artifacts are sufficient to mask the true functional significance of the events.
CONCLUSIONS Initial errors are accumulated throughout the evolutionary analysis, generating artificially high rates of event predictions and leading to substantial uncertainty in the conclusions. This study emphasizes the urgent need for error detection and quality control strategies in order to efficiently extract knowledge from the new genome data.
Affiliation(s)
- Francisco Prosdocimi
- Department of Integrated Structural Biology, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire) CNRS/INSERM/Université de Strasbourg, 1 rue Laurent Fries, Illkirch, F-67404, France
28
Abstract
Despite the common assumption that orthologs usually share the same function, there have been various reports of divergence between orthologs, even among species as close as mammals. The comparison of mouse and human is of special interest, because mouse is often used as a model organism to understand human biology. We review the literature on evidence for divergence between human and mouse orthologous genes, and discuss it in the context of biomedical research.
Affiliation(s)
- Walid H Gharib
- Department of Ecology and Evolution, Biophore, Swiss Institute of Bioinformatics, Lausanne University, CH-1015 Lausanne, Switzerland