1
Ouedraogo WYDD, Ouangraoua A. Orthology and Paralogy Relationships at Transcript Level. J Comput Biol 2024; 31:277-293. [PMID: 38621191 DOI: 10.1089/cmb.2023.0400] [Indexed: 04/17/2024]
Abstract
Eukaryotic genes undergo a mechanism called alternative processing, resulting in transcriptome diversity by allowing the production of multiple distinct transcripts from a gene. More than half of human genes are affected, and the resulting transcripts are highly conserved among orthologous genes of distinct species. In this work, we present the definition of orthology and paralogy between transcripts of homologous genes, together with an algorithm to compute clusters of conserved orthologous and paralogous transcripts. Gene-level homology relationships are utilized to define various types of homology relationships between transcripts originating from the same ancestral transcript. A Reciprocal Best Hits approach is employed to infer clusters of isoorthologous and recent paralogous transcripts. We applied this method to transcripts from simulated gene families as well as real gene families from the Ensembl-Compara database. The results are consistent with those from previous studies that compared orthologous gene transcripts. Furthermore, our findings provide evidence that searching for conserved transcripts between homologous genes, beyond the scope of orthologous genes, is likely to yield valuable information.
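The Reciprocal Best Hits idea mentioned in the abstract can be illustrated with a minimal sketch. It is a generic RBH pairing between the transcripts of two homologous genes, not the authors' implementation: the similarity scores, the transcript identifiers, and the function name `reciprocal_best_hits` are all hypothetical, and the abstract does not state which similarity measure or clustering step the paper actually uses.

```python
from typing import Dict, List, Tuple

# Hypothetical input: similarity scores between transcripts of two homologous genes.
Similarity = Dict[Tuple[str, str], float]


def reciprocal_best_hits(
    transcripts_a: List[str],
    transcripts_b: List[str],
    similarity: Similarity,
) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is the best hit of a and a is the best hit of b.

    Missing pairs are scored 0.0; ties are broken by list order (first maximum).
    """
    if not transcripts_a or not transcripts_b:
        return []

    def sim(a: str, b: str) -> float:
        return similarity.get((a, b), 0.0)

    # Best hit in gene B for every transcript of gene A, and the reverse.
    best_for_a = {a: max(transcripts_b, key=lambda b: sim(a, b)) for a in transcripts_a}
    best_for_b = {b: max(transcripts_a, key=lambda a: sim(a, b)) for b in transcripts_b}

    # Keep only the pairs that choose each other.
    return [(a, b) for a, b in best_for_a.items() if best_for_b[b] == a]


# Illustrative (made-up) scores for two transcripts per gene.
example_scores: Similarity = {
    ("g1_t1", "g2_t1"): 0.95,
    ("g1_t1", "g2_t2"): 0.40,
    ("g1_t2", "g2_t1"): 0.60,
    ("g1_t2", "g2_t2"): 0.30,
}
pairs = reciprocal_best_hits(["g1_t1", "g1_t2"], ["g2_t1", "g2_t2"], example_scores)
print(pairs)
```

In this toy input only `g1_t1` and `g2_t1` select each other, so they form the single reciprocal pair; `g1_t2` and `g2_t2` stay unpaired because their best hits prefer another transcript.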
Affiliation(s)
- Aida Ouangraoua
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
2
Jammali S, Djossou A, Ouédraogo WYDD, Nevers Y, Chegrane I, Ouangraoua A. From pairwise to multiple spliced alignment. Bioinformatics Advances 2022; 2:vbab044. [PMID: 36699392 PMCID: PMC9710695 DOI: 10.1093/bioadv/vbab044] [Received: 09/24/2021] [Revised: 11/25/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Alternative splicing is a ubiquitous process in eukaryotes that allows distinct transcripts to be produced from the same gene. Yet, the study of transcript evolution within a gene family is still in its infancy. One prerequisite for this study is the availability of methods to compare sets of transcripts while accounting for their splicing structure. In this context, we generalize the concept of pairwise spliced alignments (PSpAs) to multiple spliced alignments (MSpAs). MSpAs have several important purposes in addition to empowering the study of the evolution of transcripts. For instance, it is a key to improving the prediction of gene models, which is important to solve the growing problem of genome annotation. Despite its essentialness, a formal definition of the concept and methods to compute MSpAs are still lacking.
Results: We introduce the MSpA problem and the SplicedFamAlignMulti (SFAM) method, to compute the MSpA of a gene family. Like most multiple sequence alignment (MSA) methods that are generally greedy heuristic methods assembling pairwise alignments, SFAM combines all PSpAs of coding DNA sequences and gene sequences of a gene family into an MSpA. It produces a single structure that represents the superstructure and models of the gene family. Using real vertebrate and simulated gene family data, we illustrate the utility of SFAM for computing accurate gene family superstructures, MSAs, inferring splicing orthologous groups and improving gene-model annotations.
Availability and implementation: The supporting data and implementation of SFAM are freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlignMulti.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Safa Jammali
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada; Département de Biochimie et de Génomique Fonctionnelle, Faculté de Médecine et des Sciences de la santé, Université de Sherbrooke, 3001, 12e avenue Nord, Sherbrooke (Québec) J1H 5N4, Canada
- Abigaïl Djossou
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Wend-Yam D D Ouédraogo
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Yannis Nevers
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne 1015, Switzerland
- Ibrahim Chegrane
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Aïda Ouangraoua
- Département D’informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada. To whom correspondence should be addressed.