1
|
Ijaz AZ, Ali RH, Sarwar A, Ali Khan T, Baig MM. Importance of Synteny in Homology Inference. 2022 17TH INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES (ICET) 2022. [DOI: 10.1109/icet56601.2022.10004649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
Affiliation(s)
- Ali Zeeshan Ijaz
- AI Research Group GIK Institute of Engg. Sciences & Tech.,Faculty of Computer Science & Engg.,Topi,Khyber Pakhtunkhwa,Pakistan
| | - Raja Hashim Ali
- AI Research Group GIK Institute of Engg. Sciences & Tech.,Faculty of Computer Science & Engg.,Topi,Khyber Pakhtunkhwa,Pakistan
| | - Asima Sarwar
- AI Research Group GIK Institute of Engg. Sciences & Tech.,Faculty of Computer Science & Engg.,Topi,Khyber Pakhtunkhwa,Pakistan
| | - Talha Ali Khan
- Univ. of Europe of Applied Sciences,Dept. of Tech & Software Engg.,Berlin,Germany
| | - Muhammad Muneeb Baig
- AI Research Group GIK Institute of Engg. Sciences & Tech.,Faculty of Computer Science & Engg.,Topi,Khyber Pakhtunkhwa,Pakistan
| |
Collapse
|
2
|
de Montgolfier F, Raffinot M, Rusu I. Easy identification of generalized common and conserved nested intervals. J Comput Biol 2014; 21:520-33. [PMID: 24650221 DOI: 10.1089/cmb.2013.0146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In this article we explain how to easily compute gene clusters, formalized by classical or generalized nested common or conserved intervals, between a set of K genomes represented as K permutations. A b-nested common (resp. conserved) interval I of size |I| is either an interval of size 1 or a common (resp. conserved) interval that contains another b-nested common (resp. conserved) interval of size at least |I|-b. When b=1, this corresponds to the classical notion of nested interval. We exhibit two simple algorithms to output all b-nested common or conserved intervals between K permutations in O(Kn+nocc) time, where nocc is the total number of such intervals. We also explain how to count all b-nested intervals in O(Kn) time. New properties of the family of conserved intervals are proposed to do so.
Collapse
Affiliation(s)
- Fabien de Montgolfier
- 1 Laboratoire d'Informatique Algorithmique: Fondements et Applications, Univ. Paris Diderot , Paris, France
| | | | | |
Collapse
|
3
|
Wittler R, Maňuch J, Patterson M, Stoye J. Consistency of sequence-based gene clusters. J Comput Biol 2012; 18:1023-39. [PMID: 21899413 DOI: 10.1089/cmb.2011.0083] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to specify gene clusters--groups of genes that are co-located in a set of genomes. Several approaches have been proposed to reconstruct putative ancestral gene clusters based on the gene order of contemporary species. One prevalent and natural reconstruction criterion is consistency: For a set of reconstructed gene clusters, there should exist a gene order that comprises all given clusters. For permutation-based gene cluster models, efficient methods exist to verify this condition. In this article, we discuss the consistency problem for different gene cluster models on sequences with restricted gene multiplicities. Our results range from linear-time algorithms for the simple model of adjacencies to NP-completeness proofs for more complex models like common intervals.
Collapse
Affiliation(s)
- Roland Wittler
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada.
| | | | | | | |
Collapse
|
4
|
Abstract
The purpose of this chapter is to provide a comprehensive review of the field of genome rearrangement, i.e., comparative genomics, based on the representation of genomes as ordered sequences of signed genes. We specifically focus on the "hard part" of genome rearrangement, how to handle duplicated genes. The main questions are: how have present-day genomes evolved from a common ancestor? What are the most realistic evolutionary scenarios explaining the observed gene orders? What was the content and structure of ancestral genomes? We aim to provide a concise but complete overview of the field, starting with the practical problem of finding an appropriate representation of a genome as a sequence of ordered genes or blocks, namely the problems of orthology, paralogy, and synteny block identification. We then consider three levels of gene organization: the gene family level (evolution by duplication, loss, and speciation), the cluster level (evolution by tandem duplications), and the genome level (all types of rearrangement events, including whole genome duplication).
Collapse
Affiliation(s)
- Nadia El-Mabrouk
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, QC, Canada
| | | |
Collapse
|
5
|
Grusea S. On the distribution of the number of cycles in the breakpoint graph of a random signed permutation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1411-1416. [PMID: 21116045 DOI: 10.1109/tcbb.2010.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
We use the finite Markov chain embedding technique to obtain the distribution of the number of cycles in the breakpoint graph of a random uniform signed permutation. This further gives a very good approximation of the distribution of the reversal distance between two random genomes.
Collapse
Affiliation(s)
- Simona Grusea
- LATP - UMR CNRS 6632, Equipe EBM, Université de Provence, France.
| |
Collapse
|
6
|
Grusea S, Pardoux E, Chabrol O, Pontarotti P. Compound Poisson Approximation and Testing for Gene Clusters with Multigene Families. J Comput Biol 2011; 18:579-94. [DOI: 10.1089/cmb.2010.0043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022] Open
Affiliation(s)
- S. Grusea
- LATP–UMR CNRS 6632, Équipe Évolution Biologique et Modélisation, Université de Provence, Marseille, France
- Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, Toulouse, France
| | - E. Pardoux
- LATP–UMR CNRS 6632, Équipe Évolution Biologique et Modélisation, Université de Provence, Marseille, France
| | - O. Chabrol
- LATP–UMR CNRS 6632, Équipe Évolution Biologique et Modélisation, Université de Provence, Marseille, France
| | - P. Pontarotti
- LATP–UMR CNRS 6632, Équipe Évolution Biologique et Modélisation, Université de Provence, Marseille, France
| |
Collapse
|
7
|
Abstract
Given the gene orders in two modern genomes, it may be difficult to decide if some genes are close enough in both genomes to infer some ancestral proximity or some functional relationship. Current methods all depend on arbitrary parameters. We explore a class of gene proximity criteria and find two kinds of natural values for their parameters. One kind has to do with the parameter value where the expected information contained in two genomes about each other is maximized. The other kind of natural value has to do with parameter values beyond which all genes are clustered. We analyze these using combinatorial and probabilistic arguments as well as simulations.
Collapse
Affiliation(s)
- Zhenyu Yang
- Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, Canada
| | | |
Collapse
|
8
|
Affiliation(s)
- Guillaume Blin
- Université Paris-Est, LIGM–UMR CNRS 8049, Marne-la-vallíe cedex 2, France
| | - David Faye
- Université Paris-Est, LIGM–UMR CNRS 8049, Marne-la-vallíe cedex 2, France
- Université Gaston-Berger, LANI, Saint-Louis, Senegal
| | - Jens Stoye
- Technische Fakultät, Universität Bielefeld, Bielefeld, Germany
| |
Collapse
|
9
|
Zhang M, Leong HW. Gene team tree: a hierarchical representation of gene teams for all gap lengths. J Comput Biol 2009; 16:1383-98. [PMID: 19803736 DOI: 10.1089/cmb.2009.0093] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The identification of spatially co-located gene clusters is an important step towards understanding genome evolution and function. Gene team is a popular model for conserved gene clusters that constrains the maximum distance between adjacent genes in the same cluster. Existing algorithms for finding gene teams require the specification of the maximum allowed distance, delta. However, determining suitable values of delta is non-trivial, due to varying rates of rearrangement and differences in the distribution of genes across multiple genomes. Instead of trying to determine a single best value of delta, we propose constructing the Gene Team Tree, a compact representation of gene teams for all values of delta. The teams computed can then be verified/scored using application specific methods. Our algorithm for computing the GTT extends existing gene team mining algorithms without increasing their time complexity. We compute the GTT for E. coli K-12 and B. subtilis and show that E. coli K-12 operons are modelled by gene teams with different values of delta. We demonstrate the scalability of our method and the trade-off involved when comparing more than two genomes, through a comparative study using five gamma-proteobacteria genomes. Lastly, we describe how to compute the GTT for multi-chromosomal genomes and illustrate by computing the GTT for the human and mouse genomes. An implementation of the algorithms described in this article and the datasets used in the experiments can be downloaded from http://www.comp.nus.edu.sg/~leonghw/GTT .
Collapse
Affiliation(s)
- Melvin Zhang
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, Republic of Singapore
| | | |
Collapse
|
10
|
|
11
|
Durand D, Hoberman R. Diagnosing duplications – can it be done? Trends Genet 2006; 22:156-64. [PMID: 16442663 DOI: 10.1016/j.tig.2006.01.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2005] [Revised: 11/30/2005] [Accepted: 01/11/2006] [Indexed: 01/10/2023]
Abstract
New genes arise through duplication and modification of DNA sequences on a range of scales: single gene duplication, duplication of large chromosomal fragments and whole-genome duplication. Each duplication mechanism has specific characteristics that influence the fate of the resulting duplicates, such as the size of the duplicated fragment, the potential for dosage imbalance, the preservation or disruption of regulatory control and genomic context. The ability to diagnose or identify the mechanism that produced a pair of paralogs has the potential to increase our ability to reconstruct evolutionary history, to understand the processes that govern genome evolution and to make functional predictions based on paralogy. The recent availability of large amounts of whole-genome sequence, often from several closely related species, has stimulated a wealth of new computational methods to diagnose gene duplications.
Collapse
Affiliation(s)
- Dannie Durand
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
| | | |
Collapse
|
12
|
Rahmann S, Klau GW. Integer Linear Programs for Discovering Approximate Gene Clusters. ACTA ACUST UNITED AC 2006. [DOI: 10.1007/11851561_28] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/20/2023]
|