1
De R, Jani M, Azad RK. DICEP: An integrative approach to augmenting genomic island detection. J Biotechnol 2024; 388:49-58. [PMID: 38641137 DOI: 10.1016/j.jbiotec.2024.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/17/2024] [Accepted: 04/16/2024] [Indexed: 04/21/2024]
Abstract
Mobilization of clusters of genes called genomic islands (GIs) across bacterial lineages facilitates dissemination of traits such as resistance against antibiotics, virulence or hypervirulence, and versatile metabolic capabilities. Robust delineation of GIs is critical to understanding bacterial evolution, which has a vast impact on different life forms. Methods for identification of GIs exploit different evolutionary features or signals encoded within the genomes of bacteria; however, the current state of the art in GI detection still leaves much to be desired. Here, we have taken a combinatorial approach that accounts for GI-specific features such as compositional bias, aberrant phyletic pattern, and marker gene enrichment within an integrative framework to delineate GIs in bacterial genomes. Our GI prediction tool, DICEP, was assessed on simulated genomes and well-characterized bacterial genomes. DICEP compared favorably with current GI detection tools on real and synthetic datasets.
Affiliation(s)
- Ronika De
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, United States
- Mehul Jani
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, United States
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, United States; Department of Mathematics, University of North Texas, Denton, TX 76203, United States.
2
Burks DJ, Pusadkar V, Azad RK. POSMM: an efficient alignment-free metagenomic profiler that complements alignment-based profiling. Environmental Microbiome 2023; 18:16. [PMID: 36890583 PMCID: PMC9993663 DOI: 10.1186/s40793-023-00476-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 02/25/2023] [Indexed: 06/18/2023]
Abstract
We present here POSMM (pronounced 'Possum'), Python-Optimized Standard Markov Model classifier, which is a new incarnation of the Markov model approach to metagenomic sequence analysis. Built on the top of a rapid Markov model based classification algorithm SMM, POSMM reintroduces high sensitivity associated with alignment-free taxonomic classifiers to probe whole genome or metagenome datasets of increasingly prohibitive sizes. Logistic regression models generated and optimized using the Python sklearn library, transform Markov model probabilities to scores suitable for thresholding. Featuring a dynamic database-free approach, models are generated directly from genome fasta files per run, making POSMM a valuable accompaniment to many other programs. By combining POSMM with ultrafast classifiers such as Kraken2, their complementary strengths can be leveraged to produce higher overall accuracy in metagenomic sequence classification than by either as a standalone classifier. POSMM is a user-friendly and highly adaptable tool designed for broad use by the metagenome scientific community.
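As a rough illustration of the idea described in this abstract (scoring a read under a genome-specific Markov model, then mapping that value to a thresholdable score), the following minimal sketch may help. It is not the POSMM or SMM implementation: the function names, the pseudocount, and the `weight` and `bias` values of the logistic transform are hypothetical placeholders, whereas the abstract states that POSMM fits its logistic regression models with the Python sklearn library.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seq, order, pseudocount=1.0):
    """Count order-k transitions in seq; return a smoothed log-probability lookup."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        # Skip windows containing ambiguous symbols such as N.
        if base in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][base] += 1.0

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def mean_log_likelihood(read, order, log_prob):
    """Average per-base log-likelihood of a read under a trained model."""
    positions = range(order, len(read))
    if not positions:
        raise ValueError("read must be longer than the model order")
    return sum(log_prob(read[i - order:i], read[i]) for i in positions) / len(positions)


def to_score(mean_ll, weight=1.0, bias=0.0):
    """Logistic transform to (0, 1); weight and bias are hypothetical, not fitted."""
    return 1.0 / (1.0 + math.exp(-(weight * mean_ll + bias)))


if __name__ == "__main__":
    # Toy "genomes" and read, made up for illustration only.
    model_a = train_markov("ACGT" * 200, order=2)
    model_b = train_markov("AAAC" * 200, order=2)
    read = "ACGTACGTACGT"
    for name, model in (("A", model_a), ("B", model_b)):
        ll = mean_log_likelihood(read, 2, model)
        print(name, round(ll, 3), round(to_score(ll), 3))
```

In this toy setting the read receives a higher mean log-likelihood under model A, the genome it resembles, than under model B.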
Affiliation(s)
- David J Burks
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA
- Vaidehi Pusadkar
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA.
  - Department of Mathematics, University of North Texas, Denton, TX, 76203, USA.
3
Sengupta S, Azad RK. Reconstructing horizontal gene flow network to understand prokaryotic evolution. Open Biol 2022; 12:220169. [PMID: 36446404 PMCID: PMC9708380 DOI: 10.1098/rsob.220169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/05/2022] Open
Abstract
Horizontal gene transfer (HGT) is a major source of phenotypic innovation and a mechanism of niche adaptation in prokaryotes. Quantification of HGT is critical to decipher its myriad roles in microbial evolution and adaptation. Advances in genome sequencing and bioinformatics have augmented our ability to understand the microbial world, particularly the direct or indirect influence of HGT on diverse life forms. Methods for detecting HGT can be classified into phylogenetic-based and parametric or composition-based approaches. Here, we exploited the complementary strengths of both the approaches to construct a high confidence horizontal gene flow network. Our network is unique in its ability to detect the transfer of native genes of a genome to genomes from other taxa, thus establishing donor and recipient organisms (taxa), rather than through a post hoc analysis as is the practice with several other approaches. The scale-free horizontal gene flow network presented here provides new insights into modes of transfer for the exchange of genetic information and also illuminates differential gene flow across phyla.
Affiliation(s)
- Soham Sengupta
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K. Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA; Department of Mathematics, University of North Texas, Denton, TX 76203, USA
4
Sengupta S, Azad RK. Reconstructing horizontal gene flow network to understand prokaryotic evolution. Open Biol 2022. [PMID: 36446404 DOI: 10.6084/m9.figshare.c.6307519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/05/2022] Open
Affiliation(s)
- Soham Sengupta
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA; Department of Mathematics, University of North Texas, Denton, TX 76203, USA
5
Burks DJ, Azad RK. Mapping Strengths and Weaknesses of Different Clustering Approaches to Deciphering Bacterial Chimerism. OMICS: A Journal of Integrative Biology 2022; 26:422-439. [PMID: 35925817 DOI: 10.1089/omi.2022.0062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Bacterial genomes are chimeras of DNA of different ancestries. Deconstructing chimeric genomes is central to understanding the evolutionary trajectories of their disparate components and thus the organisms as a whole in the light of their evolutionary contexts. Of specific interest is to delineate and quantify native (vertically inherited) and alien (horizontally acquired) components of bacterial genomes and also specify genomic fractions that represent different donor sources. An agglomerative clustering procedure that prioritizes grouping of proximal similar genomic segments has previously been invoked for this purpose in conjunction with a recursive segmentation procedure. Surprisingly, however, the relative strengths and weaknesses of different clustering approaches to deciphering bacterial chimerism have not yet been investigated, despite the need to robustly interpret tens of thousands of completely sequenced bacterial genomes and nearly complete genome assemblies available in the public databases. To bridge this knowledge gap and develop more robust approaches, we assessed different clustering methods, including segment order based (proximal) clustering, hierarchical clustering, affinity propagation clustering, and a novel network clustering approach on chimeric genomes modeled after bacterial genomes representing a broad spectrum of compositional complexity. Although segment order-based clustering and network clustering compared favorably with the other approaches in discriminating between native and alien DNA at genome optimized settings, network clustering did consistently better than other methods at parametric settings optimized on all test genomes together. Segment order-based clustering and hierarchical clustering outperformed other methods in alien DNA identification while preserving donor identity in the genomes. 
Our study highlights the strengths and weaknesses of different approaches and suggests how this can be leveraged to achieve a more robust deconstruction of bacterial chimerism.
Affiliation(s)
- David J Burks
  - Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
- Rajeev K Azad
  - Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
  - Department of Mathematics, University of North Texas, Denton, Texas, USA
6
Pandey RS, Azad RK. Factors That Influence the Choice of Markov Model Order in Discriminating DNA Sequences from Different Sources. OMICS: A Journal of Integrative Biology 2022; 26:348-355. [PMID: 35648077 DOI: 10.1089/omi.2022.0043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Markov models have frequently been used in genetic sequence analysis. The number of parameters of a Markov model increases exponentially with model order, so it is often recommended that the order be chosen based on the size of data being modeled, lower orders for small and higher orders for large dataset sizes. Approaches based on model selection criterion have also been proposed. An important problem in microbiology and evolutionary biology is to decipher chimeric genomes of microbes, particularly, identify segments of distinct ancestries in genomes and reconstruct the plausible evolutionary scenarios that might have shaped the chimeric genomes in the microbial world. In this study, we assessed a Markov model-based segmentation method for its ability to detect compositionally disparate segments in chimeric sequence constructs as a function of model order, sequence length, and phylogenetic divergence. Our results show that the choice of Markov model order depends on both sequence size and composition. Higher order Markov models were found to be more effective in delineating sequence segments arising from closely related organisms in longer constructs; on the other hand, lower order Markov models were found to be more appropriate in delineating sequence segments arising from distantly related organisms in shorter constructs. These findings are important and timely, with broad implications in fields such as epidemiology that has to deal with the emergence of novel pathogenic chimeras that arise by foreign DNA acquisition, and ecology where chimeric structures may arise in various ecosystems, necessitating more robust approaches for their deconstruction and interpretation.
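The exponential growth in parameters mentioned at the start of this abstract can be made concrete with a short calculation. For a fixed-order Markov chain over the four-letter DNA alphabet, an order-k model has 4^k contexts, each with 3 free transition probabilities (the fourth follows from normalization). The helper name below is illustrative and is not taken from the paper.

```python
def markov_free_parameters(order, alphabet_size=4):
    """Free parameters of a fixed-order Markov chain over the given alphabet."""
    if order < 0:
        raise ValueError("order must be non-negative")
    contexts = alphabet_size ** order          # number of distinct length-k contexts
    return contexts * (alphabet_size - 1)      # one probability per context is fixed by normalization


if __name__ == "__main__":
    for k in range(7):
        print(f"order {k}: {markov_free_parameters(k)} free parameters")
```

An order-2 model needs 48 free parameters, while an order-6 model needs 12288. This is the usual argument for preferring lower orders when only short sequences are available for estimation.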
Affiliation(s)
- Ravi S Pandey
  - Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
- Rajeev K Azad
  - Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
  - Department of Mathematics, University of North Texas, Denton, Texas, USA
7
Pandey RS, Azad RK. A Protocol for Horizontally Acquired Metabolic Gene Detection in Algae. Methods Mol Biol 2022; 2396:61-69. [PMID: 34786676 DOI: 10.1007/978-1-0716-1822-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Horizontal gene transfer (HGT) or lateral gene transfer (LGT), the exchange of genetic materials among organisms by means of other than parent-to-offspring (vertical) inheritance, plays a major role in prokaryotic genome evolution, facilitating adaptation of prokaryotes to changes in the environment. Phylogenetic methods have been frequently invoked to catalog horizontally acquired genes; however, these methods are often constrained by the paucity of sequenced genomes of close relatives (and even distant relatives) for a robust analysis and reliable inference. In this chapter, we describe a HGT quantification protocol that exploits the complementary strengths of the integrative segmentation and clustering method and the comparative genomics approach to identify foreign genes. Users can use this pipeline in combination with phylogenetic tree reconstruction to identify foreign genes that are supported by multiple lines of evidence, that is, atypical composition, atypical distribution in close relatives, and aberrant phylogenetic pattern.
Affiliation(s)
- Ravi S Pandey
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA.
  - Department of Mathematics, University of North Texas, Denton, TX, USA.
8
Jiang S, Ren X, Liu S, Lu Z, Xu A, Qin C, Wang Z. Integrated Analysis of the Prognosis-Associated RNA-Binding Protein Genes and Candidate Drugs in Renal Papillary Cell Carcinoma. Front Genet 2021; 12:627508. [PMID: 33643390 PMCID: PMC7907657 DOI: 10.3389/fgene.2021.627508] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 11/09/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
RNA-binding proteins (RBPs) play significant roles in various cancer types. However, the functions of RBPs have not been clarified in renal papillary cell carcinoma (pRCC). In this study, we identified 31 downregulated and 89 upregulated differentially expressed RBPs on the basis of The Cancer Genome Atlas (TCGA) database and performed functional enrichment analyses. Subsequently, through univariate Cox, random survival forest, and multivariate Cox regression analysis, six RBPs (SNRPN, RRS1, INTS8, RBPMS2, IGF2BP3, and PIH1D2) were screened out, and a prognostic model was then established. Further analyses revealed that the high-risk group had poor overall survival. The area under the curve values were 0.87 and 0.75 at 3 years and 0.78 and 0.69 at 5 years in the training set and test set, respectively. We then plotted a nomogram on the basis of the six RBPs and tumor stage, with substantiation in the TCGA cohort. Moreover, we selected two intersecting RBPs, evaluated their biological effects by GSEA, and predicted three drugs (STOCK1N-28457, pyrimethamine, and trapidil) by using the Connectivity Map. Our research provided a novel insight into pRCC and improved the determination of prognosis and individualized therapeutic strategies.
Affiliation(s)
- Silin Jiang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xiaohan Ren
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Shouyong Liu
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Zhongwen Lu
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Aiming Xu
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Chao Qin
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Zengjun Wang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
9
Kim M, Lee S, Lim S, Kim S. SpliceHetero: An information theoretic approach for measuring spliceomic intratumor heterogeneity from bulk tumor RNA-seq. PLoS One 2019; 14:e0223520. [PMID: 31644551 PMCID: PMC6808416 DOI: 10.1371/journal.pone.0223520] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/08/2019] [Accepted: 09/23/2019] [Indexed: 01/19/2023] Open
Abstract
Motivation: Intratumor heterogeneity (ITH) represents the diversity of cell populations that make up cancer tissue. The level of ITH in a tumor is usually measured by a genomic variation profile, such as copy number variation and somatic mutation. However, a recent study has identified ITH at the transcriptome level and suggested that ITH at gene expression levels is useful for predicting prognosis. Measuring ITH levels at the spliceome level is a natural extension. There are serious technical challenges in measuring spliceomic ITH (sITH) from bulk tumor RNA sequencing (RNA-seq) due to the complex splicing patterns. Results: We propose an information-theoretic method to measure the sITH of bulk tumors to overcome the above challenges. This method has been extensively tested in experiments using synthetic data, xenograft tumor data, and TCGA pan-cancer data. As a result, we showed that sITH is closely related to cancer progression and clonal heterogeneity, along with clinically significant features such as cancer stage, survival outcome and PAM50 subtype. As far as we know, it is the first study to define ITH at the spliceome level. This method can greatly improve the understanding of cancer spliceome and has great potential as a diagnostic and prognostic tool.
Affiliation(s)
- Minsu Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
- Sangseon Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
- Sangsoo Lim
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
  - Bioinformatics Institute, Seoul National University, Seoul, 08826, Korea
10
IslandCafe: Compositional Anomaly and Feature Enrichment Assessment for Delineation of Genomic Islands. G3: Genes, Genomes, Genetics 2019; 9:3273-3285. [PMID: 31387857 PMCID: PMC6778810 DOI: 10.1534/g3.119.400562] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]
Abstract
One of the evolutionary forces driving bacterial genome evolution is the acquisition of clusters of genes through horizontal gene transfer (HGT). These genomic islands may confer adaptive advantages to the recipient bacteria, such as, the ability to thwart antibiotics, become virulent or hypervirulent, or acquire novel metabolic traits. Methods for detecting genomic islands either search for markers or features typical of islands or examine anomaly in oligonucleotide composition against the genome background. The former tends to underestimate, missing islands that have the markers either lost or degraded, while the latter tends to overestimate, due to their inability to discriminate compositional atypicality arising because of HGT from those that are a consequence of other biological factors. We propose here a framework that exploits the strengths of both these approaches while bypassing the pitfalls of either. Genomic islands lacking markers are identified by their association with genomic islands with markers. This was made possible by performing marker enrichment and phyletic pattern analyses within an integrated framework of recursive segmentation and clustering. The proposed method, IslandCafe, compared favorably with frequently used methods for genomic island detection on synthetic test datasets and on a test-set of known islands from 15 well-characterized bacterial species. Furthermore, IslandCafe identified novel islands with imprints of likely horizontal acquisition.
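To illustrate what examining compositional anomaly by segmentation can look like in practice, here is a generic sketch of one segmentation step: scan a sequence for the split point that maximizes the length-weighted Jensen-Shannon divergence between the nucleotide compositions of the two sides. This is an assumption-laden illustration and not IslandCafe's algorithm. The abstract does not specify the divergence measure, model order, or stopping rule, and the names `js_divergence`, `best_split`, and `min_len` are invented here.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a symbol count table."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two segments' compositions."""
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    c_l, c_r = Counter(left), Counter(right)
    pooled = entropy(c_l + c_r, n)
    return pooled - (n_l / n) * entropy(c_l, n_l) - (n_r / n) * entropy(c_r, n_r)


def best_split(seq, min_len=10):
    """Return (position, divergence) of the most compositionally disparate split."""
    best_pos, best_div = None, 0.0
    for pos in range(min_len, len(seq) - min_len + 1):
        d = js_divergence(seq[:pos], seq[pos:])
        if d > best_div:
            best_pos, best_div = pos, d
    return best_pos, best_div


if __name__ == "__main__":
    # Toy chimera: an AT-rich "native" stretch followed by a GC-rich "alien" stretch.
    chimera = "AT" * 50 + "GC" * 50
    print(best_split(chimera))
```

A recursive procedure would apply `best_split` again to each side while the divergence remains statistically significant, and a clustering step would then group segments of similar composition. The quadratic scan used here is kept only for clarity.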
11
Wasik S, Szostak N, Kudla M, Wachowiak M, Krawiec K, Blazewicz J. Detecting life signatures with RNA sequence similarity measures. J Theor Biol 2018; 463:110-120. [PMID: 30562502 DOI: 10.1016/j.jtbi.2018.12.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/22/2017] [Revised: 10/25/2018] [Accepted: 12/14/2018] [Indexed: 12/20/2022]
Abstract
The RNA World is currently the most plausible hypothesis for explaining the origins of life on Earth. The supporting body of evidence is growing and it comes from multiple areas, including astrobiology, chemistry, biology, mathematics, and, in particular, from computer simulations. Such methods frequently assume the existence of a hypothetical species on Earth, around three billion years ago, with a base sequence probably dissimilar from any in known genomes. However, it is often hard to verify whether or not a hypothetical sequence has the characteristics of biological sequences, and is thus likely to be functional. The primary objective of the presented research was to verify the possibility of building a computational 'life probe' for determining whether a given genetic sequence is biological, and assessing the sensitivity of such probes to the signatures of life present in known biological sequences. We have proposed decision algorithms based on the normalized compression distance (NCD) and Levenshtein distance (LD). We have validated the proposed method in the context of the RNA World hypothesis using short genetic sequences shorter than the error threshold value (i.e., 100 nucleotides). We have demonstrated that both measures can be successfully used to construct life probes that are significantly better than a random decision procedure, while varying from each other when it comes to detailed characteristics. We also observed that fragments of sequences related to replication have better discriminatory power than sequences having other molecular functions. In a broader context, this shows that the signatures of life in short RNA samples can be effectively detected using relatively simple means.
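The two similarity measures named in this abstract can be sketched in a few lines. The normalized compression distance is computed as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length, and the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions. The choice of zlib as the compressor is an assumption made for this example, since the abstract does not say which compressor the authors used, and the decision rules built on top of these distances are not reproduced.

```python
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two sequences given as strings."""
    bx, by = x.encode(), y.encode()
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def levenshtein(a, b):
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Toy RNA strings, made up for illustration only.
    s1 = "AUGGCUACGUAGCUAGGAUCC"
    s2 = "AUGGCUACGAAGCUAGGAUCC"
    print(levenshtein(s1, s2), ncd(s1, s2))
```

For sequences as short as those considered in the study (under about 100 nucleotides), the fixed overhead of a general-purpose compressor can dominate the compressed length, so NCD values from this sketch should be read as relative rather than absolute.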
Affiliation(s)
- Szymon Wasik
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; European Centre for Bioinformatics and Genomics, Poznan, Poland.
- Natalia Szostak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; European Centre for Bioinformatics and Genomics, Poznan, Poland
- Mateusz Kudla
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Michal Wachowiak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Krzysztof Krawiec
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Jacek Blazewicz
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; European Centre for Bioinformatics and Genomics, Poznan, Poland
12
Pandey RS, Saxena G, Bhattacharya D, Qiu H, Azad RK. Using complementary approaches to identify trans-domain nuclear gene transfers in the extremophile Galdieria sulphuraria (Rhodophyta). Journal of Phycology 2017; 53:7-11. [PMID: 27704560 DOI: 10.1111/jpy.12466] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/12/2016] [Accepted: 09/05/2016] [Indexed: 06/06/2023]
Abstract
Identification of horizontal gene transfers (HGTs) has primarily relied on phylogenetic tree based methods, which require a rich sampling of sequenced genomes to ensure a reliable inference. Because the success of phylogenetic approaches depends on the breadth and depth of the database, researchers usually apply stringent filters to detect only the most likely gene transfers in the genomes of interest. One such study focused on a highly conservative estimate of trans-domain gene transfers in the extremophile eukaryote, Galdieria sulphuraria (Galdieri) Merola (Rhodophyta), by applying multiple filters in their phylogenetic pipeline. This led to the identification of 75 inter-domain acquisitions from Bacteria or Archaea. Because of the evolutionary, ecological, and potential biotechnological significance of foreign genes in algae, alternative approaches and pipelines complementing phylogenetics are needed for a more comprehensive assessment of HGT. We present here a novel pipeline that uncovered 17 novel foreign genes of prokaryotic origin in G. sulphuraria, results that are supported by multiple lines of evidence including composition-based, comparative data, and phylogenetics. These genes encode a variety of potentially adaptive functions, from metabolite transport to DNA repair.
Affiliation(s)
- Ravi S Pandey
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
- Garima Saxena
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
- Debashish Bhattacharya
  - Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, New Jersey, USA
- Huan Qiu
  - Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, New Jersey, USA
- Rajeev K Azad
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
  - Department of Mathematics, University of North Texas, Denton, Texas, USA
13
Jani M, Mathee K, Azad RK. Identification of Novel Genomic Islands in Liverpool Epidemic Strain of Pseudomonas aeruginosa Using Segmentation and Clustering. Front Microbiol 2016; 7:1210. [PMID: 27536294 PMCID: PMC4971588 DOI: 10.3389/fmicb.2016.01210] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/08/2016] [Accepted: 07/20/2016] [Indexed: 02/03/2023] Open
Abstract
Pseudomonas aeruginosa is an opportunistic pathogen implicated in a myriad of infections and a leading pathogen responsible for mortality in patients with cystic fibrosis (CF). Horizontal transfers of genes among the microorganisms living within CF patients have led to highly virulent and multi-drug resistant strains such as the Liverpool epidemic strain of P. aeruginosa, namely the LESB58 strain that has the propensity to acquire virulence and antibiotic resistance genes. Often these genes are acquired in large clusters, referred to as "genomic islands (GIs)." To decipher GIs and understand their contributions to the evolution of virulence and antibiotic resistance in P. aeruginosa LESB58, we utilized a recursive segmentation and clustering procedure, presented here as a genome-mining tool, "GEMINI." GEMINI was validated on experimentally verified islands in the LESB58 strain before examining its potential to decipher novel islands. Of the 6062 genes in P. aeruginosa LESB58, 596 genes were identified to be resident on 20 GIs of which 12 have not been previously reported. Comparative genomics provided evidence in support of our novel predictions. Furthermore, GEMINI unraveled the mosaic structure of islands that are composed of segments of likely different evolutionary origins, and demonstrated its ability to identify potential strain biomarkers. These newly found islands likely have contributed to the hyper-virulence and multidrug resistance of the Liverpool epidemic strain of P. aeruginosa.
Affiliation(s)
- Mehul Jani
  - Department of Biological Sciences, University of North Texas, Denton, TX, USA
- Kalai Mathee
  - Department of Human and Molecular Genetics, Herbert Wertheim College of Medicine Global Health Consortium, and Biomolecular Sciences Institute, Florida International University, Miami, FL, USA
- Rajeev K Azad
  - Department of Biological Sciences, University of North Texas, Denton, TX, USA; Department of Mathematics, University of North Texas, Denton, TX, USA
14
Pandey RS, Azad RK. Deciphering evolutionary strata on plant sex chromosomes and fungal mating-type chromosomes through compositional segmentation. Plant Molecular Biology 2016; 90:359-373. [PMID: 26694866 DOI: 10.1007/s11103-015-0422-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/12/2015] [Accepted: 12/15/2015] [Indexed: 06/05/2023]
Abstract
Sex chromosomes have evolved from a pair of homologous autosomes which differentiated into sex determination systems, such as XY or ZW system, as a consequence of successive recombination suppression between the gametologous chromosomes. Identifying the regions of recombination suppression, namely, the "evolutionary strata", is central to understanding the history and dynamics of sex chromosome evolution. Evolution of sex chromosomes as a consequence of serial recombination suppressions is well-studied for mammals and birds, but not for plants, although 48 dioecious plants have already been reported. Only two plants Silene latifolia and papaya have been studied until now for the presence of evolutionary strata on their X chromosomes, made possible by the sequencing of sex-linked genes on both the X and Y chromosomes, which is a requirement of all current methods that determine stratum structure based on the comparison of gametologous sex chromosomes. To circumvent this limitation and detect strata even if only the sequence of sex chromosome in the homogametic sex (i.e. X or Z chromosome) is available, we have developed an integrated segmentation and clustering method. In application to gene sequences on the papaya X chromosome and protein-coding sequences on the S. latifolia X chromosome, our method could decipher all known evolutionary strata, as reported by previous studies. Our method, after validating on known strata on the papaya and S. latifolia X chromosome, was applied to the chromosome 19 of Populus trichocarpa, an incipient sex chromosome, deciphering two, yet unknown, evolutionary strata. In addition, we applied this approach to the recently sequenced sex chromosome V of the brown alga Ectocarpus sp. that has a haploid sex determination system (UV system) recovering the sex determining and pseudoautosomal regions, and then to the mating-type chromosomes of an anther-smut fungus Microbotryum lychnidis-dioicae predicting five strata in the non-recombining region of both the chromosomes.
Affiliation(s)
- Ravi S Pandey
  - Department of Biological Sciences, University of North Texas, Denton, TX, USA
- Rajeev K Azad
  - Department of Biological Sciences, University of North Texas, Denton, TX, USA
  - Department of Mathematics, University of North Texas, Denton, TX, USA

15
Lalli MA, Jang J, Park JHC, Wang Y, Guzman E, Zhou H, Audouard M, Bridges D, Tovar KR, Papuc SM, Tutulan-Cunita AC, Huang Y, Budisteanu M, Arghir A, Kosik KS. Haploinsufficiency of BAZ1B contributes to Williams syndrome through transcriptional dysregulation of neurodevelopmental pathways. Hum Mol Genet 2016; 25:1294-306. [PMID: 26755828 DOI: 10.1093/hmg/ddw010] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 09/09/2015] [Accepted: 01/07/2016] [Indexed: 12/31/2022] Open
Abstract
Williams syndrome (WS) is a neurodevelopmental disorder caused by a genomic deletion of ∼28 genes that results in a cognitive and behavioral profile marked by overall intellectual impairment with relative strength in expressive language and hypersocial behavior. Advancements in protocols for neuron differentiation from induced pluripotent stem cells allowed us to elucidate the molecular circuitry underpinning the ontogeny of WS. In patient-derived stem cells and neurons, we determined the expression profile of the Williams-Beuren syndrome critical region-deleted genes and the genome-wide transcriptional consequences of the hemizygous genomic microdeletion at chromosome 7q11.23. Derived neurons displayed disease-relevant hallmarks and indicated novel aberrant pathways in WS neurons including over-activated Wnt signaling accompanying an incomplete neurogenic commitment. We show that haploinsufficiency of the ATP-dependent chromatin remodeler, BAZ1B, which is deleted in WS, significantly contributes to this differentiation defect. Chromatin-immunoprecipitation (ChIP-seq) revealed BAZ1B target gene functions are enriched for neurogenesis, neuron differentiation and disease-relevant phenotypes. BAZ1B haploinsufficiency caused widespread gene expression changes in neural progenitor cells, and together with BAZ1B ChIP-seq target genes, explained 42% of the transcriptional dysregulation in WS neurons. BAZ1B contributes to regulating the balance between neural precursor self-renewal and differentiation and the differentiation defect caused by BAZ1B haploinsufficiency can be rescued by mitigating over-active Wnt signaling in neural stem cells. Altogether, these results reveal a pivotal role for BAZ1B in neurodevelopment and implicate its haploinsufficiency as a likely contributor to the neurological phenotypes in WS.
Affiliation(s)
- Matthew A Lalli
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute, Biomolecular Science and Engineering Program
- Jiwon Jang
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Joo-Hye C Park
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Yidi Wang
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Elmer Guzman
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Hongjun Zhou
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Morgane Audouard
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Daniel Bridges
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute, Department of Physics, University of California, Santa Barbara, CA, USA
- Kenneth R Tovar
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute
- Sorina M Papuc
  - Victor Babes National Institute of Pathology, Clinical Cytogenetics, Bucharest, Romania
- Yadong Huang
  - Gladstone Institute of Neurological Disease, University of California, San Francisco, CA, USA
- Magdalena Budisteanu
  - Victor Babes National Institute of Pathology, Clinical Cytogenetics, Bucharest, Romania, Alexandru Obregia Clinical Hospital of Psychiatry, Neuropediatric Pathology, Bucharest, Romania
- Aurora Arghir
  - Victor Babes National Institute of Pathology, Clinical Cytogenetics, Bucharest, Romania
- Kenneth S Kosik
  - Department of Molecular, Cellular, and Developmental Biology, Neuroscience Research Institute, Biomolecular Science and Engineering Program

16
Algama M, Keith JM. Investigating genomic structure using changept: A Bayesian segmentation model. Comput Struct Biotechnol J 2014; 10:107-15. [PMID: 25349679 PMCID: PMC4204429 DOI: 10.1016/j.csbj.2014.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023] Open
Abstract
Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though they may nevertheless be of interest to biologists. One technique for investigating the composition of genomes is to segment sequences into compositionally homogenous blocks. This technique, known as 'sequence segmentation' or 'change-point analysis', is used to identify patterns of variation across genomes such as GC-rich and GC-poor regions, coding and non-coding regions, slowly evolving and rapidly evolving regions and many other types of variation. In this mini-review we outline many of the genome segmentation methods currently available and then focus on a Bayesian DNA segmentation algorithm, with examples of its various applications.
Affiliation(s)
- Manjula Algama
  - School of Mathematical Sciences, Monash University, Clayton, VIC 3800, Australia
- Jonathan M Keith
  - School of Mathematical Sciences, Monash University, Clayton, VIC 3800, Australia

17
Arighi CN, Wu CH, Cohen KB, Hirschman L, Krallinger M, Valencia A, Lu Z, Wilbur JW, Wiegers TC. BioCreative-IV virtual issue. Database (Oxford) 2014; 2014:bau039. [PMID: 24852177 PMCID: PMC4030502 DOI: 10.1093/database/bau039] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Affiliation(s)
- Cecilia N Arighi
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Cathy H Wu
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Kevin B Cohen
  - Center for Computational Pharmacology, University of Colorado Denver School of Medicine, Aurora, CO, USA
- Martin Krallinger
  - Structural and Computational Biology Group, Spanish National Cancer Research Centre, Madrid, Spain
- Alfonso Valencia
  - Structural and Computational Biology Group, Spanish National Cancer Research Centre, Madrid, Spain
- Zhiyong Lu
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- John W Wilbur
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Thomas C Wiegers
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA

18
Pandey RS, Wilson Sayres MA, Azad RK. Detecting evolutionary strata on the human X chromosome in the absence of gametologous Y-linked sequences. Genome Biol Evol 2014; 5:1863-71. [PMID: 24036954 PMCID: PMC3814197 DOI: 10.1093/gbe/evt139] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/26/2022] Open
Abstract
Mammalian sex chromosomes arose from a pair of homologous autosomes that differentiated into the X and Y chromosomes following a series of recombination suppression events between the X and Y. The stepwise recombination suppressions from the distal long arm to the distal short arm of the chromosomes are reflected as regions with distinct X-Y divergence, referred to as evolutionary strata on the X. All current methods for stratum detection depend on X-Y comparisons but are severely limited by the paucity of X-Y gametologs. We have developed an integrative method that combines a top-down, recursive segmentation algorithm with a bottom-up, agglomerative clustering algorithm to decipher compositionally distinct regions on the X, which reflect regions of unique X-Y divergence. In application to human X chromosome, our method correctly classified a concatenated set of 35 previously assayed X-linked gene sequences by evolutionary strata. We then extended our analysis, applying this method to the entire sequence of the human X chromosome, in an effort to define stratum boundaries. The boundaries of more recently formed strata on X-added region, namely the fourth and fifth strata, have been defined by previous studies and are recapitulated with our method. The older strata, from the first up to the third stratum, have remained poorly resolved due to paucity of X-Y gametologs. By analyzing the entire X sequence, our method identified seven evolutionary strata in these ancient regions, where only three could previously be assayed, thus demonstrating the robustness of our method in detecting the evolutionary strata.
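The top-down half of such a segmentation-plus-clustering method can be illustrated with a single splitting step. The sketch below is a minimal illustration only, assuming single-nucleotide composition and the weighted Jensen-Shannon divergence as the split criterion; the function names `entropy` and `best_split` and the `min_len` parameter are hypothetical, and the abstract does not specify the authors' actual model order, significance test, or clustering stage.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of the symbol frequencies in `counts`."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_split(seq, min_len=1):
    """Return (position, divergence) of the split that maximizes the
    length-weighted Jensen-Shannon divergence between the left and
    right parts of `seq`. Returns (None, 0.0) if no split improves
    on a homogeneous sequence."""
    n = len(seq)
    total = Counter(seq)
    left = Counter()
    h_all = entropy(total, n)
    best_pos, best_jsd = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1          # left part is seq[:i]
        if i < min_len or n - i < min_len:
            continue
        right = total - left           # right part is seq[i:]
        jsd = (h_all
               - (i / n) * entropy(left, i)
               - ((n - i) / n) * entropy(right, n - i))
        if jsd > best_jsd:
            best_pos, best_jsd = i, jsd
    return best_pos, best_jsd


if __name__ == "__main__":
    print(best_split("AAAAAACCCCCC"))
```

A recursive segmentation would apply `best_split` again to each resulting part, stopping when the divergence is no longer statistically significant; the bottom-up stage would then merge compositionally similar segments.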

19
Ré MA, Azad RK. Generalization of entropy based divergence measures for symbolic sequence analysis. PLoS One 2014; 9:e93532. [PMID: 24728338 PMCID: PMC3984095 DOI: 10.1371/journal.pone.0093532] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 09/17/2013] [Accepted: 03/04/2014] [Indexed: 11/26/2022] Open
Abstract
Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
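For orientation, the standard (Shannon) form of the weighted Jensen-Shannon divergence for any number of distributions can be computed as the entropy of the weighted mixture minus the weighted mean of the individual entropies. The sketch below assumes single-nucleotide frequencies over the alphabet ACGT; the helper names are hypothetical, and it does not implement the Tsallis, Markovian, or Tsallis-Markovian generalizations discussed in the abstract.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def base_frequencies(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in `seq`."""
    counts = Counter(seq)
    total = sum(counts[b] for b in alphabet)
    return [counts[b] / total for b in alphabet]


def jensen_shannon_divergence(dists, weights):
    """Weighted JSD: H(sum_i w_i * P_i) - sum_i w_i * H(P_i).

    `dists` is a list of probability vectors of equal length and
    `weights` a list of non-negative weights summing to 1."""
    size = len(dists[0])
    mixture = [sum(w * d[i] for w, d in zip(weights, dists))
               for i in range(size)]
    mean_entropy = sum(w * shannon_entropy(d)
                       for w, d in zip(weights, dists))
    return shannon_entropy(mixture) - mean_entropy


if __name__ == "__main__":
    p = base_frequencies("AAAAAAAA")
    q = base_frequencies("CCCCCCCC")
    print(jensen_shannon_divergence([p, q], [0.5, 0.5]))
```

With base-2 logarithms and two equally weighted distributions the value lies between 0 (identical compositions) and 1 bit (non-overlapping compositions).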
Affiliation(s)
- Miguel A. Ré
  - Departamento de Ciencias Básicas, CIII - Facultad Regional Córdoba, Universidad Tecnológica Nacional, Córdoba, Argentina
  - Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Córdoba, Argentina
- Rajeev K. Azad
  - Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America
  - Department of Mathematics, University of North Texas, Denton, Texas, United States of America

20
Abstract
A plethora of biologically useful information lies obscured in the genomes of organisms. Encoded within the genome of an organism is the information about its evolutionary history. Evolutionary signals are scattered throughout the genome. Bioinformatics approaches are frequently invoked to deconstruct the evolutionary patterns underlying genomes, which are difficult to decipher using traditional laboratory experiments. However, interpreting constantly evolving genomes is a non-trivial task for bioinformaticians. Processes such as mutations, recombinations, insertions and deletions not only make genomes heterogeneous and difficult to decipher but also render direct sequence comparison less effective. Here we present a brief overview of sequence comparison methods, with a focus on recently proposed alignment-free sequence comparison methods based on Shannon information entropy. Many of these sequence comparison methods have been adapted to construct phylogenetic trees to infer relationships among organisms.
Affiliation(s)
- Mehul Jani
  - University of North Texas, Denton, Texas