1
Prokaryotic phylogenies inferred from whole-genome sequence and annotation data. BIOMED RESEARCH INTERNATIONAL 2013; 2013:409062. [PMID: 24073404 PMCID: PMC3773407 DOI: 10.1155/2013/409062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/15/2013] [Revised: 06/26/2013] [Accepted: 07/22/2013] [Indexed: 11/25/2022]
Abstract
Phylogenetic trees represent the evolutionary relationships among groups of species. In this paper, a novel method for inferring prokaryotic phylogenies from multiple kinds of genomic information is proposed. The method, called CGCPhy, is based on a distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined from sequence similarity, genomic function, and genomic structure information. Second, genes potentially involved in horizontal gene transfer (HGT) events are eliminated; these are taken to be genes that are highly conserved across different species and genes located on fragments with an abnormal genome barcode. Third, the distance between the orthologous gene clusters of each genome pair is calculated from the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is used to construct phylogenetic trees across the species. CGCPhy has been tested on different datasets drawn from 617 complete single-chromosome prokaryotic genomes and achieved practical accuracies on different species sets, measured as agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy with a low standard deviation across datasets, so it has practical potential for phylogenetic analysis.
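The third and fourth steps of the abstract (a pairwise distance from ortholog counts, then neighbor-joining) can be illustrated with a minimal sketch. The distance formula below is an assumption made for illustration only; the abstract does not give CGCPhy's actual formula. The function names `ortholog_distance` and `neighbor_joining` are hypothetical, and the tree is returned as a nested tuple without branch lengths.

```python
from itertools import combinations


def ortholog_distance(shared, size_a, size_b):
    """Assumed distance (not from the paper): 1 - shared / smaller gene count."""
    return 1.0 - shared / min(size_a, size_b)


def neighbor_joining(names, dist):
    """Build a tree topology (nested tuples) with the neighbor-joining criterion.

    dist maps frozenset({a, b}) -> distance for every pair of names.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (i, j)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                )
        nodes = [k for k in nodes if k != i and k != j] + [new]
    return tuple(nodes)
```

Given four genomes whose pairwise distances are small within the pairs (A, B) and (C, D) and large across them, `neighbor_joining` groups A with B and C with D.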
2
Sharma R, Evans PA, Bhavsar VC. Regulatory link mapping between organisms. BMC SYSTEMS BIOLOGY 2011; 5 Suppl 1:S4. [PMID: 21689479 PMCID: PMC3121120 DOI: 10.1186/1752-0509-5-s1-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Background: Identification of gene regulatory networks is useful in understanding gene regulation in any organism. Some regulatory network information has already been determined experimentally for model organisms, but much less has been identified for non-model organisms, and the limited amount of gene expression data available for non-model organisms makes inference of regulatory networks difficult. Results: This paper proposes a method to determine the regulatory links that can be mapped from a model to a non-model organism. Mapping a regulatory network involves mapping the transcription factors and target genes from one genome to another. In the proposed method, Basic Local Alignment Search Tool (BLAST) and InterProScan are used to map the transcription factors, whereas BLAST along with transcription factor binding site motifs and the GALF-P tool are used to map the target genes. Experiments are performed to map the regulatory network data of S. cerevisiae to A. thaliana and analyze the results. Since limited information is available about gene regulatory network links, gene expression data is used to analyze the results. A set of rules is defined on the gene expression experiments to identify the predicted regulatory links that are well supported. Conclusions: Combining transcription factors mapped using BLAST and subfamily classification, together with target genes mapped using BLAST and binding site motifs, produced the best regulatory link predictions. More than two-thirds of the predicted regulatory links that were analyzed using gene expression data have been verified as correctly mapped regulatory links in the target genome.
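The core mapping idea in this abstract (carry a transcription factor to target gene link from one genome to another by mapping both ends) can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: the two mapping dictionaries stand in for the outputs of the BLAST, InterProScan, and motif-based steps, and the function name and gene identifiers are hypothetical.

```python
def map_regulatory_links(links, tf_map, target_map):
    """Project (TF, target) links from a source genome onto a target genome.

    links:      iterable of (tf, target_gene) pairs in the source organism
    tf_map:     dict source TF -> set of mapped TFs in the other organism
    target_map: dict source target gene -> set of mapped genes in the other organism
    Both dicts are assumed inputs (e.g. precomputed similarity-search hits).
    """
    predicted = set()
    for tf, target in links:
        # A link is carried over only when both ends have at least one mapping.
        for mapped_tf in tf_map.get(tf, ()):
            for mapped_target in target_map.get(target, ()):
                predicted.add((mapped_tf, mapped_target))
    return predicted
```

A link whose transcription factor or target gene has no counterpart in the other genome produces no prediction, which mirrors the abstract's point that only some links can be mapped.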
Affiliation(s)
- Rachita Sharma
- Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada.
3
Liu Y, Li J, Sam L, Goh CS, Gerstein M, Lussier YA. An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits. PLoS Comput Biol 2006; 2:e159. [PMID: 17112314 PMCID: PMC1636675 DOI: 10.1371/journal.pcbi.0020159] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/05/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022] Open
Abstract
With mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes, current computational technologies either lack high throughput capacity for genomic scale analysis, or are limited in their capability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions is prohibited. Here, we developed a high throughput computational approach, and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes along with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method over 59 fully sequenced prokaryotic species, we identified genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam and 63 phenotypes, with 2,650 correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We furthermore unveiled 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms spanning multiple biological scales and reused by related phenotypes (metaphenotypes). 
We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.
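To make the notion of a correlation or anti-correlation between a protein domain and a phenotype concrete, the sketch below computes the phi coefficient between two presence/absence vectors across genomes. The choice of phi is an assumption for illustration; the abstract does not state which statistic the authors used, and `phi_coefficient` is a hypothetical name.

```python
from math import sqrt


def phi_coefficient(domain_present, phenotype_present):
    """Phi coefficient between two equal-length presence/absence sequences.

    Each position corresponds to one genome; truthy means present.
    Returns a value in [-1, 1], or 0.0 when either vector is constant.
    """
    pairs = list(zip(domain_present, phenotype_present))
    n11 = sum(1 for d, p in pairs if d and p)          # both present
    n10 = sum(1 for d, p in pairs if d and not p)      # domain only
    n01 = sum(1 for d, p in pairs if not d and p)      # phenotype only
    n00 = sum(1 for d, p in pairs if not d and not p)  # both absent
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom
```

A positive value corresponds to a correlation in the abstract's sense (the domain tends to occur in genomes with the phenotype) and a negative value to an anti-correlation. Significance testing and multiple-testing correction are deliberately left out of this sketch.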
Affiliation(s)
- Yang Liu
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Jianrong Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Lee Sam
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Chern-Sing Goh
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
- * To whom correspondence should be addressed. E-mail: (MG); (YAL)
- Yves A Lussier
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- * To whom correspondence should be addressed. E-mail: (MG); (YAL)
4
Abstract
We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database, the first of its kind.
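The definition given here (operons in a target genome that are associated through operons in reference genomes) can be illustrated with a simplified grouping sketch. This is not the authors' prediction algorithm: it assumes a precomputed one-to-one ortholog table and merges any target operons whose orthologs fall in the same reference operon, using union-find. All names (`group_operons`, the operon and gene identifiers) are hypothetical.

```python
def group_operons(target_operons, reference_operons, ortholog):
    """Group target operons that are linked through a shared reference operon.

    target_operons:    dict operon name -> set of target-genome genes
    reference_operons: list of sets of reference-genome genes
    ortholog:          dict target gene -> reference gene (assumed input)
    Returns a sorted list of sorted operon-name lists.
    """
    parent = {name: name for name in target_operons}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for ref in reference_operons:
        # Target operons with at least one gene whose ortholog lies in this reference operon.
        hits = [
            name
            for name, genes in target_operons.items()
            if any(ortholog.get(g) in ref for g in genes)
        ]
        for other in hits[1:]:
            parent[find(other)] = find(hits[0])

    groups = {}
    for name in target_operons:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())
```

In this sketch a group with more than one member plays the role of a candidate uber-operon; operons with no link through any reference operon remain as singleton groups.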
Affiliation(s)
- Dongsheng Che
- Department of Computer Science, University of Georgia, USA
- Guojun Li
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- School of Mathematics and System Sciences, Shandong University, China
- Fenglou Mao
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- Hongwei Wu
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- Ying Xu
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- Department of Computer Science, University of Georgia, USA
- To whom correspondence should be addressed. Tel: 1 706 542 9779; Fax: 1 706 542 9751; Ying Xu