1
Liu N, Maser E, Zhang T. Genomic analysis of Gordonia polyisoprenivorans strain R9, a highly effective 17 beta-estradiol- and steroid-degrading bacterium. Chem Biol Interact 2021; 350:109685. [PMID: 34653397 DOI: 10.1016/j.cbi.2021.109685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2021] [Revised: 09/25/2021] [Accepted: 10/02/2021] [Indexed: 10/20/2022]
Abstract
Rising estrogen levels and pollution by other steroids pose considerable challenges to the environment. In this study, the genome of Gordonia polyisoprenivorans strain R9, one of the most effective 17 beta-estradiol- and steroid-degrading bacteria, was sequenced and annotated. The circular chromosome of G. polyisoprenivorans R9 was 6,033,879 bp in size, with an average GC content of 66.91%. In addition, 5213 putative protein-coding sequences, 9 rRNA, 49 tRNA, and 3 sRNA genes were predicted. The core-pan gene evolutionary tree for the genus Gordonia showed that G. polyisoprenivorans R9 clusters with G. polyisoprenivorans VH2 and G. polyisoprenivorans C, with 93.75% and 93.8% similarity to these two strains, respectively. Altogether, the three G. polyisoprenivorans strains shared 3890 core gene clusters. Strain R9 contained 785 specific gene clusters, while 501 and 474 specific gene clusters were identified in strains VH2 and C, respectively. Furthermore, whole-genome analysis revealed that the steroid and estrogen degradation pathway is present in the core genome of all three G. polyisoprenivorans strains, although the G. polyisoprenivorans R9 genome contained more specific estrogen and steroid degradation genes. In strain R9, 207 ABC transporters, 95 short-chain dehydrogenases (SDRs), 26 monooxygenases, 21 dioxygenases, 7 aromatic ring-hydroxylating dioxygenases, and 3 CoA esters were identified, all of which are important for estrogen and steroid transport and degradation. The results of this study could enhance our understanding of the role of G. polyisoprenivorans R9 in estradiol and steroid degradation as well as of evolution within the G. polyisoprenivorans species.
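The split into core and strain-specific gene clusters reported in the abstract above can be illustrated with a small set-based sketch. This is not the authors' pipeline: the function name, the input layout (cluster id mapped to the set of strains that carry it), and the toy cluster ids are all assumptions made for illustration.

```python
def partition_pan_genome(presence):
    """Split gene clusters into core and strain-specific sets.

    presence: dict mapping a cluster id to the set of strain names in which
    that cluster occurs (hypothetical layout, assumed for this sketch).
    """
    # Every strain that appears anywhere in the table.
    strains = set().union(*presence.values())
    # Core clusters are present in all strains.
    core = {c for c, found_in in presence.items() if found_in == strains}
    # Specific clusters are present in exactly one strain.
    specific = {
        strain: {c for c, found_in in presence.items() if found_in == {strain}}
        for strain in strains
    }
    return core, specific


# Toy presence/absence table; cluster ids are invented, strain names follow the abstract.
toy = {
    "cluster_a": {"R9", "VH2", "C"},
    "cluster_b": {"R9"},
    "cluster_c": {"R9", "VH2"},
    "cluster_d": {"C"},
}
core, specific = partition_pan_genome(toy)
```

Clusters shared by some but not all strains (here `cluster_c`) fall into neither group, which is why the core and specific counts in the abstract need not add up to the total number of clusters.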
Affiliation(s)
- Na Liu
- Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Jilin Provincial Key Laboratory of Water Resources and Environment, College of New Energy and Environment, Jilin University, Changchun 130021, China
- Edmund Maser
- Institute of Toxicology and Pharmacology for Natural Scientists, University Medical School, Schleswig-Holstein, Campus Kiel, Brunswiker Str. 10, D-24105 Kiel, Germany
- Tingdi Zhang
- Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Jilin Provincial Key Laboratory of Water Resources and Environment, College of New Energy and Environment, Jilin University, Changchun 130021, China
2
Feau N, Beauseigle S, Bergeron MJ, Bilodeau GJ, Birol I, Cervantes-Arango S, Dhillon B, Dale AL, Herath P, Jones SJ, Lamarche J, Ojeda DI, Sakalidis ML, Taylor G, Tsui CK, Uzunovic A, Yueh H, Tanguay P, Hamelin RC. Genome-Enhanced Detection and Identification (GEDI) of plant pathogens. PeerJ 2018; 6:e4392. [PMID: 29492338 PMCID: PMC5825881 DOI: 10.7717/peerj.4392] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/07/2017] [Accepted: 01/29/2018] [Indexed: 12/17/2022] [Open Access]
Abstract
Plant diseases caused by fungi and Oomycetes represent worldwide threats to crops and forest ecosystems. Effective prevention and appropriate management of emerging diseases rely on rapid detection and identification of the causal pathogens. The increase in genomic resources makes it possible to generate novel genome-enhanced DNA detection assays that can exploit whole genomes to discover candidate genes for pathogen detection. A pipeline was developed to identify genome regions that discriminate taxa or groups of taxa and can be converted into PCR assays. The modular pipeline comprises four components: (1) selection and genome sequencing of phylogenetically related taxa, (2) identification of clusters of orthologous genes, (3) elimination of false positives by filtering, and (4) assay design. This pipeline was applied to some of the most important plant pathogens across three broad taxonomic groups: Phytophthoras (Stramenopiles, Oomycota), Dothideomycetes (Fungi, Ascomycota) and Pucciniales (Fungi, Basidiomycota). Comparison of 73 fungal and Oomycete genomes led to the discovery of 5,939 gene clusters that were unique to the targeted taxa and an additional 535 that were common at higher taxonomic levels. Approximately 28% of the 299 tested were converted into qPCR assays that met our set of specificity criteria. This work demonstrates that a genome-wide approach can efficiently identify multiple taxon-specific genome regions that can be converted into highly specific PCR assays. The ability to easily obtain multiple alternative regions for designing highly specific qPCR assays should be of great help in tackling challenging cases for which higher taxonomic resolution is needed.
Affiliation(s)
- Nicolas Feau
- Department of Forest and Conservation Sciences, Forest Sciences Centre, University of British Columbia, Vancouver, BC, Canada
- Inanc Birol
- BC Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada
- Sandra Cervantes-Arango
- Department of Forest and Conservation Sciences, Forest Sciences Centre, University of British Columbia, Vancouver, BC, Canada
- Braham Dhillon
- Department of Plant Pathology, University of Arkansas at Fayetteville, Fayetteville, AR, United States of America
- Angela L. Dale
- Department of Forest and Conservation Sciences, Forest Sciences Centre, University of British Columbia, Vancouver, BC, Canada
- FPInnovations, Vancouver, BC, Canada
- Padmini Herath
- Department of Forest and Conservation Sciences, Forest Sciences Centre, University of British Columbia, Vancouver, BC, Canada
- Steven J.M. Jones
- BC Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Josyanne Lamarche
- Canadian Forest Service, Natural Resources Canada, Quebec City, Quebec, Canada
- Dario I. Ojeda
- Department of Biology, Unit of Ecology and Genetics, University of Oulu, Oulu, Finland
- Monique L. Sakalidis
- Department of Plant, Soil & Microbial Sciences and Department of Forestry, Michigan State University, East Lansing, MI, United States of America
- Greg Taylor
- BC Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada
- Clement K.M. Tsui
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Hesther Yueh
- Department of Forest and Conservation Sciences, Forest Sciences Centre, University of British Columbia, Vancouver, BC, Canada
- Philippe Tanguay
- Canadian Forest Service, Natural Resources Canada, Quebec City, Quebec, Canada
- Richard C. Hamelin
- Department of Forest and Conservation Sciences, Forest Sciences Centre, University of British Columbia, Vancouver, BC, Canada
- Foresterie et géomatique, Institut de Biologie Intégrative des Systèmes, Laval University, Quebec City, Quebec, Canada
3
Gong C, Chen H, He W, Zhang Z. Improved multi-objective clustering algorithm using particle swarm optimization. PLoS One 2017; 12:e0188815. [PMID: 29206880 PMCID: PMC5716574 DOI: 10.1371/journal.pone.0188815] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/11/2017] [Accepted: 09/11/2017] [Indexed: 11/20/2022] [Open Access]
Abstract
Multi-objective clustering has received widespread attention recently, as it can obtain more accurate and reasonable solutions. In this paper, an improved multi-objective clustering framework using particle swarm optimization (IMCPSO) is proposed. First, a novel particle representation for the clustering problem is designed to help PSO search for clustering solutions in continuous space. Second, the distribution of the Pareto set is analyzed, and the results are applied to the leader selection strategy to keep the algorithm from becoming trapped in local optima. Moreover, a clustering solution-improvement method is proposed, which greatly increases the efficiency of the search for clustering solutions. In the experiments, 28 datasets are used and nine state-of-the-art clustering algorithms are compared; the proposed method is superior to the other approaches on the ARI evaluation index.
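The Pareto set mentioned in the abstract above is the set of solutions that no other solution dominates. The following minimal sketch shows only that generic notion, assuming every objective is minimized; it is not the IMCPSO leader selection strategy, and the function names and toy objective values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy objective vectors, e.g. (compactness, connectivity) for four
# candidate clusterings; the values are invented for illustration.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)  # (3, 3) is dominated by (2, 2)
```

A multi-objective PSO typically keeps such a non-dominated archive and draws leaders from it; how IMCPSO uses the distribution of the archive is described only at a high level in the abstract.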
Affiliation(s)
- Congcong Gong
- PLA University of Science and Technology, Nanjing, PR China
- Haisong Chen
- PLA University of Science and Technology, Nanjing, PR China
- Weixiong He
- PLA University of Science and Technology, Nanjing, PR China
4
Szilágyi L, Szilágyi SM. A modified two-stage Markov clustering algorithm for large and sparse networks. Computer Methods and Programs in Biomedicine 2016; 135:15-26. [PMID: 27586476 DOI: 10.1016/j.cmpb.2016.07.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/06/2016] [Revised: 05/27/2016] [Accepted: 07/01/2016] [Indexed: 06/06/2023]
Abstract
Background: Graph-based hierarchical clustering algorithms become prohibitively costly in both execution time and storage space as the number of nodes approaches the order of millions. Objective: A fast and highly memory-efficient Markov clustering algorithm is proposed to perform the classification of huge sparse networks using an ordinary personal computer. Methods: Improvements compared to previous versions are achieved through adequately chosen data structures that facilitate the efficient handling of symmetric sparse matrices. Clustering is performed in two stages: the initial connected network is processed in a sparse matrix until it breaks into isolated, small, and relatively dense subgraphs, which are then processed separately until convergence is obtained. An intelligent stopping criterion is also proposed to quit further processing of a subgraph that tends toward completeness with equal edge weights. The main advantage of this algorithm is that the necessary number of iterations is decided separately for each graph node. Results: The proposed algorithm was tested using the SCOP95 and large synthetic protein sequence data sets. The validation process revealed that the proposed method can reduce the processing time of huge sequence networks by a factor of 3 to 6 compared to previous Markov clustering solutions, without any loss of partition quality. Conclusions: A one-million-node and one-billion-edge protein sequence network defined by a BLAST similarity matrix can be processed with a high-end personal computer in 100 minutes. Further improvement in speed is possible via parallel data processing, while the extension toward several million nodes needs intermediary data storage, for example on solid state drives.
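For orientation, the basic Markov clustering loop that this abstract builds on alternates expansion (squaring a column-stochastic matrix) with inflation (element-wise powering followed by column renormalization). The sketch below is a plain dense version of that textbook loop, written under the assumption of a small symmetric adjacency matrix; it is not the two-stage sparse algorithm or the stopping criterion proposed in the paper, and all function names are hypothetical.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: multiply the matrix by itself."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: raise entries to `power`, then renormalize columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a small graph given as a dense, symmetric adjacency matrix."""
    n = len(adjacency)
    if n == 0:
        return set()
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each non-empty row lists the nodes attracted to that row's node.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6)
                for row in m}
    clusters.discard(frozenset())
    return clusters


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
triangle = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
graph = [row + [0.0] * 3 for row in triangle] + \
        [[0.0] * 3 + row for row in triangle]
result = sorted(sorted(c) for c in markov_cluster(graph))
# result == [[0, 1, 2], [3, 4, 5]]
```

In this toy graph each triangle is a complete subgraph with equal edge weights, so its block of the matrix is already a fixed point of expansion and inflation and the loop stops after one pass. The dense lists used here cost memory quadratic in the number of nodes, which is the kind of overhead that sparse data structures are meant to avoid.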
Affiliation(s)
- László Szilágyi
- Faculty of Technical and Human Sciences, Sapientia University of Transylvania, Şoseaua Sighişoarei 1/C, 540485 Tîrgu Mureş, Romania; Department of Informatics, Petru Maior University, Str. N. Iorga Nr. 1, 540088 Tîrgu Mureş, Romania
- Sándor M Szilágyi
- Budapest University of Technology and Economics, Department of Control Engineering and Information Technology, Magyar tudósok krt. 2, H-1117 Budapest, Hungary; Department of Informatics, Petru Maior University, Str. N. Iorga Nr. 1, 540088 Tîrgu Mureş, Romania
5
Gibbons TR, Mount SM, Cooper ED, Delwiche CF. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm. BMC Bioinformatics 2015; 16:218. [PMID: 26160651 PMCID: PMC4496851 DOI: 10.1186/s12859-015-0625-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/16/2014] [Accepted: 05/20/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only in how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. Results: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios. Conclusions: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with 25 M+ edges connecting the 60 k+ KOG sequences in half a minute using less than half a gigabyte of memory. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0625-x) contains supplementary material, which is available to authorized users.
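Two of the four edge-weighting metrics named in the abstract, the raw bit score and the NLE, can be sketched directly from their descriptions. The code below is an illustrative assumption and not taken from Porthos: the cap applied when BLAST reports an E-value of 0, the rule of keeping the larger of the two reciprocal hit weights, and the tuple layout of a hit are all choices made for this example.

```python
import math


def nle(evalue, cap=200.0):
    """Negative common log of the E-value, capped (cap value is an assumption)."""
    if evalue <= 0.0:
        # BLAST can report an E-value of exactly 0 for very strong hits.
        return cap
    return min(cap, -math.log10(evalue))


def undirected_edges(hits, weight):
    """Collapse directed BLAST hits into one weighted edge per sequence pair.

    hits: iterable of (query, subject, bit_score, evalue) tuples
          (hypothetical layout for this sketch).
    weight: function (bit_score, evalue) -> edge weight.
    """
    edges = {}
    for query, subject, bits, evalue in hits:
        if query == subject:
            continue  # self-hits do not connect two different sequences
        pair = tuple(sorted((query, subject)))
        w = weight(bits, evalue)
        # Keep the larger weight when both directions are reported.
        edges[pair] = max(w, edges.get(pair, w))
    return edges


# Invented hits between two sequences, plus one self-hit.
hits = [
    ("seqA", "seqB", 120.0, 1e-30),
    ("seqB", "seqA", 118.0, 1e-29),
    ("seqA", "seqA", 300.0, 0.0),
]
bit_score_graph = undirected_edges(hits, lambda bits, evalue: bits)
nle_graph = undirected_edges(hits, lambda bits, evalue: nle(evalue))
```

The bit score needs no transformation or cap, which is one sense in which it is a simpler edge weight to work with than the NLE.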
Affiliation(s)
- Theodore R Gibbons
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Stephen M Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Endymion D Cooper
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Charles F Delwiche
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Maryland Agricultural Experiment Station, University of Maryland, College Park, Baltimore, 20742, Maryland.