1. Doonan J, Denman S, Pachebat JA, McDonald JE. Genomic analysis of bacteria in the Acute Oak Decline pathobiome. Microb Genom 2019; 5. [PMID: 30625111 PMCID: PMC6412055 DOI: 10.1099/mgen.0.000240]
Abstract
The UK's native oak is under serious threat from Acute Oak Decline (AOD). Stem tissue necrosis is a primary symptom of AOD, and several bacteria are associated with necrotic lesions. Two members of the lesion pathobiome, Brenneria goodwinii and Gibbsiella quercinecans, have been identified as causative agents of tissue necrosis. Additional bacteria, including Lonsdalea britannica and Rahnella species, have also been detected in the lesion microbiome, but their role in tissue degradation is unclear. Information on genome-encoded mechanisms of tissue necrosis is therefore critical for understanding the roles of, and mechanisms used by, bacterial members of the lesion pathobiome in the aetiology of AOD. Here, the whole genomes of bacteria isolated from AOD-affected trees were sequenced, annotated and compared against canonical bacterial phytopathogens and non-pathogenic symbionts. Orthologous gene inference was used to identify shared virulence genes that retain the same function. Functional annotation of phytopathogenic virulence genes demonstrated that all studied members of the AOD lesion microbiota possessed genes associated with phytopathogens. However, the genome of B. goodwinii was the most characteristic of a necrogenic phytopathogen, corroborating previous pathological and metatranscriptomic studies that implicate it as the key causal agent of AOD lesions. We also investigated the genome sequences of other AOD lesion microbiota to assess their potential to cause disease or to contribute to the pathogenic potential of other organisms isolated from this complex pathobiome. The roles of these members remain uncertain, but some, such as G. quercinecans, may contribute to tissue necrosis by releasing necrotizing enzymes, may help more dangerous pathogens realize their pathogenic potential, or may act as secondary/opportunistic pathogens and accessory species for B. goodwinii. We demonstrate that, in combination with ecological data, whole genome sequencing provides key insights into the pathogenic potential of bacterial species, whether they are phytopathogens, part-contributors or stimulators of the pathobiome.
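The abstract does not say which orthologous gene inference method was used. Purely as an illustration, the sketch below uses reciprocal best hits (RBH), one common way to infer orthologs between two genomes: gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring hit. The input format (nested dicts of similarity scores, such as BLAST bit scores), the function names and the gene IDs are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: query gene -> {target gene -> similarity score, e.g. a bit score}
Hits = Dict[str, Dict[str, float]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Return the highest-scoring target for every query that has at least one hit."""
    return {query: max(targets, key=targets.get) for query, targets in hits.items() if targets}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pair genes from genomes A and B that are each other's best hit in both directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = {"a1": {"b1": 300.0, "b2": 50.0}, "a2": {"b1": 120.0}}
b_vs_a = {"b1": {"a1": 290.0, "a2": 118.0}, "b2": {"a1": 45.0}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 is left unpaired because its best hit, b1, prefers a1. Real pipelines typically add score and coverage thresholds and handle ties; those details are omitted.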
Affiliation(s)
- James Doonan
  - School of Biological Sciences, Bangor University, Bangor, UK
- Sandra Denman
  - Forest Research, Centre for Forestry and Climate Change, Farnham, UK
- Justin A Pachebat
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
2. Barbosa E, Röttger R, Hauschild AC, de Castro Soares S, Böcker S, Azevedo V, Baumbach J. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis. J Integr Bioinform 2017; 14:/j/jib.2017.14.issue-2/jib-2017-0010/jib-2017-0010.xml. [PMID: 28678736 PMCID: PMC6042826 DOI: 10.1515/jib-2017-0010] [Received: 03/13/2017] [Revised: 04/10/2017] [Accepted: 04/19/2017]
Abstract
Different bacteria are able to cope with highly diverse lifestyles; for instance, they can be free-living or host-associated. These organisms must therefore possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features that might influence bacterial adaptation to a specific niche, we introduce LifeStyle-Specific-Islands (LiSSI). LiSSI combines evolutionary sequence analysis with statistical learning (Random Forest with feature selection, model tuning and robustness analysis). In summary, our strategy aims to identify conserved consecutive homology sequences (islands) in genomes and to determine the most discriminant islands for each lifestyle.
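The island idea can be pictured as scanning a genome's gene order for uninterrupted runs of genes whose homologs are conserved across the genomes being compared. The sketch below assumes the homology search has already produced the set of conserved gene identifiers; the function name, the `min_len` threshold and the gene IDs are hypothetical, and neither LiSSI's exact island definition nor its Random Forest step is reproduced here.

```python
from typing import Iterable, List, Set


def find_islands(gene_order: Iterable[str], conserved: Set[str], min_len: int = 3) -> List[List[str]]:
    """Return maximal runs of consecutive conserved genes that are at least min_len long."""
    islands: List[List[str]] = []
    run: List[str] = []
    for gene in gene_order:
        if gene in conserved:
            run.append(gene)  # extend the current run of conserved genes
        else:
            if len(run) >= min_len:
                islands.append(run)  # close a run long enough to count as an island
            run = []
    if len(run) >= min_len:  # a run may also end at the last gene
        islands.append(run)
    return islands


genes = [f"g{i}" for i in range(1, 9)]
print(find_islands(genes, {"g1", "g2", "g3", "g5", "g6"}, min_len=2))
# [['g1', 'g2', 'g3'], ['g5', 'g6']]
```

Each island could then be encoded as a presence/absence feature per genome, which is the kind of input a classifier such as a Random Forest can rank by how well it discriminates between lifestyles.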
Affiliation(s)
- Eudes Barbosa
  - University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark
  - Federal University of Minas Gerais, Institute of Biological Sciences, Belo Horizonte, Brazil
- Richard Röttger
  - University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark
- Anne-Christin Hauschild
  - University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark
- Siomar de Castro Soares
  - Federal University of Minas Gerais, Institute of Biological Sciences, Belo Horizonte, Brazil
  - Federal University of Triângulo Mineiro, Department of Immunology, Microbiology and Parasitology, Uberaba, Brazil
- Sebastian Böcker
  - Friedrich-Schiller-Universität Jena, Faculty of Mathematics and Computer Science, Jena, Germany
- Vasco Azevedo
  - Federal University of Minas Gerais, Institute of Biological Sciences, Belo Horizonte, Brazil
- Jan Baumbach
  - University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark
3. Neumann U, Genze N, Heider D. EFS: an ensemble feature selection tool implemented as R-package and web-application. BioData Min 2017; 10:21. [PMID: 28674556 PMCID: PMC5488355 DOI: 10.1186/s13040-017-0142-8] [Received: 11/23/2016] [Accepted: 06/12/2017]
Abstract
BACKGROUND Feature selection methods aim to identify a subset of features that improves the prediction performance of subsequent classification models and thereby also simplifies their interpretation. Previous studies have demonstrated that single feature selection methods can have specific biases, whereas ensemble feature selection can alleviate and compensate for these biases. RESULTS The software EFS (Ensemble Feature Selection) uses multiple feature selection methods and combines their normalized outputs into a quantitative ensemble importance. Eight different feature selection methods are currently integrated in EFS; they can be used separately or combined in an ensemble. CONCLUSION EFS identifies relevant features while compensating for the specific biases of single methods through its ensemble approach. EFS can thereby improve prediction accuracy and interpretability in subsequent binary classification models. AVAILABILITY EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
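The central idea of combining normalized outputs into one ensemble importance can be sketched in a few lines. This is a simplified illustration, not the EFS package's code: it assumes every method reports a score where larger means more important, min-max normalizes each method's scores to [0, 1], and averages across methods so that the ensemble score also lies in [0, 1]. The function name, method names and scores are hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]  # feature name -> raw importance (larger = more important)


def ensemble_importance(scores_by_method: Dict[str, Scores]) -> Scores:
    """Min-max normalize each method's scores, then average them per feature."""
    n_methods = len(scores_by_method)
    features = sorted({f for scores in scores_by_method.values() for f in scores})
    ensemble = {f: 0.0 for f in features}
    for scores in scores_by_method.values():
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for f in features:
            raw = scores.get(f, lo)  # a feature a method did not score counts as its minimum
            normalized = (raw - lo) / span if span > 0 else 0.0
            ensemble[f] += normalized / n_methods
    return ensemble


example = {
    "method_1": {"a": 10.0, "b": 0.0, "c": 5.0},
    "method_2": {"a": 0.2, "b": 0.0, "c": 0.4},
}
print(ensemble_importance(example))  # {'a': 0.75, 'b': 0.0, 'c': 0.75}
```

Methods that output p-values or ranks, where smaller means more important, would have to be inverted before this step, and EFS's own normalization may differ in detail. The point of the averaging is that a feature ranked highly by only one biased method cannot dominate the ensemble.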
Affiliation(s)
- Ursula Neumann
  - Straubing Center of Science, Schulgasse 22, Straubing, 94315 Germany
  - University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany
- Nikita Genze
  - Straubing Center of Science, Schulgasse 22, Straubing, 94315 Germany
- Dominik Heider
  - Straubing Center of Science, Schulgasse 22, Straubing, 94315 Germany
  - University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany
4. PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data. Sci Rep 2017; 7:39194. [PMID: 28051068 PMCID: PMC5209729 DOI: 10.1038/srep39194] [Received: 07/08/2016] [Accepted: 11/18/2016]
Abstract
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present PaPrBaG (Pathogenicity Prediction for Bacterial Genomes), a machine learning-based approach. PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotypes. To that end, we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with a human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction, whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverage. Combining PaPrBaG with existing approaches further improves prediction results.
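Alignment-free methods sidestep the reference-database problem by turning each read into a fixed-length numeric vector that a trained classifier can score even when no similar genome is known. The abstract does not list PaPrBaG's features; k-mer frequencies, sketched below, are one common alignment-free choice and are used here only as an assumption. The function name and the default k are hypothetical.

```python
from itertools import product
from typing import List


def kmer_frequencies(read: str, k: int = 2) -> List[float]:
    """Relative frequency of each DNA k-mer (in lexicographic ACGT order) within one read."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = read.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    # Reads with no valid k-mer get an all-zero vector instead of a division by zero
    return [counts[m] / total if total else 0.0 for m in kmers]


vector = kmer_frequencies("ACGT", k=2)
print(len(vector))  # 16
```

Every read, however divergent from known genomes, yields a vector of the same length (4^k entries), so a model trained on such vectors can always return a prediction. This mirrors the advantage the abstract describes, although PaPrBaG's actual features may differ.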
5. Neumann U, Riemenschneider M, Sowa JP, Baars T, Kälsch J, Canbay A, Heider D. Compensation of feature selection biases accompanied with improved predictive performance for binary classification by using a novel ensemble feature selection approach. BioData Min 2016; 9:36. [PMID: 27891179 PMCID: PMC5116216 DOI: 10.1186/s13040-016-0114-4] [Received: 06/23/2016] [Accepted: 10/27/2016]
Abstract
MOTIVATION Biomarker discovery methods are essential for identifying a minimal subset of features (e.g., serum markers in predictive medicine) that are relevant for developing highly accurate prediction models. Diverse feature selection methods now exist, which are either embedded in, combined with, or independent of predictive learning algorithms. Many previous studies have shown the shortcomings of single feature selection results, which make it difficult for professionals in a variety of fields (e.g., medical practitioners) to analyze and interpret the obtained feature subsets. Whereas each of these methods is highly biased, ensemble feature selection can alleviate and compensate for such biases. With regard to the reliability, validity, and reproducibility of these methods, we examined eight different feature selection methods for binary classification datasets and developed an ensemble feature selection system. RESULTS Using an ensemble of feature selection methods, we obtained a quantification of feature importance. Prediction models trained on the selected features showed improved prediction performance.
Affiliation(s)
- Ursula Neumann
  - Department of Bioinformatics, Straubing, 94315 Germany
  - University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany
- Mona Riemenschneider
  - Department of Bioinformatics, Straubing, 94315 Germany
  - University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany
- Jan-Peter Sowa
  - Department of Gastroenterology and Hepatology, University Hospital, University Duisburg-Essen, Essen, 45122 Germany
- Theodor Baars
  - Clinic for Cardiology, West German Heart and Vascular Centre Essen, University Hospital, University Duisburg-Essen, Essen, 45122 Germany
- Julia Kälsch
  - Department of Gastroenterology and Hepatology, University Hospital, University Duisburg-Essen, Essen, 45122 Germany
- Ali Canbay
  - Department of Gastroenterology and Hepatology, University Hospital, University Duisburg-Essen, Essen, 45122 Germany
- Dominik Heider
  - Department of Bioinformatics, Straubing, 94315 Germany
  - University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany
6. Soares SC, Geyik H, Ramos RT, de Sá PH, Barbosa EG, Baumbach J, Figueiredo HC, Miyoshi A, Tauch A, Silva A, Azevedo V. GIPSy: Genomic island prediction software. J Biotechnol 2016; 232:2-11. [DOI: 10.1016/j.jbiotec.2015.09.008] [Received: 05/06/2015] [Revised: 08/28/2015] [Accepted: 09/11/2015]
7. Martínez-García PM, López-Solanilla E, Ramos C, Rodríguez-Palenzuela P. Prediction of bacterial associations with plants using a supervised machine-learning approach. Environ Microbiol 2016; 18:4847-4861. [PMID: 27234490 DOI: 10.1111/1462-2920.13389] [Received: 11/26/2015] [Revised: 05/20/2016] [Accepted: 05/20/2016]
Abstract
Recent cases of fresh produce contamination by human enteric pathogens have resulted in severe food-borne outbreaks, and a new paradigm has emerged stating that some human-associated bacteria can use plants as secondary hosts. As a consequence, there is growing concern in the scientific community about these interactions, which have not yet been elucidated. Because this is a relatively new area, strategies are lacking to address food-borne illnesses caused by the ingestion of fruits and vegetables. In the present study, we performed specific genome annotations to train a supervised machine-learning model that identifies plant-associated bacteria with a precision of ∼93%. Applying our method to approximately 9500 genomes predicted several previously unknown interactions between well-known human pathogens and plants and also confirmed several cases for which evidence has been reported. Factors involved in adhesion, deconstruction of the plant cell wall and detoxifying activities emerged as the most predictive features. Applied to sequenced strains involved in food poisoning, our strategy can serve as a primary screening tool to determine the possible causes of contamination.
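The ∼93% figure is a precision: the fraction of bacteria predicted to be plant-associated that truly are, precision = TP / (TP + FP). A minimal worked sketch with hypothetical labels (1 = plant-associated, 0 = not); the function name and the counts are illustrative only.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP) for the positive class (label 1); 0.0 if nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical: 100 genomes predicted positive, 93 of them truly plant-associated
y_pred = [1] * 100
y_true = [1] * 93 + [0] * 7
print(precision(y_true, y_pred))  # 0.93
```

Precision says nothing about plant-associated bacteria the model misses (false negatives); those are captured by recall.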
Affiliation(s)
- Pedro Manuel Martínez-García
  - Área de Genética, Facultad de Ciencias, Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga, E-29071, Spain
  - Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223, Spain
- Emilia López-Solanilla
  - Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223, Spain
  - Departamento de Biología Vegetal. Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Avenida Complutense, 3, Madrid, 28040, Spain
- Cayo Ramos
  - Área de Genética, Facultad de Ciencias, Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga, E-29071, Spain
- Pablo Rodríguez-Palenzuela
  - Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223, Spain
  - Departamento de Biología Vegetal. Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Avenida Complutense, 3, Madrid, 28040, Spain
8. Genotypic Prediction of Co-receptor Tropism of HIV-1 Subtypes A and C. Sci Rep 2016; 6:24883. [PMID: 27126912 PMCID: PMC4850382 DOI: 10.1038/srep24883] [Received: 01/12/2016] [Accepted: 04/07/2016]
Abstract
Antiretroviral treatment of Human Immunodeficiency Virus type-1 (HIV-1) infections with CCR5 antagonists requires prediction of the co-receptor usage of the viral strains. Currently available tools were mostly developed on subtype B strains and are therefore generally not applicable to non-B subtypes. However, subtype B accounts for only approximately 11% of HIV-1 infections worldwide. We evaluated the performance of several sequence-based algorithms for co-receptor usage prediction on subtype A V3 sequences, including circulating recombinant forms (CRFs), and on subtype C strains. We further analysed sequence profiles of gp120 regions of subtypes A, B and C to explore functional relationships to entry phenotypes. Our analyses clearly demonstrate that state-of-the-art algorithms are not useful for predicting co-receptor tropism of subtype A and its CRFs. Sequence profile analysis of gp120 revealed molecular variability in subtype A viruses. In particular, the V2 loop region could be associated with co-receptor tropism, which might indicate a unique pattern that determines co-receptor tropism in subtype A strains compared with subtype B and C strains. Thus, our study demonstrates the need for novel algorithms that facilitate tropism prediction for HIV-1 subtype A, in order to improve the effectiveness of antiretroviral treatment.
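For context, one of the oldest genotypic tropism rules, derived largely from subtype B data, is the 11/25 rule: a positively charged residue (arginine R or lysine K) at position 11 and/or 25 of the V3 loop predicts CXCR4 (X4) usage, and otherwise CCR5 (R5) usage is predicted. It is sketched below only to illustrate the kind of subtype B-derived algorithm whose transferability the study questions; whether this rule was among the tools evaluated is not stated in the abstract. The sketch assumes an ungapped V3 sequence numbered from its first cysteine, and the function name and example sequence are illustrative.

```python
def v3_11_25_rule(v3: str) -> str:
    """Classic 11/25 rule: R or K at V3 position 11 and/or 25 predicts X4, else R5.

    Positions are 1-based and assume an ungapped V3 loop starting at its first cysteine.
    """
    if len(v3) < 25:
        raise ValueError("V3 sequence must cover at least position 25")
    positive = {"R", "K"}
    seq = v3.upper()
    # Python indices 10 and 24 correspond to V3 positions 11 and 25
    return "X4" if seq[10] in positive or seq[24] in positive else "R5"


v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # illustrative subtype B-like V3 loop
print(v3_11_25_rule(v3))  # R5 (position 11 is S, position 25 is E)
mutant = v3[:10] + "R" + v3[11:]  # hypothetical S11R substitution
print(v3_11_25_rule(mutant))  # X4
```

A rule this simple encodes subtype B sequence conventions directly in its fixed positions, which is one intuitive reason such tools may transfer poorly to subtypes with different V3 (or, as the study suggests, V2) determinants.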
9. Riemenschneider M, Senge R, Neumann U, Hüllermeier E, Heider D. Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification. BioData Min 2016; 9:10. [PMID: 26933450 PMCID: PMC4772363 DOI: 10.1186/s13040-016-0089-1] [Received: 11/30/2015] [Accepted: 02/20/2016]
Abstract
BACKGROUND Antiretroviral therapy is essential for patients infected with human immunodeficiency virus (HIV) to inhibit viral replication, thereby slowing disease progression and prolonging life. However, the high mutation rate of HIV can lead to rapid adaptation of the virus under drug pressure and thus to the evolution of resistant variants, which in turn cause antiretroviral treatment to fail. Moreover, these mutations can lead not only to resistance against single drugs but also to cross-resistance, i.e., resistance against drugs that have not yet been applied. METHODS A total of 662 protease sequences and 715 reverse transcriptase sequences with complete resistance profiles were analyzed using machine learning techniques, namely binary relevance classifiers, classifier chains, and ensembles of classifier chains. RESULTS We applied multi-label classification models incorporating cross-resistance information to predict drug resistance for two of the major drug classes used in antiretroviral therapy for HIV-1, namely protease inhibitors (PIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs). Using multi-label learning, namely classifier chains (CCs) and ensembles of classifier chains (ECCs), we improved overall prediction accuracy for all drugs compared with previously applied binary classification models. CONCLUSIONS Fast and precise models for predicting drug resistance in HIV-1 are highly important for enabling effective personalized therapy. Cross-resistance information can be exploited to improve the prediction accuracy of computational drug resistance models.
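A classifier chain trains one binary classifier per label (here, per drug) and gives each classifier the labels earlier in the chain as extra input features, which is how information such as cross-resistance can propagate along the chain. The sketch below uses a toy 1-nearest-neighbour base learner so that it stays dependency-free; the class names, the base learner and the data are hypothetical, and the study's actual base classifiers and label order are not reproduced.

```python
from typing import List, Sequence


class NearestNeighbour:
    """Toy 1-NN binary classifier (hypothetical stand-in for any base learner)."""

    def fit(self, X: List[List[float]], y: List[int]) -> "NearestNeighbour":
        self.X, self.y = X, y
        return self

    def predict_one(self, x: List[float]) -> int:
        # Squared Euclidean distance to every training example; return the closest label
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]


class ClassifierChain:
    """Label j is predicted from the input features plus labels 0..j-1."""

    def fit(self, X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]) -> "ClassifierChain":
        self.models: List[NearestNeighbour] = []
        for j in range(len(Y[0])):
            # Training uses the true values of the earlier labels as extra features
            Xj = [list(x) + list(labels[:j]) for x, labels in zip(X, Y)]
            yj = [labels[j] for labels in Y]
            self.models.append(NearestNeighbour().fit(Xj, yj))
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[List[int]]:
        results = []
        for x in X:
            preds: List[int] = []
            for model in self.models:
                # Prediction feeds the chain's own earlier predictions forward
                preds.append(model.predict_one(list(x) + preds))
            results.append(preds)
        return results


# Hypothetical data: one numeric feature, two drug-resistance labels per sequence
X = [[0.0], [1.0], [5.0], [6.0]]
Y = [[0, 0], [0, 0], [1, 1], [1, 0]]
chain = ClassifierChain().fit(X, Y)
print(chain.predict([[0.2], [5.1]]))  # [[0, 0], [1, 1]]
```

An ensemble of classifier chains (ECC) would train several such chains, typically with different label orders, and combine their votes so that no single ordering dominates the result.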
Affiliation(s)
- Mona Riemenschneider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315 Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354 Germany
- Robin Senge
  - Department of Computer Science, University of Paderborn, Pohlweg 47, Paderborn, 33098 Germany
- Ursula Neumann
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, Freising, 85354 Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354 Germany
- Eyke Hüllermeier
  - Department of Computer Science, University of Paderborn, Pohlweg 47, Paderborn, 33098 Germany
- Dominik Heider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, Freising, 85354 Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354 Germany
10.
Abstract
Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, whereas genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are unlikely to be located in GIs, but essential genes in GIs have not previously been analyzed systematically. Here, we statistically analyzed the essential genes in 28 prokaryotes and found significantly fewer essential genes inside GIs than outside them. The functions of the 362 essential genes found in GIs were explored further by BLAST searches against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of the PHAge Search Tool (PHAST). As a result, 64 and 60 qualifying essential genes were found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. We also identified several toxin-related proteins and repressors encoded by these essential genes in GIs. This comparative analysis of essential genes in genomic islands should not only shed new light on the development of algorithms for predicting essential genes but also provide clues to the functions of essential genes in genomic islands.
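A claim such as "significantly fewer essential genes inside GIs than outside" can be checked with a one-sided hypergeometric (Fisher's exact) test: given N genes in a genome, K of them essential, and n located in GIs, how likely is it to see k or fewer essential genes in GIs by chance alone? The abstract only says a statistical method was used, so this choice of test, the function name and the counts below are illustrative assumptions.

```python
from math import comb


def fewer_in_islands_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X <= k) for X ~ Hypergeometric(N genes, K essential, n genes in GIs)."""
    # Each term counts the ways to place i essential and n - i non-essential genes in GIs
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return favourable / comb(N, n)


# Hypothetical genome: 10 genes, 5 essential, 5 in GIs, none of the GI genes essential
print(fewer_in_islands_p(10, 5, 5, 0))  # about 0.00397 (exactly 1/252)
```

A small p-value means that so few essential genes in GIs would be unlikely if essential genes were scattered at random across the genome. In practice the test would be run on real per-genome counts of essential and GI-located genes.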
Affiliation(s)
- Xi Zhang
  - Department of Physics, Tianjin University, Tianjin 300072, China
- Chong Peng
  - Department of Physics, Tianjin University, Tianjin 300072, China
- Ge Zhang
  - Department of Physics, Tianjin University, Tianjin 300072, China
- Feng Gao
  - Department of Physics, Tianjin University, Tianjin 300072, China
  - Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
  - SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China