1
|
Galperin MY, Koonin EV. Comparative Genomics Approaches to Identifying Functionally Related Genes. Algorithms for Computational Biology 2014. [DOI: 10.1007/978-3-319-07953-0_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
|
2
|
Affiliation(s)
- George P Tegos
- Center for Molecular Discovery; University of New Mexico; Albuquerque, NM USA; Department of Pathology; University of New Mexico; Albuquerque, NM USA; Wellman Center for Photomedicine; Massachusetts General Hospital; Boston, MA USA; Department of Dermatology; Harvard Medical School; Boston, MA USA
|
3
|
Valdivia-Granda WA. Biodefense Oriented Genomic-Based Pathogen Classification Systems: Challenges and Opportunities. J Bioterror Biodef 2012; 3:1000113. [PMID: 25587492 PMCID: PMC4289626 DOI: 10.4172/2157-2526.1000113] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
Countermeasures that will effectively prevent or diminish the impact of a biological attack will depend on the rapid and accurate generation and analysis of genomic information. Because of their increasing level of sensitivity, rapidly decreasing cost, and their ability to effectively interrogate the genomes of previously unknown organisms, Next Generation Sequencing (NGS) technologies are revolutionizing the biological sciences. However, the exponential accumulation of microbial data is outpacing the computational performance of existing analytical tools in their ability to translate DNA information into reliable detection, prophylactic and therapeutic countermeasures. It is now evident that the bottleneck for next-generation sequence data analysis will not be solved simply by scaling up our computational resources, but rather by implementing novel biodefense-oriented algorithms that overcome existing vulnerabilities of speed, sensitivity and accuracy. Considering these circumstances, this document highlights the challenges and opportunities that biodefense stakeholders must consider in order to exploit genomic information more efficiently and translate this data into integrated countermeasures. The document overviews different genome analysis methods and explains concepts of DNA fingerprints, motif fingerprints, genomic barcodes and genomic signatures. A series of recommendations to promote genomics and bioinformatics as an effective form of deterrence and a valuable scientific platform for rapid technological insertion of detection, prophylactic and therapeutic countermeasures is discussed.
|
4
|
Kumar K, Desai V, Cheng L, Khitrov M, Grover D, Satya RV, Yu C, Zavaljevski N, Reifman J. AGeS: a software system for microbial genome sequence annotation. PLoS One 2011; 6:e17469. [PMID: 21408217 PMCID: PMC3049762 DOI: 10.1371/journal.pone.0017469] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 11/23/2010] [Accepted: 02/01/2011] [Indexed: 11/19/2022]
Abstract
BACKGROUND The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.
Affiliation(s)
- Kamal Kumar, Valmik Desai, Li Cheng, Maxim Khitrov, Deepak Grover, Ravi Vijaya Satya, Chenggang Yu, Nela Zavaljevski, Jaques Reifman
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
|
5
|
Sintchenko V. Informatics for Infectious Disease Research and Control. Infectious Disease Informatics 2010. [PMCID: PMC7120928 DOI: 10.1007/978-1-4419-1327-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The goal of infectious disease informatics is to optimize the clinical and public health management of infectious diseases through improvements in the development and use of antimicrobials, the design of more effective vaccines, the identification of biomarkers for life-threatening infections, a better understanding of host-pathogen interactions, and biosurveillance and clinical decision support. Infectious disease informatics can lead to more targeted and effective approaches for the prevention, diagnosis and treatment of infections through a comprehensive review of the genetic repertoire and metabolic profiles of a pathogen. The developments in informatics have been critical in boosting the translational science and in supporting both reductionist and integrative research paradigms.
|
6
|
Rissman AI, Mau B, Biehl BS, Darling AE, Glasner JD, Perna NT. Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 2009; 25:2071-3. [PMID: 19515959 PMCID: PMC2723005 DOI: 10.1093/bioinformatics/btp356] [Citation(s) in RCA: 423] [Impact Index Per Article: 26.4] [Indexed: 11/18/2022]
Abstract
Summary: Mauve Contig Mover provides a new method for proposing the relative order of contigs that make up a draft genome based on comparison to a complete or draft reference genome. A novel application of the Mauve aligner and viewer provides an automated reordering algorithm coupled with a powerful drill-down display allowing detailed exploration of results. Availability: The software is available for download at http://gel.ahabs.wisc.edu/mauve. Contact: rissman@wisc.edu. Supplementary information: Supplementary data are available at Bioinformatics online and http://gel.ahabs.wisc.edu
Affiliation(s)
- Anna I Rissman
- Genome Evolution Laboratory, University of Wisconsin-Madison, 425G Henry Mall Suite 4400V, Madison, WI 53706, USA.
|
7
|
Zaremba S, Ramos-Santacruz M, Hampton T, Shetty P, Fedorko J, Whitmore J, Greene JM, Perna NT, Glasner JD, Plunkett G, Shaker M, Pot D. Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens. BMC Bioinformatics 2009; 10:177. [PMID: 19515247 PMCID: PMC2704210 DOI: 10.1186/1471-2105-10-177] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/02/2008] [Accepted: 06/10/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) has a goal of providing bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the scientific literature is critical to support research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can be a significant aid to this process. DESCRIPTION We have trained a powerful, state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include: Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an F-measure average of greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text mining application. CONCLUSION Our Text Mining application is available online on the ERIC website (http://www.ericbrc.org/portal/eric/articles). The information retrieval interface displays a list of recently published enteropathogen literature abstracts, and also provides a search interface to execute custom queries by keyword, date range, etc. Upon selection, processed abstracts and the entities and relations extracted from them are retrieved from a relational database and marked up to highlight the entities and relations. The abstract also provides links from extracted genes and gene products to the ERIC Annotations database, thus providing access to comprehensive genomic annotations and adding value to both the text-mining and annotations systems.
Affiliation(s)
- Sam Zaremba, Panna Shetty, Joel Fedorko, Jon Whitmore, John M Greene, Matthew Shaker, David Pot
- ERIC-BRC, SRA International Inc, Global Health Sector, Rockville MD, 20852, USA
- Nicole T Perna
- Genome Center, University of Wisconsin, Madison WI, 53706, USA
- Laboratory of Genetics, University of Wisconsin, Madison WI, 53706, USA
- Guy Plunkett
- Laboratory of Genetics, University of Wisconsin, Madison WI, 53706, USA
|
8
|
Lindeberg M, Biehl BS, Glasner JD, Perna NT, Collmer A, Collmer CW. Gene Ontology annotation highlights shared and divergent pathogenic strategies of type III effector proteins deployed by the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic Escherichia coli strains. BMC Microbiol 2009; 9 Suppl 1:S4. [PMID: 19278552 PMCID: PMC2654664 DOI: 10.1186/1471-2180-9-s1-s4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023]
Abstract
Genome-informed identification and characterization of Type III effector repertoires in various bacterial strains and species is revealing important insights into the critical roles that these proteins play in the pathogenic strategies of diverse bacteria. However, non-systematic discipline-specific approaches to their annotation impede analysis of the accumulating wealth of data and inhibit easy communication of findings among researchers working on different experimental systems. The development of Gene Ontology (GO) terms to capture biological processes occurring during the interaction between organisms creates a common language that facilitates cross-genome analyses. The application of these terms to annotate type III effector genes in different bacterial species – the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic strains of Escherichia coli – illustrates how GO can effectively describe fundamental similarities and differences among different gene products deployed as part of diverse pathogenic strategies. The GO annotations for the P. syringae pv tomato DC3000 effector AvrPtoB and the E. coli effector Tir are described in depth, with special emphasis given to the capability of GO for capturing information about interacting proteins and taxa. GO-highlighted similarities in biological process and molecular function for effectors from additional pathosystems are also discussed.
Affiliation(s)
- Magdalen Lindeberg
- Department of Plant Pathology, Cornell University, Ithaca, NY 14850, USA.
|
9
|
Genetics and environmental regulation of Shigella iron transport systems. Biometals 2009; 22:43-51. [PMID: 19130265 DOI: 10.1007/s10534-008-9188-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 10/20/2008] [Accepted: 12/07/2008] [Indexed: 10/21/2022]
Abstract
Shigella spp. have transport systems for both ferric and ferrous iron. The iron can be taken up as free iron or complexed to a variety of carriers. All Shigella species have both the Feo and Sit systems for acquisition of ferrous iron, and all have at least one siderophore-mediated system for transport of ferric iron. Several of the transport systems, including Sit, Iuc/IutA (aerobactin synthesis and transport), Fec (ferric di-citrate uptake), and Shu (heme transport) are encoded within pathogenicity islands. The presence and the genomic locations of these islands vary considerably among the Shigella species, and even between isolates of the same species. The expression of the iron transport systems is influenced by the concentration of iron and by environmental conditions including the level of oxygen. ArcA and FNR regulate iron transport gene expression as a function of oxygen tension, with the sit and iuc promoters being highly expressed in aerobic conditions, while the feo ferrous iron transporter promoter is most active under anaerobic conditions. The effects of oxygen are also seen in infection of cultured cells by Shigella flexneri; the Sit and Iuc systems support plaque formation under aerobic conditions, whereas Feo allows plaque formation anaerobically.
|
10
|
Winsor GL, Van Rossum T, Lo R, Khaira B, Whiteside MD, Hancock REW, Brinkman FSL. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes. Nucleic Acids Res 2008; 37:D483-8. [PMID: 18978025 PMCID: PMC2686508 DOI: 10.1093/nar/gkn861] [Citation(s) in RCA: 193] [Impact Index Per Article: 11.4] [Indexed: 11/13/2022]
Abstract
Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life-threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high-quality comparative genomics. The database contains a robust assessment of orthologs and a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (GBrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high-quality, annotated genome resource for the research community and is available under an open source license.
Affiliation(s)
- Geoffrey L Winsor
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
|
11
|
Winsor GL, Khaira B, Van Rossum T, Lo R, Whiteside MD, Brinkman FSL. The Burkholderia Genome Database: facilitating flexible queries and comparative analyses. Bioinformatics 2008; 24:2803-4. [PMID: 18842600 PMCID: PMC2639269 DOI: 10.1093/bioinformatics/btn524] [Citation(s) in RCA: 199] [Impact Index Per Article: 11.7] [Indexed: 11/17/2022]
Abstract
Summary: As the genome sequences of multiple strains of a given bacterial species are obtained, more generalized bacterial genome databases may be complemented by databases that are focused on providing more information geared for a distinct bacterial phylogenetic group and its associated research community. The Burkholderia Genome Database represents a model for such a database, providing a powerful, user-friendly search and comparative analysis interface that contains features not found in other genome databases. It contains continually updated, curated and tracked information about Burkholderia cepacia complex genome annotations, plus other Burkholderia species genomes for comparison, providing a high-quality resource for its targeted cystic fibrosis research community. Availability: http://www.burkholderia.com. Source code: GNU GPL. Contact: brinkman@sfu.ca.
Affiliation(s)
- Geoffrey L Winsor
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
|
12
|
Angiuoli SV, Gussman A, Klimke W, Cochrane G, Field D, Garrity G, Kodira CD, Kyrpides N, Madupu R, Markowitz V, Tatusova T, Thomson N, White O. Toward an online repository of Standard Operating Procedures (SOPs) for (meta)genomic annotation. OMICS: A Journal of Integrative Biology 2008; 12:137-41. [PMID: 18416670 PMCID: PMC3196215 DOI: 10.1089/omi.2008.0017] [Citation(s) in RCA: 548] [Impact Index Per Article: 32.2] [Indexed: 11/13/2022]
Abstract
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.
Affiliation(s)
- Samuel V Angiuoli
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA.
|
13
|
Cardinale CJ, Washburn RS, Tadigotla VR, Brown LM, Gottesman ME, Nudler E. Termination factor Rho and its cofactors NusA and NusG silence foreign DNA in E. coli. Science 2008; 320:935-8. [PMID: 18487194 DOI: 10.1126/science.1152763] [Citation(s) in RCA: 239] [Impact Index Per Article: 14.1] [Indexed: 12/23/2022]
Abstract
Transcription of the bacterial genome by the RNA polymerase must terminate at specific points. Transcription can be terminated by Rho factor, an essential protein in enterobacteria. We used the antibiotic bicyclomycin, which inhibits Rho, to assess its role on a genome-wide scale. Rho is revealed as a global regulator of gene expression that matches Escherichia coli transcription to translational needs. We also found that genes in E. coli that are most repressed by Rho are prophages and other horizontally acquired portions of the genome. Elimination of these foreign DNA elements increases resistance to bicyclomycin. Although rho remains essential, such reduced-genome bacteria no longer require Rho cofactors NusA and NusG. Deletion of the cryptic rac prophage in wild-type E. coli increases bicyclomycin resistance and permits deletion of nusG. Thus, Rho termination, supported by NusA and NusG, is required to suppress the toxic activity of foreign genes.
|