1
Petrova B, Kanarek N. Potential Benefits and Pitfalls of Histidine Supplementation for Cancer Therapy Enhancement. J Nutr 2020; 150:2580S-2587S. [PMID: 33000153 DOI: 10.1093/jn/nxaa132] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/09/2019] [Revised: 02/27/2020] [Accepted: 04/15/2020] [Indexed: 12/31/2022] Open
Abstract
Dietary supplementation of the amino acid histidine has demonstrable benefits in various clinical conditions. Recent work in a pediatric leukemia mouse model exposed a surprising potential application of histidine supplementation for cancer therapy enhancement. These findings demand a deeper reassessment of the physiological effects and potential drawbacks of histidine supplementation. As pertinent to this question, we discuss the safety of high doses of histidine and its relevant metabolic fates in the human body. We refrain from recommendations or final conclusions because comprehensive preclinical evidence for safety and efficacy of histidine supplementation is still lacking. However, we emphasize the incentive to study the safety of histidine supplementation and its potential to improve the clinical outcome of pediatric blood cancers through a simple dietary supplementation. The need for comprehensive preclinical testing of histidine supplementation in healthy and tumor-bearing mice is fundamental, and we hope that this review will facilitate such studies.
Affiliation(s)
- Boryana Petrova
- Department of Pathology, Boston Children's Hospital, Boston, MA, USA
- Naama Kanarek
- Department of Pathology, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA
2
Histidine catabolism is a major determinant of methotrexate sensitivity. Nature 2018; 559:632-636. [PMID: 29995852 PMCID: PMC6082631 DOI: 10.1038/s41586-018-0316-7] [Citation(s) in RCA: 207] [Impact Index Per Article: 34.5] [Received: 08/27/2017] [Accepted: 06/06/2018] [Indexed: 02/02/2023]
Abstract
The chemotherapeutic drug methotrexate inhibits the enzyme DHFR (dihydrofolate reductase)¹, which generates tetrahydrofolate (THF), an essential cofactor in nucleotide synthesis². Depletion of THF causes cell death by suppressing DNA and RNA production³. While methotrexate is widely used as an anti-cancer agent and the subject of over a thousand ongoing clinical trials⁴, its high toxicity often leads to the premature termination of its use, diminishing its potential efficacy⁵. To identify genes that modulate the response of cancer cells to methotrexate, we performed a CRISPR/Cas9-based screen⁶,⁷. This screen yielded FTCD, which encodes an enzyme (formimidoyltransferase cyclodeaminase) needed for the catabolism of the amino acid histidine⁸, a process not previously linked to methotrexate sensitivity. In cultured cancer cells, depletion of multiple genes in the histidine catabolism pathway dramatically decreased sensitivity to methotrexate. Mechanistically, histidine catabolism drains the cellular pool of THF, which is particularly detrimental to methotrexate-treated cells. Moreover, expression of the rate-limiting enzyme in histidine catabolism is associated with methotrexate sensitivity in cancer cell lines and with survival rate in patients. In vivo dietary supplementation of histidine increased flux through the histidine degradation pathway and enhanced the sensitivity of leukemia xenografts to methotrexate. Thus, the histidine degradation pathway significantly influences the sensitivity of cancer cells to methotrexate and may be exploited to improve methotrexate efficacy through a simple dietary intervention.
3
Zhang C, Freddolino PL, Zhang Y. COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information. Nucleic Acids Res 2017; 45:W291-W299. [PMID: 28472402 PMCID: PMC5793808 DOI: 10.1093/nar/gkx366] [Citation(s) in RCA: 381] [Impact Index Per Article: 54.4] [Received: 02/02/2017] [Revised: 04/09/2017] [Accepted: 04/21/2017] [Indexed: 12/22/2022] Open
Abstract
The COFACTOR web server is a unified platform for structure-based multiple-level protein function predictions. By structurally threading low-resolution structural models through the BioLiP library, the COFACTOR server infers three categories of protein functions including gene ontology, enzyme commission and ligand-binding sites from various analogous and homologous function templates. Here, we report recent improvements of the COFACTOR server in the development of new pipelines to infer functional insights from sequence profile alignments and protein-protein interaction networks. Large-scale benchmark tests show that the new hybrid COFACTOR approach significantly improves the function annotation accuracy of the former structure-based pipeline and other state-of-the-art functional annotation methods, particularly for targets that have no close homology templates. The updated COFACTOR server and the template libraries are available at http://zhanglab.ccmb.med.umich.edu/COFACTOR/.
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Peter L. Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
4
Feng Q, Gui Y, Yang Z, Wang L, Li Y. Semisupervised Learning Based Disease-Symptom and Symptom-Therapeutic Substance Relation Extraction from Biomedical Literature. BIOMED RESEARCH INTERNATIONAL 2016; 2016:3594937. [PMID: 27822473 PMCID: PMC5086401 DOI: 10.1155/2016/3594937] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/24/2016] [Revised: 07/13/2016] [Accepted: 08/18/2016] [Indexed: 11/18/2022]
Abstract
With the rapid growth of biomedical literature, a large amount of knowledge about diseases, symptoms, and therapeutic substances hidden in the literature can be used for drug discovery and disease therapy. In this paper, we present a method of constructing two models for extracting the relations between the disease and symptom and symptom and therapeutic substance from biomedical texts, respectively. The former judges whether a disease causes a certain physiological phenomenon while the latter determines whether a substance relieves or eliminates a certain physiological phenomenon. These two kinds of relations can be further utilized to extract the relations between disease and therapeutic substance. In our method, first two training sets for extracting the relations between the disease-symptom and symptom-therapeutic substance are manually annotated and then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, are applied to utilize the unlabeled data to boost the relation extraction performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance effectively.
Affiliation(s)
- Qinlin Feng
- College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yingyi Gui
- School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China
- Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Lei Wang
- Beijing Institute of Health Administration and Medical Information, Beijing 100850, China
- Yuxia Li
- Beijing Institute of Health Administration and Medical Information, Beijing 100850, China
5
Hua L, Quan C. A Shortest Dependency Path Based Convolutional Neural Network for Protein-Protein Relation Extraction. BIOMED RESEARCH INTERNATIONAL 2016; 2016:8479587. [PMID: 27493967 PMCID: PMC4963603 DOI: 10.1155/2016/8479587] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/04/2016] [Revised: 06/04/2016] [Accepted: 06/15/2016] [Indexed: 12/15/2022]
Abstract
The state-of-the-art methods for protein-protein interaction (PPI) extraction are primarily based on kernel methods, and their performance strongly depends on handcrafted features. In this paper, we tackle PPI extraction by using convolutional neural networks (CNN) and propose a shortest dependency path based CNN (sdpCNN) model. The proposed method (1) takes only the sdp and word embedding as input and (2) could avoid bias from feature selection by using CNN. We performed experiments on the standard AIMed and BioInfer datasets, and the experimental results demonstrated that our approach outperformed state-of-the-art kernel based methods. In particular, by tracking the sdpCNN model, we find that sdpCNN could extract key features automatically, and we verify that pretrained word embedding is crucial in the PPI task.
Affiliation(s)
- Lei Hua
- Department of Computer and Information Sciences, Hefei University of Technology, Hefei 230009, China
- Chanqin Quan
- Department of Computer and Information Sciences, Kobe University, Kobe 6578501, Japan
6
Supervised Learning Based Hypothesis Generation from Biomedical Literature. BIOMED RESEARCH INTERNATIONAL 2015; 2015:698527. [PMID: 26380291 PMCID: PMC4561867 DOI: 10.1155/2015/698527] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/16/2015] [Revised: 04/12/2015] [Accepted: 05/24/2015] [Indexed: 11/18/2022]
Abstract
Nowadays, the amount of biomedical literature is growing at an explosive speed, and much useful knowledge in this literature remains undiscovered. Researchers can form biomedical hypotheses by mining these works. In this paper, we propose a supervised learning based approach to generate hypotheses from biomedical literature. This approach splits the traditional hypothesis generation process of the classic ABC model into an AB model and a BC model, both constructed with a supervised learning method. Compared with concept co-occurrence and grammar engineering-based approaches such as SemRep, machine learning based models can usually achieve better performance in information extraction (IE) from texts. By combining the two models, the approach then reconstructs the ABC model and generates biomedical hypotheses from the literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms the SemRep system.
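To make the recombination step concrete, the following minimal sketch joins A-B and B-C relations on their shared B term to propose A-C hypotheses. It is an illustration under stated assumptions, not the authors' implementation: the function name `generate_hypotheses`, the tuple representation of relations, and the toy fish oil example (modeled on Swanson's well-known Raynaud's hypothesis) are all hypothetical, and in the paper the AB and BC relations would come from trained classifiers rather than hand-written lists.

```python
from collections import defaultdict


def generate_hypotheses(ab_relations, bc_relations, known_ac=frozenset()):
    """Join A-B and B-C relations on the shared B term to propose A-C links.

    ab_relations, bc_relations: iterables of (source, target) pairs, assumed
    here to be the positive outputs of separate AB and BC relation models.
    known_ac: (A, C) pairs already reported, which are not new hypotheses.
    Returns a dict mapping each candidate (A, C) pair to its linking B terms.
    """
    # Index the B-C relations by their B term for fast lookup.
    c_by_b = defaultdict(set)
    for b, c in bc_relations:
        c_by_b[b].add(c)

    hypotheses = defaultdict(set)
    for a, b in ab_relations:
        for c in c_by_b.get(b, ()):
            # Skip trivial self-links and pairs that are already known.
            if a != c and (a, c) not in known_ac:
                hypotheses[(a, c)].add(b)
    return dict(hypotheses)


# Toy relations for illustration only (hypothetical classifier outputs).
ab = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
bc = [("blood viscosity", "Raynaud's disease"),
      ("platelet aggregation", "Raynaud's disease")]
result = generate_hypotheses(ab, bc)
```

Keeping the set of linking B terms for each candidate pair makes it possible to rank hypotheses afterwards, for example by how many independent intermediates support them; that ranking is a design choice of this sketch and is not described in the abstract.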
7
A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:910423. [PMID: 26347797 PMCID: PMC4546954 DOI: 10.1155/2015/910423] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/05/2015] [Revised: 06/17/2015] [Accepted: 06/29/2015] [Indexed: 11/27/2022]
Abstract
Information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, biomedical data are harder to extract manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area of biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has changed to hybrid approaches, which show better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research lies in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. We conclude that our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.
8
Oden S, Brocchieri L. Quantitative frame analysis and the annotation of GC-rich (and other) prokaryotic genomes. An application to Anaeromyxobacter dehalogenans. Bioinformatics 2015; 31:3254-61. [PMID: 26048600 PMCID: PMC4595893 DOI: 10.1093/bioinformatics/btv339] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/09/2015] [Accepted: 05/28/2015] [Indexed: 11/14/2022] Open
Abstract
Motivation: Graphical representations of contrasts in GC usage among codon frame positions (frame analysis) provide evidence of genes missing from the annotations of prokaryotic genomes of high GC content but the qualitative approach of visual frame analysis prevents its applicability on a genomic scale. Results: We developed two quantitative methods for the identification and statistical characterization in sequence regions of three-base periodicity (hits) associated with open reading frame structures. The methods were implemented in the N-Profile Analysis Computational Tool (NPACT), which highlights in graphical representations inconsistencies between newly identified ORFs and pre-existing annotations of coding-regions. We applied the NPACT procedures to two recently annotated strains of the deltaproteobacterium Anaeromyxobacter dehalogenans, identifying in both genomes numerous conserved ORFs not included in the published annotation of coding regions. Availability and implementation: NPACT is available as a web-based service and for download at http://genome.ufl.edu/npact. Contact: lucianob@ufl.edu Supplementary information: Supplementary data are available at Bioinformatics online.
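The signal that frame analysis relies on can be sketched in a few lines: GC content is computed separately for each of the three codon phases, window by window, so that a region with three-base periodicity shows clearly different values across phases. This is only an illustrative sketch and not NPACT's method; the function names `gc_by_phase` and `phase_profile`, the default window and step sizes, and the handling of ambiguous bases are assumptions made here.

```python
def gc_by_phase(seq):
    """Return the GC fraction at each of the three codon phases of seq.

    Phase 0 covers bases 0, 3, 6, ...; phase 1 covers 1, 4, 7, ...; and so on.
    Characters other than A, C, G, T are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    fractions = []
    for phase in range(3):
        bases = [b for b in seq[phase::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions


def phase_profile(seq, window=300, step=150):
    """Slide a window along seq and report per-phase GC for each window.

    window and step must be multiples of 3 so that a given phase refers to
    the same reading frame in every window.
    """
    if window % 3 or step % 3:
        raise ValueError("window and step must be multiples of 3")
    last_start = max(len(seq) - window, 0)
    return [(start, gc_by_phase(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]


# Toy sequence with a strict three-base periodicity (illustrative only).
toy = "ATC" * 4
profile = phase_profile(toy, window=6, step=3)
```

In a real genome the contrast between phases would be noisy, which is why the abstract describes statistical characterization of the periodic regions rather than a simple threshold.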
Affiliation(s)
- Steve Oden
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, USA and Genetics Institute, University of Florida, Gainesville, FL 32610, USA
- Luciano Brocchieri
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, USA and Genetics Institute, University of Florida, Gainesville, FL 32610, USA
9
Arora PK, Bae H. Integration of bioinformatics to biodegradation. Biol Proced Online 2014; 16:8. [PMID: 24808763 PMCID: PMC4012781 DOI: 10.1186/1480-9222-16-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/21/2014] [Accepted: 04/19/2014] [Indexed: 12/22/2022] Open
Abstract
Bioinformatics and biodegradation are two primary scientific fields in applied microbiology and biotechnology. The present review describes development of various bioinformatics tools that may be applied in the field of biodegradation. Several databases, including the University of Minnesota Biocatalysis/Biodegradation database (UM-BBD), a database of biodegradative oxygenases (OxDBase), Biodegradation Network-Molecular Biology Database (Bionemo), MetaCyc, and BioCyc, have been developed to enable access to information related to biochemistry and genetics of microbial degradation. In addition, several bioinformatics tools for predicting toxicity and biodegradation of chemicals have been developed. Furthermore, the whole genomes of several potential degrading bacteria have been sequenced and annotated using bioinformatics tools.
Affiliation(s)
- Pankaj Kumar Arora
- School of Biotechnology, Yeungnam University, Gyeongsan 712-749, Republic of Korea
- Hanhong Bae
- School of Biotechnology, Yeungnam University, Gyeongsan 712-749, Republic of Korea
10
Kubiak K, Kurzawa M, Jędrzejczak-Krzepkowska M, Ludwicka K, Krawczyk M, Migdalski A, Kacprzak MM, Loska D, Krystynowicz A, Bielecki S. Complete genome sequence of Gluconacetobacter xylinus E25 strain—Valuable and effective producer of bacterial nanocellulose. J Biotechnol 2014; 176:18-9. [DOI: 10.1016/j.jbiotec.2014.02.006] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 02/03/2014] [Accepted: 02/07/2014] [Indexed: 10/25/2022]
11
Memišević V, Zavaljevski N, Pieper R, Rajagopala SV, Kwon K, Townsend K, Yu C, Yu X, DeShazer D, Reifman J, Wallqvist A. Novel Burkholderia mallei virulence factors linked to specific host-pathogen protein interactions. Mol Cell Proteomics 2013; 12:3036-51. [PMID: 23800426 PMCID: PMC3820922 DOI: 10.1074/mcp.m113.029041] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 03/08/2013] [Revised: 06/10/2013] [Indexed: 11/09/2022] Open
Abstract
Burkholderia mallei is an infectious intracellular pathogen whose virulence and resistance to antibiotics make it a potential bioterrorism agent. Given its genetic origin as a commensal soil organism, it is equipped with an extensive and varied set of adapted mechanisms to cope with and modulate host-cell environments. One essential virulence mechanism constitutes the specialized secretion systems that are designed to penetrate host-cell membranes and insert pathogen proteins directly into the host cell's cytosol. However, the secretion systems' proteins and, in particular, their host targets are largely uncharacterized. Here, we used a combined in silico, in vitro, and in vivo approach to identify B. mallei proteins required for pathogenicity. We used bioinformatics tools, including orthology detection and ab initio predictions of secretion system proteins, as well as published experimental Burkholderia data to initially select a small number of proteins as putative virulence factors. We then used yeast two-hybrid assays against normalized whole human and whole murine proteome libraries to detect and identify interactions among each of these bacterial proteins and host proteins. Analysis of such interactions provided both verification of known virulence factors and identification of three new putative virulence proteins. We successfully created insertion mutants for each of these three proteins using the virulent B. mallei ATCC 23344 strain. We exposed BALB/c mice to mutant strains and the wild-type strain in an aerosol challenge model using lethal B. mallei doses. In each set of experiments, mice exposed to mutant strains survived for the 21-day duration of the experiment, whereas mice exposed to the wild-type strain rapidly died.
Given their in vivo role in pathogenicity, and based on the yeast two-hybrid interaction data, these results point to the importance of these pathogen proteins in modulating host ubiquitination pathways, phagosomal escape, and actin-cytoskeleton rearrangement processes.
Affiliation(s)
- Vesna Memišević
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702
- Nela Zavaljevski
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702
- Keehwan Kwon
- J. Craig Venter Institute, Rockville, Maryland 20850
- Chenggang Yu
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702
- Xueping Yu
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702
- David DeShazer
- Bacteriology Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, Maryland 21702
- Jaques Reifman
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702
- Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702
12
Roy A, Yang J, Zhang Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res 2012; 40:W471-7. [PMID: 22570420 PMCID: PMC3394312 DOI: 10.1093/nar/gks372] [Citation(s) in RCA: 460] [Impact Index Per Article: 38.3] [Indexed: 11/12/2022] Open
Abstract
We have developed a new COFACTOR webserver for automated structure-based protein function annotation. Starting from a structural model, given by either experimental determination or computational modeling, COFACTOR first identifies template proteins of similar folds and functional sites by threading the target structure through three representative template libraries that have known protein-ligand binding interactions, Enzyme Commission number or Gene Ontology terms. The biological function insights in these three aspects are then deduced from the functional templates, the confidence of which is evaluated by a scoring function that combines both global and local structural similarities. The algorithm has been extensively benchmarked by large-scale benchmarking tests and demonstrated significant advantages compared to traditional sequence-based methods. In the recent community-wide CASP9 experiment, COFACTOR was ranked as the best method for protein-ligand binding site predictions. The COFACTOR server and the template libraries are freely available at http://zhanglab.ccmb.med.umich.edu/COFACTOR.
Affiliation(s)
- Ambrish Roy
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA
13
Abstract
With the development of ultra-high-throughput technologies, the cost of sequencing bacterial genomes has been vastly reduced. As more genomes are sequenced, less time can be spent manually annotating those genomes, resulting in an increased reliance on automatic annotation pipelines. However, automatic pipelines can produce inaccurate genome annotation and their results often require manual curation. Here, we discuss the automatic and manual annotation of bacterial genomes, identify common problems introduced by the current genome annotation process and suggest potential solutions.
Affiliation(s)
- Emily J Richardson
- The Roslin Institute, University of Edinburgh, Easter Bush, EH25 9RG, UK
14
Gupta S, Wallqvist A, Bondugula R, Ivanic J, Reifman J. Unraveling the conundrum of seemingly discordant protein-protein interaction datasets. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2011; 2010:783-6. [PMID: 21096109 DOI: 10.1109/iembs.2010.5626490] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
Most high-throughput experimental results of protein-protein interactions (PPIs) are seemingly inconsistent with each other. In this article, we re-evaluated these contradictions within the context of the underlying domain-domain interactions (DDIs) for two Escherichia coli and four Saccharomyces cerevisiae PPI datasets derived from high-throughput (yeast two-hybrid and tandem affinity purification) experimental platforms. For shared DDIs across pairs of compared datasets, we observed a remarkably high pair-wise correlation (Pearson correlation coefficient between 0.80 and 0.84) between datasets of the same organism derived from the same experimental platform. To a lesser degree, this concordance also held true for more general inter-platform and intra-species comparisons (Pearson correlation coefficient between 0.52 and 0.89). Thus, although varying experimental conditions can influence the ability of individual proteins to interact and, therefore, create apparent differences among PPIs, the physical nature of the underlying interactions, captured by DDIs, is the same and can be used to model and predict PPIs.
Affiliation(s)
- Shobhit Gupta
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD, USA
15
Kumar K, Desai V, Cheng L, Khitrov M, Grover D, Satya RV, Yu C, Zavaljevski N, Reifman J. AGeS: a software system for microbial genome sequence annotation. PLoS One 2011; 6:e17469. [PMID: 21408217 PMCID: PMC3049762 DOI: 10.1371/journal.pone.0017469] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 11/23/2010] [Accepted: 02/01/2011] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.
Affiliation(s)
- Kamal Kumar
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Valmik Desai
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Li Cheng
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Maxim Khitrov
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Deepak Grover
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Ravi Vijaya Satya
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Chenggang Yu
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Nela Zavaljevski
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Jaques Reifman
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
16
Gong P, Pirooznia M, Guan X, Perkins EJ. Design, validation and annotation of transcriptome-wide oligonucleotide probes for the oligochaete annelid Eisenia fetida. PLoS One 2010; 5:e14266. [PMID: 21170345 PMCID: PMC2999564 DOI: 10.1371/journal.pone.0014266] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 06/29/2010] [Accepted: 11/14/2010] [Indexed: 11/24/2022] Open
Abstract
High density oligonucleotide probe arrays have increasingly become an important tool in genomics studies. In organisms with incomplete genome sequence, one strategy for oligo probe design is to reduce the number of unique probes that target every non-redundant transcript through bioinformatic analysis and experimental testing. Here we adopted this strategy in making oligo probes for the earthworm Eisenia fetida, a species for which we have sequenced transcriptome-scale expressed sequence tags (ESTs). Our objectives were to identify unique transcripts as targets, to select an optimal and non-redundant oligo probe for each of these target ESTs, and to annotate the selected target sequences. We developed a streamlined and easy-to-follow approach to the design, validation and annotation of species-specific array probes. Four 244K-formatted oligo arrays were designed using eArray and were hybridized to a pooled E. fetida cRNA sample. We identified 63,541 probes with unsaturated signal intensities consistently above the background level. Target transcripts of these probes were annotated using several sequence alignment algorithms. Significant hits were obtained for 37,439 (59%) probed targets. We validated and made publicly available 63.5K oligo probes so the earthworm research community can use them to pursue ecological, toxicological, and other functional genomics questions. Our approach is efficient, cost-effective and robust because it (1) does not require a major genomics core facility; (2) allows new probes to be easily added and old probes modified or eliminated when new sequence information becomes available, (3) is not bioinformatics-intensive upfront but does provide opportunities for more in-depth annotation of biological functions for target genes; and (4) if desired, EST orthologs to the UniGene clusters of a reference genome can be identified and selected in order to improve the target gene specificity of designed probes. 
This approach is particularly applicable to organisms with a wealth of EST sequences but an unfinished genome.
Affiliation(s)
- Ping Gong
- Environmental Services, SpecPro Inc., Vicksburg, Mississippi, United States of America.
|
17
|
Identification and optimization of classifier genes from multi-class earthworm microarray dataset. PLoS One 2010; 5:e13715. [PMID: 21060837 PMCID: PMC2965664 DOI: 10.1371/journal.pone.0013715] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 06/28/2010] [Accepted: 10/06/2010] [Indexed: 11/19/2022] Open
Abstract
Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with the 11 genes common to both being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracies of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high-dimensional datasets and generate classification models of acceptable precision for multiple classes.
|
18
|
Wallqvist A, Zavaljevski N, Vijaya Satya R, Bondugula R, Desai V, Hu X, Kumar K, Lee M, Yeh IC, Yu C, Reifman J. Accelerating Biomedical Research in Designing Diagnostic Assays, Drugs, and Vaccines. Comput Sci Eng 2010. [DOI: 10.1109/mcse.2010.53] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
|
19
|
Jung J, Yi G, Sukno SA, Thon MR. PoGO: Prediction of Gene Ontology terms for fungal proteins. BMC Bioinformatics 2010; 11:215. [PMID: 20429880 PMCID: PMC2882390 DOI: 10.1186/1471-2105-11-215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 01/20/2010] [Accepted: 04/29/2010] [Indexed: 11/10/2022] Open
Abstract
Background: Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained on only one species, are not available for high-volume data processing, or require the use of data derived from experiments such as microarray analysis. To meet the increasing need for high-throughput, automated annotation of fungal genomes, we have developed a tool for annotating fungal protein sequences with terms from the Gene Ontology. Results: We describe a classifier called PoGO (Prediction of Gene Ontology terms) that uses statistical pattern recognition methods to assign Gene Ontology (GO) terms to proteins from filamentous fungi. PoGO is organized as a meta-classifier in which each evidence source (sequence similarity, protein domains, protein structure and biochemical properties) is used to train independent base-level classifiers. The outputs of the base classifiers are used to train a meta-classifier, which provides the final assignment of GO terms. An independent classifier is trained for each GO term, making the system amenable to updating without having to re-train the whole system. The resulting system is robust. It provides better accuracy and can assign GO terms to a higher percentage of unannotated protein sequences than other methods that we tested. Conclusions: Our annotation system overcomes many of the shortcomings that we found in other methods. We also provide a web server where users can submit protein sequences to be annotated.
Affiliation(s)
- Jaehee Jung
- Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Department of Microbiology and Genetics, University of Salamanca, Villamayor 37185, Spain
|
20
|
Abstract
Summary: DIYA (Do-It-Yourself Annotator) is a modular and configurable open source pipeline software, written in Perl, used for the rapid annotation of bacterial genome sequences. The software is currently used to take DNA contigs as input, either in the form of complete genomes or the result of shotgun sequencing, and produce an annotated sequence in GenBank file format as output. Availability: Distribution and source code are available at https://sourceforge.net/projects/diyg/. Contact: tread@emory.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrew C Stewart
- Genomics Department, Biological Defense Research Directorate, Naval Medical Research Center, Rockville, MD, USA
|