1
Lê-Bury P, Druart K, Savin C, Lechat P, Mas Fiol G, Matondo M, Bécavin C, Dussurget O, Pizarro-Cerdá J. Yersiniomics, a Multi-Omics Interactive Database for Yersinia Species. Microbiol Spectr 2023; 11:e0382622. [PMID: 36847572 PMCID: PMC10100798 DOI: 10.1128/spectrum.03826-22] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/17/2022] [Accepted: 01/26/2023] [Indexed: 03/01/2023] Open
Abstract
The genus Yersinia includes a large variety of nonpathogenic and life-threatening pathogenic bacteria, which cause a broad spectrum of diseases in humans and animals, such as plague, enteritis, Far East scarlet-like fever (FESLF), and enteric redmouth disease. Like most clinically relevant microorganisms, Yersinia spp. are currently subjected to intense multi-omics investigations whose numbers have increased extensively in recent years, generating massive amounts of data useful for diagnostic and therapeutic developments. The lack of a simple and centralized way to exploit these data led us to design Yersiniomics, a web-based platform allowing straightforward analysis of Yersinia omics data. Yersiniomics contains a curated multi-omics database at its core, gathering 200 genomic, 317 transcriptomic, and 62 proteomic data sets for Yersinia species. It integrates genomic, transcriptomic, and proteomic browsers, a genome viewer, and a heatmap viewer to navigate within genomes and experimental conditions. For streamlined access to structural and functional properties, it directly links each gene to GenBank, the Kyoto Encyclopedia of Genes and Genomes (KEGG), UniProt, InterPro, IntAct, and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and each experiment to Gene Expression Omnibus (GEO), the European Nucleotide Archive (ENA), or the Proteomics Identifications Database (PRIDE). Yersiniomics provides a powerful tool for microbiologists to assist with investigations ranging from specific gene studies to systems biology studies.
IMPORTANCE The expanding genus Yersinia is composed of multiple nonpathogenic species and a few pathogenic species, including the deadly etiologic agent of plague, Yersinia pestis. In 2 decades, the number of genomic, transcriptomic, and proteomic studies on Yersinia grew massively, delivering a wealth of data. We developed Yersiniomics, an interactive web-based platform, to centralize and analyze omics data sets on Yersinia species. The platform allows user-friendly navigation between genomic data, expression data, and experimental conditions. Yersiniomics will be a valuable tool to microbiologists.
Affiliation(s)
- Pierre Lê-Bury
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Karen Druart
  - Institut Pasteur, Université Paris Cité, CNRS USR2000, Mass Spectrometry for Biology Unit, Proteomic Platform, Paris, France
- Cyril Savin
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
  - Institut Pasteur, Université Paris Cité, Yersinia National Reference Laboratory, WHO Collaborating Research & Reference Centre for Plague FRA-140, Paris, France
- Pierre Lechat
  - Institut Pasteur, Université Paris Cité, ALPS, Bioinformatic Hub, Paris, France
- Guillem Mas Fiol
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Mariette Matondo
  - Institut Pasteur, Université Paris Cité, CNRS USR2000, Mass Spectrometry for Biology Unit, Proteomic Platform, Paris, France
- Olivier Dussurget
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Javier Pizarro-Cerdá
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
  - Institut Pasteur, Université Paris Cité, Yersinia National Reference Laboratory, WHO Collaborating Research & Reference Centre for Plague FRA-140, Paris, France
2
Wanichthanarak K, Nookaew I, Pasookhush P, Wongsurawat T, Jenjaroenpun P, Leeratsuwan N, Wattanachaisaereekul S, Visessanguan W, Sirivatanauksorn Y, Nuntasaen N, Kuhakarn C, Reutrakul V, Ajawatanawong P, Khoomrung S. Revisiting chloroplast genomic landscape and annotation towards comparative chloroplast genomes of Rhamnaceae. BMC Plant Biology 2023; 23:59. [PMID: 36707785 PMCID: PMC9883906 DOI: 10.1186/s12870-023-04074-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/12/2022] [Accepted: 01/18/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Massive parallel sequencing technologies have enabled the elucidation of plant phylogenetic relationships from chloroplast genomes at a high pace. These include members of the family Rhamnaceae. The current Rhamnaceae phylogenetic tree is from 13 out of 24 Rhamnaceae chloroplast genomes, and only one chloroplast genome of the genus Ventilago is available. Hence, the phylogenetic relationships in Rhamnaceae remain incomplete, and more representative species are needed.
RESULTS The complete chloroplast genome of Ventilago harmandiana Pierre was outlined using a hybrid assembly of long- and short-read technologies. The accuracy and validity of the final genome were confirmed with PCR amplifications and investigation of coverage depth. Sanger sequencing was used to correct for differences in lengths and nucleotide bases between inverted repeats because of the homopolymers. The phylogenetic trees reconstructed using prevalent methods for phylogenetic inference were topologically similar. The clustering based on codon usage was congruent with the molecular phylogenetic tree. The groups of genera in each tribe were in accordance with tribal classification based on molecular markers. We resolved the phylogenetic relationships among six Hovenia species, three Rhamnus species, and two Ventilago species. Our reconstructed tree provides the most complete and reliable low-level taxonomy to date for the family Rhamnaceae. Similar to other higher plants, the RNA editing mostly resulted in converting serine to leucine. Besides, most genes were subjected to purifying selection. Annotation anomalies, including indel calling errors, unaligned open reading frames of the same gene, inconsistent prediction of intergenic regions, and misannotated genes, were identified in the published chloroplast genomes used in this study. These could be a result of the usual imperfections in computational tools, and/or existing errors in reference genomes. Importantly, these are points of concern with regard to utilizing published chloroplast genomes for comparative genomic analysis.
CONCLUSIONS In summary, we successfully demonstrated the use of comprehensive genomic data, including DNA and amino acid sequences, to build a reliable and high-resolution phylogenetic tree for the family Rhamnaceae. Additionally, our study indicates that the revision of genome annotation before comparative genomic analyses is necessary to prevent the propagation of errors and complications in downstream analysis and interpretation.
Affiliation(s)
- Kwanjeera Wanichthanarak
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Intawat Nookaew
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Phongthana Pasookhush
  - Division of Bioinformatics and Data Management for Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Thidathip Wongsurawat
  - Division of Bioinformatics and Data Management for Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Piroon Jenjaroenpun
  - Division of Bioinformatics and Data Management for Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Namkhang Leeratsuwan
  - Department of Biology, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Wonnop Visessanguan
  - Functional Ingredients and Food Biotechnology Research Unit, National Center for Genetic Engineering and Biotechnology (BIOTEC), Phathumthani, 12120, Thailand
- Yongyut Sirivatanauksorn
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Narong Nuntasaen
  - Department of Chemistry and Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
  - Department of National Parks, Wildlife and Plant Conservation, Ministry of Natural Resources and Environment, Bangkok, 10900, Thailand
- Chutima Kuhakarn
  - Department of Chemistry and Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Vichai Reutrakul
  - Department of Chemistry and Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Pravech Ajawatanawong
  - Division of Bioinformatics and Data Management for Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Sakda Khoomrung
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
  - Department of Chemistry and Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
3
Feng Y, Wang Z, Chien KY, Chen HL, Liang YH, Hua X, Chiu CH. "Pseudo-pseudogenes" in bacterial genomes: Proteogenomics reveals a wide but low protein expression of pseudogenes in Salmonella enterica. Nucleic Acids Res 2022; 50:5158-5170. [PMID: 35489061 PMCID: PMC9122581 DOI: 10.1093/nar/gkac302] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/07/2021] [Revised: 04/11/2022] [Accepted: 04/14/2022] [Indexed: 12/03/2022] Open
Abstract
Pseudogenes (genes disrupted by frameshift or in-frame stop codons) are ubiquitously present in bacterial genomes and are considered nonfunctional fossils. Here, we used RNA-seq and mass-spectrometry technologies to measure the transcriptomes and proteomes of Salmonella enterica serovars Paratyphi A and Typhi. All pseudogenes' mRNA sequences remained disrupted, and were present at comparable levels to their intact homologs. At the protein level, however, 101 out of 161 pseudogenes showed evidence of successful translation, with low expression regardless of growth conditions, genetic background and cause of pseudogenization. The majority of frameshifting detected was compensatory for -1 frameshift mutations. Readthrough of in-frame stop codons primarily involved UAG, and cytosine was the most frequent base adjacent to the codon. Using a fluorescence reporter system, fifteen pseudogenes were confirmed to express successfully in vivo in Escherichia coli. Expression of the intact copy of the fifteen pseudogenes in S. Typhi affected bacterial pathogenesis as revealed in human macrophage and epithelial cell infection models. The above findings suggest the need to revisit the nonstandard translation mechanism as well as the biological role of pseudogenes in the bacterial genome.
Affiliation(s)
- Ye Feng
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
- Zeyu Wang
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
- Kun-Yi Chien
  - Graduate Institute of Biomedical Sciences, Chang Gung University College of Medicine, Taoyuan, Republic of China
- Hsiu-Ling Chen
  - Molecular Infectious Disease Research Center, Chang Gung Memorial Hospital, Taoyuan, Republic of China
- Yi-Hua Liang
  - Molecular Infectious Disease Research Center, Chang Gung Memorial Hospital, Taoyuan, Republic of China
- Xiaoting Hua
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
- Cheng-Hsun Chiu
  - Graduate Institute of Biomedical Sciences, Chang Gung University College of Medicine, Taoyuan, Republic of China
  - Molecular Infectious Disease Research Center, Chang Gung Memorial Hospital, Taoyuan, Republic of China
  - Division of Pediatric Infectious Diseases, Department of Pediatrics, Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Taoyuan, Republic of China
4
Belinky F, Ganguly I, Poliakov E, Yurchenko V, Rogozin IB. Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events. Int J Mol Sci 2021; 22:ijms22041876. [PMID: 33672790 PMCID: PMC7918605 DOI: 10.3390/ijms22041876] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2021] [Revised: 02/05/2021] [Accepted: 02/09/2021] [Indexed: 02/07/2023] Open
Abstract
Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
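Illustrative note (not taken from the paper): the abstract above turns on finding stop codons that sit in frame inside an annotated coding sequence rather than at its end. A minimal Python sketch of that scan follows. It assumes the standard stop codons TAA, TAG and TGA and a sequence that starts at the first base of the start codon; the function name `internal_stop_codons` and the example sequence are hypothetical, and this is not the authors' pipeline.

```python
# Illustrative sketch only; assumes the standard stop codons (TAA, TAG, TGA).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_codons(cds: str) -> list[tuple[int, str]]:
    """Return (0-based nucleotide offset, codon) for each in-frame stop codon
    that occurs before the final codon of a coding sequence."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    n_codons = len(seq) // 3             # a trailing partial codon is ignored
    hits = []
    for i in range(n_codons - 1):        # skip the terminal codon
        codon = seq[3 * i : 3 * i + 3]
        if codon in STOP_CODONS:
            hits.append((3 * i, codon))
    return hits


# Hypothetical toy sequence: ATG GCT TAG GCT TGA
print(internal_stop_codons("ATGGCTTAGGCTTGA"))  # [(6, 'TAG')]
```

A gene flagged this way is only a candidate: as the abstract argues, an internal stop codon does not by itself establish that the gene is a pseudogene.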
Affiliation(s)
- Frida Belinky
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Ishan Ganguly
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eugenia Poliakov
  - National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Vyacheslav Yurchenko (corresponding author)
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov University, 119435 Moscow, Russia
- Igor B. Rogozin (corresponding author)
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
5
Koch L, Poyot T, Schnetterle M, Guillier S, Soulé E, Nolent F, Gorgé O, Neulat-Ripoll F, Valade E, Sebbane F, Biot F. Transcriptomic studies and assessment of Yersinia pestis reference genes in various conditions. Sci Rep 2019; 9:2501. [PMID: 30792499 PMCID: PMC6385181 DOI: 10.1038/s41598-019-39072-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/02/2018] [Accepted: 12/14/2018] [Indexed: 12/27/2022] Open
Abstract
Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is a very sensitive and widespread technique considered the gold standard for exploring transcriptional variations. Although a particular methodology has to be followed to provide accurate results, many published studies are likely to misinterpret results because they lack minimal quality requirements. Yersinia pestis is a highly pathogenic bacterium responsible for plague. It has been used to propose a ready-to-use and complete approach to mitigate the risk of technical biases in transcriptomic studies. The selection of suitable reference genes (RGs) among 29 candidates was performed using four different methods (GeNorm, NormFinder, BestKeeper and the Delta-Ct method). An overall comprehensive ranking revealed that the following 12 candidate RGs are suitable for accurate normalization: gmk, proC, fabD, rpoD, nadB, rho, thrA, ribD, mutL, rpoB, adk and tmk. Some frequently used genes, such as 16S RNA, were even found to be unsuitable for studying Y. pestis. This methodology allowed us to demonstrate, under different temperatures and states of growth, significant transcriptional changes in six efflux pump genes involved in physiological aspects such as antimicrobial resistance or virulence. Previous transcriptomic studies done under comparable conditions had not been able to highlight these transcriptional modifications. These results highlight the importance of validating RGs prior to the normalization of transcriptional expression levels of targeted genes. This accurate methodology can be extended to any gene of interest in Y. pestis. More generally, the same workflow can be applied to identify and validate appropriate RGs in other bacteria to study transcriptional variations.
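Illustrative note (not taken from the paper): of the four ranking methods named in the abstract above, the comparative Delta-Ct approach is the simplest to sketch. As commonly described, it takes each pair of candidate genes, computes the Ct difference in every sample, and scores a gene by the mean standard deviation of those differences over all its pairings, with lower scores meaning more stable expression. The Python sketch below assumes that description; the function name `delta_ct_stability`, the gene labels, and the Ct values are invented for illustration and are not data from the study.

```python
from itertools import combinations
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Rank candidate reference genes by a comparative Delta-Ct score.

    ct maps a gene name to its Ct values, one per sample, with the same
    sample order for every gene. Returns (gene, score) pairs sorted from
    most stable (lowest mean SD of pairwise Delta-Ct) to least stable.
    """
    pair_sds = {gene: [] for gene in ct}
    for a, b in combinations(ct, 2):
        # Delta-Ct of the pair in each sample, then its spread across samples
        diffs = [x - y for x, y in zip(ct[a], ct[b])]
        sd = stdev(diffs)
        pair_sds[a].append(sd)
        pair_sds[b].append(sd)
    return sorted(((g, mean(sds)) for g, sds in pair_sds.items()),
                  key=lambda item: item[1])


# Invented Ct values for three hypothetical genes across four samples.
example_ct = {
    "geneA": [20.0, 21.0, 22.0, 23.0],
    "geneB": [18.0, 19.0, 20.0, 21.0],
    "geneC": [15.0, 19.0, 14.0, 20.0],
}
for gene, score in delta_ct_stability(example_ct):
    print(f"{gene}: {score:.3f}")
```

In this toy input geneA and geneB move together across samples, so they share the lowest score, while geneC varies independently and ranks last.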
Affiliation(s)
- Lionel Koch
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Ecole du Val de Grace (EVDG), Paris, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Thomas Poyot
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
- Marine Schnetterle
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Sophie Guillier
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Estelle Soulé
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Flora Nolent
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Olivier Gorgé
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Fabienne Neulat-Ripoll
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Eric Valade
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Ecole du Val de Grace (EVDG), Paris, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
- Florent Sebbane
  - Inserm, University of Lille, CNRS, CHU Lille, Institut Pasteur de Lille, U1019-UMR8204-CIIL-Center for Infection and Immunity of Lille, Lille, France
- Fabrice Biot
  - Institut de Recherche Biomédicale des Armées (IRBA), Brétigny-sur-Orge, France
  - Aix Marseille University, INSERM, SSA, IRBA, MCT, Marseille, France
6
Herrera CM, Henderson JC, Crofts AA, Trent MS. Novel coordination of lipopolysaccharide modifications in Vibrio cholerae promotes CAMP resistance. Mol Microbiol 2017; 106:582-596. [PMID: 28906060 DOI: 10.1111/mmi.13835] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Accepted: 09/10/2017] [Indexed: 01/02/2023]
Abstract
In the environment and during infection, the human intestinal pathogen Vibrio cholerae must overcome noxious compounds that damage the bacterial outer membrane. The El Tor and classical biotypes of O1 V. cholerae show striking differences in their resistance to membrane disrupting cationic antimicrobial peptides (CAMPs), such as polymyxins. The classical biotype is susceptible to CAMPs, but current pandemic El Tor biotype isolates gain CAMP resistance by altering the net charge of their cell surface through glycine modification of lipid A. Here we report a second lipid A modification mechanism that only functions in the V. cholerae El Tor biotype. We identify a functional EptA ortholog responsible for the transfer of the amino-residue phosphoethanolamine (pEtN) to the lipid A of V. cholerae El Tor that is not functional in the classical biotype. We previously reported that mildly acidic growth conditions (pH 5.8) downregulate expression of genes encoding the glycine modification machinery. In this report, growth at pH 5.8 increases expression of eptA with concomitant pEtN modification suggesting coordinated regulation of these LPS modification systems. Similarly, efficient pEtN lipid A substitution is seen in the absence of lipid A glycinylation. We further demonstrate EptA orthologs from non-cholerae Vibrio species are functional.
Affiliation(s)
- Carmen M Herrera
  - Department of Infectious Diseases, Center for Vaccines and Immunology, University of Georgia, College of Veterinary Medicine, Athens, GA 30602, USA
- Jeremy C Henderson
  - Department of Infectious Diseases, Center for Vaccines and Immunology, University of Georgia, College of Veterinary Medicine, Athens, GA 30602, USA
- Alexander A Crofts
  - Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, The University of Texas at Austin, TX 78712, USA
- M Stephen Trent
  - Department of Infectious Diseases, Center for Vaccines and Immunology, University of Georgia, College of Veterinary Medicine, Athens, GA 30602, USA
7
Merkley ED, Sego LH, Lin A, Leiser OP, Kaiser BLD, Adkins JN, Keim PS, Wagner DM, Kreuzer HW. Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague. PLoS One 2017; 12:e0183478. [PMID: 28854255 PMCID: PMC5576697 DOI: 10.1371/journal.pone.0183478] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/29/2016] [Accepted: 08/05/2017] [Indexed: 11/19/2022] Open
Abstract
The rapid pace of bacterial evolution enables organisms to adapt to the laboratory environment with repeated passage and thus diverge from naturally-occurring environmental ("wild") strains. Distinguishing wild and laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone has thus far not provided a clear signature, perhaps due to lack of understanding of how diverse genome changes lead to convergent phenotypes, difficulty in detecting certain types of mutations, or perhaps because some adaptive modifications are epigenetic. Monitoring protein abundance, a molecular measure of phenotype, can overcome some of these difficulties. We have assembled a collection of Yersinia pestis proteomics datasets from our own published and unpublished work, and from a proteomics data archive, and demonstrated that protein abundance data can clearly distinguish laboratory-adapted from wild. We developed a lasso logistic regression classifier that uses binary (presence/absence) or quantitative protein abundance measures to predict whether a sample is laboratory-adapted or wild that proved to be ~98% accurate, as judged by replicated 10-fold cross-validation. Protein features selected by the classifier accord well with our previous study of laboratory adaptation in Y. pestis. The input data was derived from a variety of unrelated experiments and contained significant confounding variables. We show that the classifier is robust with respect to these variables. The methodology is able to discover signatures for laboratory facility and culture medium that are largely independent of the signature of laboratory adaptation. Going beyond our previous laboratory evolution study, this work suggests that proteomic differences between laboratory-adapted and wild Y. pestis are general, potentially pointing to a process that could apply to other species as well. Additionally, we show that proteomics datasets (even archived data collected for different purposes) contain the information necessary to distinguish wild and laboratory samples. This work has clear applications in biomarker detection as well as biodefense.
Affiliation(s)
- Eric D. Merkley
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Landon H. Sego
  - Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Andy Lin
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
  - Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Owen P. Leiser
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- Brooke L. Deatherage Kaiser
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Joshua N. Adkins
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Paul S. Keim
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- David M. Wagner
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- Helen W. Kreuzer
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
8
Mao Y, Yang X, Liu Y, Yan Y, Du Z, Han Y, Song Y, Zhou L, Cui Y, Yang R. Reannotation of Yersinia pestis Strain 91001 Based on Omics Data. Am J Trop Med Hyg 2016; 95:562-70. [PMID: 27382076 DOI: 10.4269/ajtmh.16-0215] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/16/2016] [Accepted: 05/17/2016] [Indexed: 12/16/2022] Open
Abstract
Yersinia pestis is among the most dangerous human pathogens, and systematic research of this pathogen is important in bacterial pathogenomics research. To fully interpret the biological functions, physiological characteristics, and pathogenesis of Y. pestis, a comprehensive annotation of its entire genome is necessary. The emergence of omics-based research has brought new opportunities to better annotate the genome of this pathogen. Here, the complete genome of Y. pestis strain 91001 was reannotated using genomics and proteogenomics data. One hundred and thirty-seven unreliable coding sequences were removed, and 41 homologous genes were relocated with their translational initiation sites, while the functions of seven pseudogenes and 392 hypothetical genes were revised. Moreover, annotations of noncoding RNAs, repeat sequences, and transposable elements have also been incorporated. The reannotated results are freely available at http://tody.bmi.ac.cn.
Affiliation(s)
- Yiqing Mao
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
  - Center of Information Technology, Beijing Institute of Health and Medical Information, Beijing, People's Republic of China
- Xianwei Yang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Yang Liu
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, People's Republic of China
- Yanfeng Yan
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Zongmin Du
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Yanping Han
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Yajun Song
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Lei Zhou
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Yujun Cui
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
- Ruifu Yang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People's Republic of China
9
Alves G, Yu YK. Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 2016; 32:2642-9. [PMID: 27153659 DOI: 10.1093/bioinformatics/btw225] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/05/2015] [Accepted: 04/16/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed.
RESULTS We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases.
AVAILABILITY AND IMPLEMENTATION The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit
CONTACT yyu@ncbi.nlm.nih.gov
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
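Illustrative note (not taken from the paper): the abstract above relies on extreme value statistics to attach significance to a peptide score. A generic textbook version of that idea models the best random score with a Gumbel (type I extreme value) distribution and reports the upper-tail probability of the observed score. The Python sketch below assumes that model with known location `mu` and scale `beta`; the function names, the parameters, and the convention of multiplying the P-value by the number of candidates to get an E-value are assumptions for illustration, and this is not the RAId algorithm, which estimates its parameters differently.

```python
import math


def gumbel_p_value(score: float, mu: float, beta: float) -> float:
    """Upper-tail probability P(S >= score) for a Gumbel distribution
    with location mu and scale beta (beta must be positive)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (score - mu) / beta
    if z < -700:
        # exp(-z) would overflow; the tail probability is 1 to machine precision
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny tails
    return -math.expm1(-math.exp(-z))


def e_value(score: float, mu: float, beta: float, n_candidates: int) -> float:
    """Expected number of random candidates scoring at least `score`,
    under the assumed convention E = n_candidates * P."""
    return n_candidates * gumbel_p_value(score, mu, beta)


# Hypothetical parameters: a score 10 scale units above the location.
print(gumbel_p_value(10.0, 0.0, 1.0))
print(e_value(10.0, 0.0, 1.0, 1000))
```

Using `expm1` matters here because interesting scores sit far in the tail, where computing `1 - exp(-x)` directly would lose most of its precision.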
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
10
|
Alves G, Wang G, Ogurtsov AY, Drake SK, Gucek M, Suffredini AF, Sacks DB, Yu YK. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 2016; 27:194-210. [PMID: 26510657 PMCID: PMC4723618 DOI: 10.1007/s13361-015-1271-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/29/2015] [Revised: 09/04/2015] [Accepted: 09/05/2015] [Indexed: 05/13/2023]
Abstract
Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Guanghui Wang
- Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Steven K Drake
- Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Marjan Gucek
- Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anthony F Suffredini
- Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- David B Sacks
- Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
|
11
|
Locard-Paulet M, Pible O, Gonzalez de Peredo A, Alpha-Bazin B, Almunia C, Burlet-Schiltz O, Armengaud J. Clinical implications of recent advances in proteogenomics. Expert Rev Proteomics 2016; 13:185-99. [DOI: 10.1586/14789450.2016.1132169] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023]
|
12
|
Yang R, Motin VL. Yersinia pestis in the Age of Big Data. Adv Exp Med Biol 2016; 918:257-272. [PMID: 27722866 DOI: 10.1007/978-94-024-0890-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
Abstract
As omics-driven technologies have developed rapidly, genomics, transcriptomics, proteomics, metabolomics and other omics-based data have accumulated at unprecedented speed. Omics-driven big data in biology have changed the way we do research. "Big science" has promoted a holistic understanding of biology that is impossible to achieve through traditional hypothesis-driven research. In this chapter, we give an overview of omics-driven research on Y. pestis, offer a way of thinking about Yersinia pestis research in the age of big data, and make some suggestions for integrating omics-based data for a systems-level understanding of Y. pestis.
Affiliation(s)
- Ruifu Yang
- Beijing Institute of Microbiology and Epidemiology, No. Dongdajie, Fengtai, Beijing, 100071, China.
- Vladimir L Motin
- Departments of Pathology and Microbiology & Immunology, University of Texas Medical Branch, Galveston, TX, 77555, USA
|
13
|
Kumar D, Mondal AK, Kutum R, Dash D. Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes. Proteomics 2015; 16:226-40. [PMID: 26773550 DOI: 10.1002/pmic.201500263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/30/2015] [Revised: 09/18/2015] [Accepted: 09/28/2015] [Indexed: 01/04/2023]
Abstract
Sustainable innovations in sequencing technologies have resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unequally distributed along their phylogenetic tree; few phyla contain the majority, the rest only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, a substantial fraction of these annotations are erroneous. Proteogenomics utilizes protein level experimental observations to annotate protein coding genes on a genome wide scale. Benefits of proteogenomics include discovery and correction of gene annotations regardless of their phylogenetic conservation. This not only allows detection of common, conserved proteins but also the discovery of protein products of rare genes that may be horizontally transferred or taxonomy specific. The chances of encountering such genes are higher in rare phyla that comprise a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies and their taxonomic distribution, and urge targeted proteogenomics of underexplored taxa to build an extensive reference of protein coding genes.
Affiliation(s)
- Dhirendra Kumar
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Anupam Kumar Mondal
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Rintu Kutum
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Debasis Dash
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
|
14
|
Zimbler DL, Schroeder JA, Eddy JL, Lathem WW. Early emergence of Yersinia pestis as a severe respiratory pathogen. Nat Commun 2015; 6:7487. [PMID: 26123398 PMCID: PMC4491175 DOI: 10.1038/ncomms8487] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 03/02/2015] [Accepted: 05/12/2015] [Indexed: 11/09/2022] Open
Abstract
Yersinia pestis causes the fatal respiratory disease pneumonic plague. Y. pestis recently evolved from the gastrointestinal pathogen Y. pseudotuberculosis; however, it is not known at what point Y. pestis gained the ability to induce a fulminant pneumonia. Here we show that the acquisition of a single gene encoding the protease Pla was sufficient for the most ancestral, deeply rooted strains of Y. pestis to cause pneumonic plague, indicating that Y. pestis was primed to infect the lungs at a very early stage in its evolution. As Y. pestis further evolved, modern strains acquired a single amino-acid modification within Pla that optimizes protease activity. While this modification is unnecessary to cause pneumonic plague, the substitution is instead needed to efficiently induce the invasive infection associated with bubonic plague. These findings indicate that Y. pestis was capable of causing pneumonic plague before it evolved to optimally cause invasive infections in mammals.
Affiliation(s)
- Daniel L Zimbler
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
- Jay A Schroeder
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
- Justin L Eddy
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
- Wyndham W Lathem
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
|
15
|
Kucharova V, Wiker HG. Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics. Proteomics 2014; 14:2360-675. [PMID: 25263021 DOI: 10.1002/pmic.201400168] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]
Abstract
High-accuracy and high-throughput proteomic methods have completely changed the way we can identify and characterize proteins. MS-based proteomics can now provide a unique supplement to genomic data and add a new level of information to the interpretation of genomic sequences. Proteomics-driven genome annotation has become especially relevant in microbiology where genomes are sequenced on a daily basis and limitations of an in silico driven annotation process are well recognized. In this review paper, we outline different strategies on how one can design a proteogenomic experiment, for example on genome-sequenced (synonymous proteogenomics) versus unsequenced organisms (ortho-proteogenomics) or with the aid of other "omic" data such as RNA-seq. We touch upon many challenges that are encountered during a typical proteogenomic study, mostly concerning bioinformatics methods and downstream data analysis, but also related to creation and use of sequence databases. A large list of proteogenomic case studies of different microorganisms is provided to illustrate the mapping of MS/MS-derived peptide spectra to genomic DNA sequences. These investigations have led to accurate determination of translational initiation sites, pointed out eventual read-throughs or programmed frameshifts, detected signal peptide processing or other protein maturation events, removed questionable annotation assignments, and provided evidence for predicted hypothetical proteins.
Affiliation(s)
- Veronika Kucharova
- Department of Clinical Science, The Gade Research Group for Infection and Immunity, University of Bergen, Norway
|
16
|
Schellenberg JJ, Verbeke TJ, McQueen P, Krokhin OV, Zhang X, Alvare G, Fristensky B, Thallinger GG, Henrissat B, Wilkins JA, Levin DB, Sparling R. Enhanced whole genome sequence and annotation of Clostridium stercorarium DSM8532T using RNA-seq transcriptomics and high-throughput proteomics. BMC Genomics 2014; 15:567. [PMID: 24998381 PMCID: PMC4102724 DOI: 10.1186/1471-2164-15-567] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/18/2013] [Accepted: 06/26/2014] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Growing interest in cellulolytic clostridia with potential for consolidated biofuels production is mitigated by low conversion of raw substrates to desired end products. Strategies to improve conversion are likely to benefit from emerging techniques to define molecular systems biology of these organisms. Clostridium stercorarium DSM8532T is an anaerobic thermophile with demonstrated high ethanol production on cellulose and hemicellulose. Although several lignocellulolytic enzymes in this organism have been well-characterized, details concerning carbohydrate transporters and central metabolism have not been described. Therefore, the goal of this study is to define an improved whole genome sequence (WGS) for this organism using in-depth molecular profiling by RNA-seq transcriptomics and tandem mass spectrometry-based proteomics. RESULTS A paired-end Roche/454 WGS assembly was closed through application of an in silico algorithm designed to resolve repetitive sequence regions, resulting in a circular replicon with one gap and a region of 2 kilobases with 10 ambiguous bases. RNA-seq transcriptomics resulted in nearly complete coverage of the genome, identifying errors in homopolymer length attributable to 454 sequencing. Peptide sequences resulting from high-throughput tandem mass spectrometry of trypsin-digested protein extracts were mapped to 1,755 annotated proteins (68% of all protein-coding regions). Proteogenomic analysis confirmed the quality of annotation and improvement pipelines, identifying a missing gene and an alternative reading frame. Peptide coverage of genes hypothetically involved in substrate hydrolysis, transport and utilization confirmed multiple pathways for glycolysis, pyruvate conversion and recycling of intermediates. No sequences homologous to transaldolase, a central enzyme in the pentose phosphate pathway, were observed by any method, despite demonstrated growth of this organism on xylose and xylan hemicellulose. 
CONCLUSIONS Complementary omics techniques confirm the quality of genome sequence assembly, annotation and error-reporting. Nearly complete genome coverage by RNA-seq likely indicates background DNA in RNA extracts, however these preps resulted in WGS enhancement and transcriptome profiling in a single Illumina run. No detection of transaldolase by any method despite xylose utilization by this organism indicates an alternative pathway for sedoheptulose-7-phosphate degradation. This report combines next-generation omics techniques to elucidate previously undefined features of substrate transport and central metabolism for this organism and its potential for consolidated biofuels production from lignocellulose.
Affiliation(s)
- Tobin J Verbeke
- Department of Microbiology, University of Manitoba, Winnipeg, Canada
- Peter McQueen
- Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada
- Oleg V Krokhin
- Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada
- Xiangli Zhang
- Department of Plant Sciences, University of Manitoba, Winnipeg, Canada
- Graham Alvare
- Department of Plant Sciences, University of Manitoba, Winnipeg, Canada
- Brian Fristensky
- Department of Plant Sciences, University of Manitoba, Winnipeg, Canada
- Gerhard G Thallinger
- Core Facility Bioinformatics, Austrian Centre of Industrial Biotechnology (ACIB), Graz, Austria
- Institute for Genomics and Bioinformatics, Graz University of Technology, Graz, Austria
- Bernard Henrissat
- Architecture et Fonction des Macromolécules Biologiques, Université Aix-Marseille, Marseille, France
- UMR 7257, Centre National de Recherche Scientifique, 163 ave. de Luminy, Marseille, 13288 France
- John A Wilkins
- Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada
- David B Levin
- Department of Biosystems Engineering, University of Manitoba, Winnipeg, Canada
- Richard Sparling
- Department of Microbiology, University of Manitoba, Winnipeg, Canada
|
17
|
Aryal UK, Callister SJ, McMahon BH, McCue LA, Brown J, Stöckel J, Liberton M, Mishra S, Zhang X, Nicora CD, Angel TE, Koppenaal DW, Smith RD, Pakrasi HB, Sherman LA. Proteomic Profiles of Five Strains of Oxygenic Photosynthetic Cyanobacteria of the Genus Cyanothece. J Proteome Res 2014; 13:3262-76. [DOI: 10.1021/pr5000889] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Affiliation(s)
- Uma K. Aryal
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Stephen J. Callister
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Lee-Ann McCue
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Joseph Brown
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jana Stöckel
- Department of Biology, Washington University, St. Louis, Missouri 63130, United States
- MOgene Green Chemicals LC, St. Louis, Missouri 63132, United States
- Michelle Liberton
- Department of Biology, Washington University, St. Louis, Missouri 63130, United States
- Sujata Mishra
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States
- Xiaohui Zhang
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States
- Carrie D. Nicora
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Thomas E. Angel
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Kinemed, Inc., Horton Street, Emeryville, California 94608, United States
- David W. Koppenaal
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Richard D. Smith
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Himadri B. Pakrasi
- Department of Biology, Washington University, St. Louis, Missouri 63130, United States
- Louis A. Sherman
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States
|
18
|
Hilker R, Stadermann KB, Doppmeier D, Kalinowski J, Stoye J, Straube J, Winnebald J, Goesmann A. ReadXplorer--visualization and analysis of mapped sequences. Bioinformatics 2014; 30:2247-54. [PMID: 24790157 PMCID: PMC4217279 DOI: 10.1093/bioinformatics/btu205] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. RESULTS ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion-insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. AVAILABILITY AND IMPLEMENTATION ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual.
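For readers unfamiliar with the RPKM value mentioned in the abstract, the following minimal sketch shows the standard definition (reads per kilobase of feature per million mapped reads). It is not taken from ReadXplorer; the function and parameter names are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Example: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads
example = rpkm(500, 2_000, 10_000_000)
```

Normalizing by both feature length and library size makes values comparable across genes and across samples, which a raw read count is not.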
Affiliation(s)
- Rolf Hilker
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Kai Bernd Stadermann
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Daniel Doppmeier
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Jörn Kalinowski
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Jens Stoye
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Jasmin Straube
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Jörn Winnebald
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
- Alexander Goesmann
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
|
19
|
Ucciferri N, Rocchiccioli S. Proteomics techniques for the detection of translated pseudogenes. Methods Mol Biol 2014; 1167:187-95. [PMID: 24823778 DOI: 10.1007/978-1-4939-0835-6_12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/24/2022]
Abstract
Increasing evidence indicates that pseudogenes can reach the translational process. Translated pseudogene products have in fact been found in various organisms, confuting the original definition of pseudogenes as genes without any coding potential. Proteomics is the main technology allowing the study of proteins and, when integrated with genomics, is defined as proteogenomics. In proteogenomics, the peptide-genome alignment drives the identification and annotation of gene products and allows for a better understanding of their function. In this chapter, we give a brief overview of the proteomic techniques applied to pseudogenes. In particular, we discuss peptide spectrum acquisition, mass data analysis, and genome database matching.
Affiliation(s)
- Nadia Ucciferri
- CNR, Institute of Clinical Physiology, Via Moruzzi 1, 56124, Pisa, Italy
|
20
|
Carlier AL, Omasits U, Ahrens CH, Eberl L. Proteomics analysis of Psychotria leaf nodule symbiosis: improved genome annotation and metabolic predictions. Mol Plant Microbe Interact 2013; 26:1325-1333. [PMID: 23902262 DOI: 10.1094/mpmi-05-13-0152-r] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 06/02/2023]
Abstract
Several plant species of the genus Psychotria (Rubiaceae) harbor Burkholderia sp. bacteria within specialized leaf nodules. The bacteria are transmitted vertically between plant generations and have not yet been cultured outside of their host. This symbiosis is considered to be obligatory because plants devoid of symbionts fail to develop into mature individuals. The genome of 'Candidatus Burkholderia kirkii' has been sequenced recently and has revealed evidence of reductive genome evolution, as shown by the proliferation of insertion sequences and the presence of numerous pseudogenes. We employed shotgun proteomics to investigate the expression of 'Ca. B. kirkii' proteins in the leaf nodule. Drawing from this dataset and refined comparative genomics analyses, we designed a new pseudogene prediction algorithm and improved the genome annotation. We also found conclusive evidence that nodule bacteria allocate vast resources to synthesis of secondary metabolites, possibly of the C7N aminocyclitol family. Expression of a putative 2-epi-5-valiolone synthase, a key enzyme of the C7N aminocyclitol synthesis, is high in the nodule population but downregulated in bacteria residing in the shoot apex, suggesting that production of secondary metabolites is particularly important in the leaf nodule.
|
21
|
Zickmann F, Lindner MS, Renard BY. GIIRA--RNA-Seq driven gene finding incorporating ambiguous reads. Bioinformatics 2013; 30:606-13. [PMID: 24123675 DOI: 10.1093/bioinformatics/btt577] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/08/2023]
Abstract
MOTIVATION The reliable identification of genes is a major challenge in genome research, as further analysis depends on the correctness of this initial step. With high-throughput RNA-Seq data reflecting currently expressed genes, a particularly meaningful source of information has become commonly available for gene finding. However, practical application in automated gene identification is still not the standard case. A particular challenge in including RNA-Seq data is the difficult handling of ambiguously mapped reads. RESULTS We present GIIRA (Gene Identification Incorporating RNA-Seq data and Ambiguous reads), a novel prokaryotic and eukaryotic gene finder that is exclusively based on a RNA-Seq mapping and inherently includes ambiguously mapped reads. GIIRA extracts candidate regions supported by a sufficient number of mappings and reassigns ambiguous reads to their most likely origin using a maximum-flow approach. This avoids the exclusion of genes that are predominantly supported by ambiguous mappings. Evaluation on simulated and real data and comparison with existing methods incorporating RNA-Seq information highlight the accuracy of GIIRA in identifying the expressed genes. AVAILABILITY AND IMPLEMENTATION GIIRA is implemented in Java and is available from https://sourceforge.net/projects/giira/.
Affiliation(s)
- Franziska Zickmann
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353 Berlin, Germany
|
22
|
Armengaud J, Hartmann EM, Bland C. Proteogenomics for environmental microbiology. Proteomics 2013; 13:2731-42. [PMID: 23636904 DOI: 10.1002/pmic.201200576] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 12/19/2012] [Revised: 03/06/2013] [Accepted: 04/09/2013] [Indexed: 11/09/2022]
Abstract
Proteogenomics sensu stricto refers to the use of proteomic data to refine the annotation of genomes from model organisms. Because of the limitations of automatic annotation pipelines, a relatively high number of errors occur during the structural annotation of genes coding for proteins. Whether putative orphan sequences or short genes encoding low-molecular-weight proteins really exist is still frequently a mystery. Whether start codons are well defined is also an open debate. These problems are exacerbated for genomes of microorganisms belonging to poorly documented genera, as related sequences are not always available for homology-guided annotation. The functional annotation of a significant proportion of genes is also another well-known issue when annotating environmental microorganisms. High-throughput shotgun proteomics has recently greatly evolved, allowing the exploration of the proteome from any microorganism at an unprecedented depth. The structural and functional annotation process may be usefully complemented with experimental data. Indeed, proteogenomic mapping has been successfully performed for a wide variety of organisms. Specific approaches devoted to systematically establishing the N-termini of a large set of proteins are being developed. N-terminomics is giving rise to datasets of experimentally proven translational start codons as well as validated peptide signals for secreted proteins. By extension, combining genomic and proteomic data is becoming routine in many research projects. The proteomic analysis of organisms with unfinished genome sequences, the so-called composite proteomics, and the search for microbial biomarkers by bottom-up and top-down combined approaches are some examples of proteogenomic-flavored studies. They illustrate the advent of a new era of environmental microbiology where proteomics and genomics are intimately integrated to answer key biological questions.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, France
|
23
|
Bertaccini D, Vaca S, Carapito C, Arsène-Ploetze F, Van Dorsselaer A, Schaeffer-Reiss C. An Improved Stable Isotope N-Terminal Labeling Approach with Light/Heavy TMPP To Automate Proteogenomics Data Validation: dN-TOP. J Proteome Res 2013; 12:3063-70. [DOI: 10.1021/pr4002993] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Affiliation(s)
- Diego Bertaccini
- Laboratoire de Spectrométrie de Masse BioOrganique, IPHC, Université de Strasbourg, CNRS, UMR7178, Strasbourg, France
- Sebastian Vaca
- Laboratoire de Spectrométrie de Masse BioOrganique, IPHC, Université de Strasbourg, CNRS, UMR7178, Strasbourg, France
- Christine Carapito
- Laboratoire de Spectrométrie de Masse BioOrganique, IPHC, Université de Strasbourg, CNRS, UMR7178, Strasbourg, France
- Florence Arsène-Ploetze
- Laboratoire de Génétique Moléculaire, Génomique et Microbiologie, Université de Strasbourg, CNRS UMR7156, Strasbourg, France
- Alain Van Dorsselaer
- Laboratoire de Spectrométrie de Masse BioOrganique, IPHC, Université de Strasbourg, CNRS, UMR7178, Strasbourg, France
- Christine Schaeffer-Reiss
- Laboratoire de Spectrométrie de Masse BioOrganique, IPHC, Université de Strasbourg, CNRS, UMR7178, Strasbourg, France
|
24
|
The genome organization of Thermotoga maritima reflects its lifestyle. PLoS Genet 2013; 9:e1003485. [PMID: 23637642 PMCID: PMC3636130 DOI: 10.1371/journal.pgen.1003485] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 06/15/2012] [Accepted: 03/13/2013] [Indexed: 01/01/2023] Open
Abstract
The generation of genome-scale data is becoming more routine, yet the subsequent analysis of omics data remains a significant challenge. Here, an approach that integrates multiple omics datasets with bioinformatics tools was developed that produces a detailed annotation of several microbial genomic features. This methodology was used to characterize the genome of Thermotoga maritima, a phylogenetically deep-branching, hyperthermophilic bacterium. Experimental data were generated for whole-genome resequencing, transcription start site (TSS) determination, transcriptome profiling, and proteome profiling. These datasets, analyzed in combination with bioinformatics tools, served as a basis for the improvement of gene annotation, the elucidation of transcription units (TUs), the identification of putative non-coding RNAs (ncRNAs), and the determination of promoters and ribosome binding sites. This revealed many distinctive properties of the T. maritima genome organization relative to other bacteria. This genome has a high number of genes per TU (3.3), a paucity of putative ncRNAs (12), and few TUs with multiple TSSs (3.7%). Quantitative analysis of promoters and ribosome binding sites showed increased sequence conservation relative to other bacteria. The 5′UTRs follow an atypical bimodal length distribution comprised of “Short” 5′UTRs (11–17 nt) and “Common” 5′UTRs (26–32 nt). Transcriptional regulation is limited by a lack of intergenic space for the majority of TUs. Lastly, a high fraction of annotated genes are expressed independent of growth state and a linear correlation of mRNA/protein is observed (Pearson r = 0.63, p < 2.2×10⁻¹⁶, t-test). These distinctive properties are hypothesized to be a reflection of this organism's hyperthermophilic lifestyle and could yield novel insights into the evolutionary trajectory of microbial life on earth.
Genomic studies have greatly benefited from the advent of high-throughput technologies and bioinformatics tools. Here, a methodology integrating genome-scale data and bioinformatics tools is developed to characterize the genome organization of the hyperthermophilic, phylogenetically deep-branching bacterium Thermotoga maritima. This approach elucidates several features of the genome organization and enables comparative analysis of these features across diverse taxa. Our results suggest that the genome of T. maritima is reflective of its hyperthermophilic lifestyle. Ultimately, constraints imposed on the genome have negative impacts on regulatory complexity and phenotypic diversity. Investigating the genome organization of Thermotogae species will help resolve various causal factors contributing to the genome organization such as phylogeny and environment. Applying a similar analysis of the genome organization to numerous taxa will likely provide insights into microbial evolution.
|
25
|
Ansong C, Deatherage BL, Hyduke D, Schmidt B, McDermott JE, Jones MB, Chauhan S, Charusanti P, Kim YM, Nakayasu ES, Li J, Kidwai A, Niemann G, Brown RN, Metz TO, McAteer K, Heffron F, Peterson SN, Motin V, Palsson BO, Smith RD, Adkins JN. Studying Salmonellae and Yersiniae host-pathogen interactions using integrated 'omics and modeling. Curr Top Microbiol Immunol 2013; 363:21-41. [PMID: 22886542 DOI: 10.1007/82_2012_247] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
Abstract
Salmonella and Yersinia are two distantly related genera containing species with wide host-range specificity and pathogenic capacity. The metabolic complexity of these organisms facilitates robust lifestyles both outside of and within animal hosts. Using a pathogen-centric systems biology approach, we are combining a multi-omics (transcriptomics, proteomics, metabolomics) strategy to define properties of these pathogens under a variety of conditions including those that mimic the environments encountered during pathogenesis. These high-dimensional omics datasets are being integrated in selected ways to improve genome annotations, discover novel virulence-related factors, and model growth under infectious states. We will review the evolving technological approaches toward understanding complex microbial life through multi-omic measurements and integration, while highlighting some of our most recent successes in this area.
Affiliation(s)
- Charles Ansong
- Biological Separations and Mass Spectroscopy Group, Pacific Northwest National Laboratory, PO Box 999, MSIN: K8-98, Richland, WA, 99352, USA
|
26
|
Yang R, Du Z, Han Y, Zhou L, Song Y, Zhou D, Cui Y. Omics strategies for revealing Yersinia pestis virulence. Front Cell Infect Microbiol 2012; 2:157. [PMID: 23248778 PMCID: PMC3521224 DOI: 10.3389/fcimb.2012.00157] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/29/2012] [Accepted: 11/27/2012] [Indexed: 01/12/2023] Open
Abstract
Omics has remarkably changed the way we investigate and understand life. Omics differs from traditional hypothesis-driven research because it is a discovery-driven approach. Mass datasets produced from omics-based studies require experts from different fields to reveal the salient features behind these data. In this review, we summarize omics-driven studies to reveal the virulence features of Yersinia pestis through genomics, transcriptomics, proteomics, interactomics, etc. These studies serve as foundations for further hypothesis-driven research and help us gain insight into Y. pestis pathogenesis.
Affiliation(s)
- Ruifu Yang
- Beijing Institute of Microbiology and Epidemiology Beijing, China.
|
27
|
Ansong C, Schrimpe-Rutledge AC, Mitchell HD, Chauhan S, Jones MB, Kim YM, McAteer K, Deatherage Kaiser BL, Dubois JL, Brewer HM, Frank BC, McDermott JE, Metz TO, Peterson SN, Smith RD, Motin VL, Adkins JN. A multi-omic systems approach to elucidating Yersinia virulence mechanisms. Mol Biosyst 2012; 9:44-54. [PMID: 23147219 DOI: 10.1039/c2mb25287b] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 12/14/2022]
Abstract
The underlying mechanisms that lead to dramatic differences between closely related pathogens are not always readily apparent. For example, the genomes of Yersinia pestis (YP), the causative agent of plague with a high mortality rate, and Yersinia pseudotuberculosis (YPT), an enteric pathogen with a modest mortality rate, are highly similar with some species-specific differences; however, the molecular causes of their distinct clinical outcomes remain poorly understood. In this study, a temporal multi-omic analysis of YP and YPT at physiologically relevant temperatures was performed to gain insights into how an acute and highly lethal bacterial pathogen, YP, differs from its less virulent progenitor, YPT. This analysis revealed higher gene and protein expression levels of conserved major virulence factors in YP relative to YPT, including the Yop virulon and the pH6 antigen. This suggests that adaptation in the regulatory architecture, in addition to the presence of unique genetic material, may contribute to the increased pathogenicity of YP relative to YPT. Additionally, global transcriptome and proteome responses of YP and YPT revealed conserved post-transcriptional control of metabolism and the translational machinery, including the modulation of glutamate levels in Yersiniae. Finally, the omics data were coupled with a computational network analysis, allowing an efficient prediction of novel Yersinia virulence factors based on gene and protein expression patterns.
Affiliation(s)
- Charles Ansong
- Biological Sciences Division, Pacific Northwest National Laboratory, P. O. Box 999, Richland, WA 99352, USA
|
28
|
Peterson ES, McCue LA, Schrimpe-Rutledge AC, Jensen JL, Walker H, Kobold MA, Webb SR, Payne SH, Ansong C, Adkins JN, Cannon WR, Webb-Robertson BJM. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics 2012; 13:131. [PMID: 22480257 PMCID: PMC3364912 DOI: 10.1186/1471-2164-13-131] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 12/27/2011] [Accepted: 04/05/2012] [Indexed: 11/10/2022] Open
Abstract
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.
Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone or in combination with transcriptomic data.
Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
Affiliation(s)
- Elena S Peterson
- Scientific Data Management, Pacific Northwest National Laboratory, Richland, WA, USA
|