1. Patole MS, Sharma J, Pawar H. Comparative Proteogenomic Approaches for Mapping the Global Proteome of the Unsequenced Leishmania Vector Phlebotomus papatasi. Methods Mol Biol 2025; 2859:265-277. [PMID: 39436607 DOI: 10.1007/978-1-0716-4152-1_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
The rapid improvements in next-generation sequencing technologies have made it possible to quickly access in-depth genome sequence data. This has resulted in a flurry of genome sequences of various organisms being published and made publicly available in the last two decades. However, not all organisms have genome sequence data available. Various factors play a role, such as the medical or economic importance of the organism and the complexity of its genome. Phlebotomus papatasi is the sandfly vector for the Leishmania parasite, which is the causative agent of leishmaniasis. P. papatasi is a hematophagous vector, and the female flies feed on human blood to complete their reproductive cycle. The P. papatasi genome is currently being sequenced as part of a multicentric consortium, and the genome sequence has not been published to date. Hence, efforts to map the global proteome of P. papatasi are hindered. In such cases, comparative proteogenomic approaches can help map the global proteome of an unsequenced organism using homology-based methods.
Affiliation(s)
- Jyoti Sharma
  - Manipal Academy of Higher Education, Manipal, Karnataka, India
  - Institute of Bioinformatics, Bangalore, India
- Harsh Pawar
  - Biomedical and Life Sciences Division, Lancaster University, Lancaster, UK
2. Munjal NS, Dey G, Parthasarathi KTS, Chauhan K, Pai K, Patole MS, Pawar H, Sharma J. A Proteogenomic Approach for the Identification of Virulence Factors in Leishmania Parasites. Methods Mol Biol 2025; 2859:279-296. [PMID: 39436608 DOI: 10.1007/978-1-0716-4152-1_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Identifying new genes involved in virulence and drug resistance may hold the key to a better understanding of parasitic diseases. The proteogenomic profiling of various Leishmania species, the causative agents of leishmaniasis, has identified several novel genes, N- and C-terminal extensions of proteins, and corrections of existing gene models. Various virulence factors (VFs) responsible for leishmaniasis have been previously annotated through a proteogenomic approach, including the C-terminal extension of heat shock protein 70 (HSP70). Furthermore, the diversity of VFs across Leishmania donovani, L. infantum, L. major, and L. mexicana was determined using phylogenetic analysis. Moreover, protein-protein interaction networks (PPINs) of VFs with HSPs aid in making significant biological interpretations. Overall, an integrated omics approach involving proteogenomics was used to identify and study the relationship among VFs with other interacting proteins, including HSPs. This chapter provides a step-by-step guide to the identification of new genes in Leishmania using a proteogenomic approach and their functional assignment using a bioinformatics-based approach.
Affiliation(s)
- Gourav Dey
  - Institute of Bioinformatics, Bangalore, India
- K T Shreya Parthasarathi
  - Institute of Bioinformatics, Bangalore, India
  - Manipal Academy of Higher Education, Manipal, Karnataka, India
- Kshipra Chauhan
  - School of Applied Sciences and Technology, Gujarat Technological University, Ahmedabad, India
- Kalpana Pai
  - Department of Zoology, Savitribai Phule Pune University, Pune, India
- Harsh Pawar
  - Biomedical and Life Sciences Division, Lancaster University, Lancaster, UK
- Jyoti Sharma
  - Institute of Bioinformatics, Bangalore, India
  - Manipal Academy of Higher Education, Manipal, Karnataka, India
3. Zouré AA, Serteyn L, Somda Z, Badolo A, Francis F. Proteomic Investigation on Anopheles gambiae in Burkina Faso Related to Insecticide Pressures from Different Climatic Regions. Proteomics 2020; 20:e1900400. [PMID: 32108434 DOI: 10.1002/pmic.201900400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2019] [Revised: 02/21/2020] [Indexed: 11/09/2022]
Abstract
In Sub-Saharan Africa, An. gambiae sensu lato (s.l.) Giles 1902 contributes largely to malaria transmission. Therefore, the authors carry out a proteomic analysis to compare its metabolic state under different pesticide pressures by selecting areas with and without cotton crops. The proteome data are available via ProteomeXchange with identifier PXD016300. From a total of 1,182 identified proteins, 648 are retained for further statistical analysis and are attributed to biological functions, the most important of which is energy metabolism (120 proteins), followed by translation-biogenesis (74), cytoskeleton (71), stress response (62), biosynthetic process (60), signalling (44), cellular respiration (38), cell redox homeostasis (25), DNA processing (17), pheromone binding (10), protein folding (9), RNA processing (9), other proteins (26) and unknown functions (83). In the Sudano-Sahelian region, 421 (91.3%) proteins are found in samples from areas both with and without cotton crops. By contrast, in the Sahelian region, only 271 (55.0%) are common to both crop areas, and 233 proteins are up-regulated in samples from the cotton area. The focus is placed on proteins with putative roles in insecticide resistance, according to the literature. This study provides the first whole-body proteomic characterisation of An. gambiae s.l. in Burkina Faso, as a framework to strengthen vector control strategies.
Affiliation(s)
- Abdou Azaque Zouré
  - Institute of Health Sciences Research, (IRSS/CNRST)/Department of Biomedical and Public Health, Ouagadougou, 03 BP 7192, Burkina Faso
  - Functional and Evolutionary Entomology, TERRA, Gembloux Agro-Bio Tech, University of Liège, Passage des Déportés 2, Gembloux, 5030, Belgium
- Laurent Serteyn
  - Functional and Evolutionary Entomology, TERRA, Gembloux Agro-Bio Tech, University of Liège, Passage des Déportés 2, Gembloux, 5030, Belgium
- Zéphirin Somda
  - Laboratoire d'Entomologie Fondamentale et Appliquée, UFR/SVT, Université Joseph Ki-Zerbo, BP 7021, Ouagadougou, 03, Burkina Faso
- Athanase Badolo
  - Laboratoire d'Entomologie Fondamentale et Appliquée, UFR/SVT, Université Joseph Ki-Zerbo, BP 7021, Ouagadougou, 03, Burkina Faso
- Frédéric Francis
  - Functional and Evolutionary Entomology, TERRA, Gembloux Agro-Bio Tech, University of Liège, Passage des Déportés 2, Gembloux, 5030, Belgium
4. Guillot L, Delage L, Viari A, Vandenbrouck Y, Com E, Ritter A, Lavigne R, Marie D, Peterlongo P, Potin P, Pineau C. Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes. BMC Genomics 2019; 20:56. [PMID: 30654742 PMCID: PMC6337836 DOI: 10.1186/s12864-019-5431-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/26/2018] [Accepted: 01/03/2019] [Indexed: 01/02/2023]
Abstract
Background: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes. Results: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files for use in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data. Conclusions: Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics Docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework and for use by non-computer scientists at https://galaxy.protim.eu. Data are available via ProteomeXchange under identifier PXD010618. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5431-9) contains supplementary material, which is available to authorized users.
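The clustering step in this abstract (grouping closely located PST hits into candidate coding regions) can be illustrated with a minimal sketch. This is not Peptimapper's actual code: the function name `cluster_hits`, the `max_gap` and `min_hits` parameters, and their default values are assumptions chosen only for illustration, and hits are assumed to lie on a single strand of a single contig.

```python
def cluster_hits(hits, max_gap=1000, min_hits=2):
    """Group (start, end) hit coordinates into clusters.

    Hypothetical helper: consecutive hits whose start lies within
    max_gap bases of the running cluster end are merged; clusters
    with fewer than min_hits hits are discarded.
    Returns a list of (cluster_start, cluster_end, n_hits).
    """
    clusters = []
    cur_start = cur_end = None
    count = 0
    for start, end in sorted(hits):
        if count and start - cur_end <= max_gap:
            # Hit is close enough: extend the current cluster.
            cur_end = max(cur_end, end)
            count += 1
        else:
            # Gap too large: close the current cluster, start a new one.
            if count >= min_hits:
                clusters.append((cur_start, cur_end, count))
            cur_start, cur_end, count = start, end, 1
    if count >= min_hits:
        clusters.append((cur_start, cur_end, count))
    return clusters


example_hits = [(100, 130), (400, 430), (5000, 5030),
                (9000, 9030), (9500, 9530), (9600, 9650)]
print(cluster_hits(example_hits))
```

In practice the gap and minimum-hit thresholds would be tuned per organism, as the abstract notes; the isolated hit at position 5000 is dropped here because it does not reach `min_hits`.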
Affiliation(s)
- Laetitia Guillot
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Ludovic Delage
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Alain Viari
  - INRIA Grenoble-Rhône-Alpes, F-38330, Montbonnot-Saint-Martin, France
- Yves Vandenbrouck
  - University Grenoble Alpes, CEA, Inserm, BIG-BGE, 38000, Grenoble, France
- Emmanuelle Com
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Andrés Ritter
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
  - Present address: Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Régis Lavigne
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Dominique Marie
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Philippe Potin
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Charles Pineau
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
5. Hugo RLE, Birrell GW. Proteomics of Anopheles Vectors of Malaria. Trends Parasitol 2018; 34:961-981. [DOI: 10.1016/j.pt.2018.08.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/15/2018] [Revised: 08/08/2018] [Accepted: 08/10/2018] [Indexed: 12/12/2022]
6. Wu XJ, Dinguirard N, Sabat G, Lui HD, Gonzalez L, Gehring M, Bickham-Wright U, Yoshino TP. Proteomic analysis of Biomphalaria glabrata plasma proteins with binding affinity to those expressed by early developing larval Schistosoma mansoni. PLoS Pathog 2017; 13:e1006081. [PMID: 28520808 PMCID: PMC5433772 DOI: 10.1371/journal.ppat.1006081] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 04/04/2016] [Accepted: 11/20/2016] [Indexed: 11/19/2022]
Abstract
Interactions between early developing Schistosoma mansoni larval stages and the hemolymph of its snail intermediate host represent the first molecular encounter with the snail’s immune system. To gain a more comprehensive understanding of this early parasite-host interaction, biotinylated sporocyst tegumental membrane (Mem) proteins and larval transformation proteins (LTP) were affixed to streptavidin-agarose beads and used as affinity matrices to enrich for larval-reactive plasma proteins from susceptible (NMRI) and resistant (BS-90) strains of the snail Biomphalaria glabrata. Nano-LC/MS-MS proteomic analyses of isolated plasma proteins revealed a diverse array of 94 immune- and nonimmune-related plasma proteins. Included among the immune-related subset were pattern recognition receptors (lectins, LPS-binding protein, thioester-containing proteins-TEPs), stress proteins (HSP60 and 70), adhesion proteins (dermatopontins), metalloproteases (A Disintegrin And Metalloproteinase (ADAM), ADAM-related Zn proteinases), cytotoxins (biomphalysin) and a Ca2+-binding protein (neo-calmodulin). Variable immunoglobulin and lectin domain (VIgL) gene family members, including fibrinogen-related proteins (FREPs), galectin-related proteins (GREPs) and C-type lectin-related proteins (CREPs), were the most prevalent of larval-reactive immune lectins present in plasma. FREPs were highly represented, although only a subset of FREP subfamilies (FREP 2, 3 and 12) were identified, suggesting potential selectivity in the repertoire of plasma lectins recognizing larval glycoconjugates. Other larval-binding FREP-like and CREP-like proteins possessing a C-terminal fibrinogen-related domain (FReD) or C-type lectin binding domain, respectively, and an Ig-fold domain also were identified as predicted proteins from the B. glabrata genome, although incomplete sequence data precluded their placement into specific FREP/CREP subfamilies. Similarly, a group of FReD-containing proteins (angiopoietin-4, ficolin-2) that lacked N-terminal Ig-fold(s) were identified as a distinct group of FREP-like proteins, separate from the VIgL lectin family. Finally, differential appearance of GREPs in BS-90 plasma eluates, and other proteins exclusively found in eluates of the NMRI strain, suggested snail strain differences in the expression of select larval-reactive immune proteins. This hypothesis was supported by the finding that differential gene expression of the GREP in BS-90 and ADAM in NMRI snail strains generally correlated with their patterns of protein expression. In summary, this study is the first to provide a global comparative proteomic analysis of constitutively expressed plasma proteins from susceptible and resistant B. glabrata strains capable of binding early-expressed larval S. mansoni proteins. Identified proteins, especially those exhibiting differential expression, may play a role in determining immune compatibility in this snail host-parasite system. A complete listing of raw peptide data is available via ProteomeXchange using identifier PXD004942.

Transmission of the human blood fluke Schistosoma mansoni critically depends on the successful establishment of infections within species of its snail intermediate host, Biomphalaria. One of the most important barriers to infection is the host’s innate immune system, comprised of plasma proteins and immunocytes (hemocytes) circulating in the hemolymph. Although expression of plasma lectin genes appears to be associated with larval resistance in B. glabrata, few studies have attempted an in-depth analysis of gene-encoded lectins, and other immune proteins, that are capable of directly binding schistosome larvae. Using affinity matrices linked to schistosome proteins expressed during early larval development, we identified and compared the parasite-reactive plasma proteins from the susceptible NMRI and resistant BS-90 strains of B. glabrata. Proteomic analyses of isolated plasma proteins revealed a diversity of immune-related proteins including lectins, pathogen recognition receptors, cytotoxins, adhesion proteins, metalloproteinases, and Ca2+-binding proteins. Of the lectins, the variable immunoglobulin and lectin domain (VIgL) gene family of proteins, comprised of fibrinogen-related proteins (FREPs), galectin-related proteins (GREPs) and C-type lectin-related proteins (CREPs), were highly represented, consistent with their role in host immunity. Two proteins (GREP and a Zn-metalloproteinase) exhibited snail strain-associated protein and gene expression patterns, suggesting their involvement in innate immune responses to larval infection. This comparative proteomic analysis of larval S. mansoni-reactive plasma proteins from susceptible and resistant B. glabrata strains represents the first of its kind and provides valuable insights into possible pathogen recognition receptors and other immune factors regulating parasite-host compatibility in this model system.
Affiliation(s)
- Xiao-Jun Wu
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
- Nathalie Dinguirard
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
- Grzegorz Sabat
  - Biotechnology Center, Mass Spectrometry/Proteomics Facility, University of Wisconsin, Madison, WI, United States of America
- Hong-di Lui
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
- Laura Gonzalez
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
- Michael Gehring
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
- Utibe Bickham-Wright
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
- Timothy P. Yoshino
  - Department of Pathobiological Sciences, University of Wisconsin, Madison, WI, United States of America
7. Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 04/13/2017] [Indexed: 12/20/2022]
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e., the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Affiliation(s)
- Kelly V Ruggles
  - Department of Medicine, New York University School of Medicine, New York, New York 10016
- Karsten Krug
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Xiaojing Wang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Karl R Clauser
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Jing Wang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Samuel H Payne
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
- David Fenyö
  - Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016
  - Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- D R Mani
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
8. Jakharia A, Borkakoty B, Singh S. Expression of SPARC like protein 1 (SPARCL1), extracellular matrix-associated protein is down regulated in gastric adenocarcinoma. J Gastrointest Oncol 2016; 7:278-83. [PMID: 27034797 DOI: 10.3978/j.issn.2078-6891.2015.064] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
BACKGROUND: SPARC-like protein 1 (SPARCL1/Hevin), a member of the SPARC family, is defined by the presence of a highly acidic domain-I, a follistatin-like domain, and an extracellular calcium (EC) binding domain. SPARCL1 has been shown to be down-regulated in many types of cancer and may serve as a negative regulator of cell growth and proliferation. METHODS: Both tumor and adjacent normal tissue were collected from patients with gastric adenocarcinoma. A monoclonal antibody developed against recombinant SPARCL1 was used to analyze the expression of SPARCL1 by immunohistochemical and western blotting (WB) analysis. RESULTS: The expression of SPARCL1 was found to be significantly lower or negligible in gastric adenocarcinoma tissues in nearly all of the cases in comparison with adjacent normal tissue. This difference was found to be independent of the patient's age, sex, and stage of cancer. CONCLUSIONS: We postulate that down-regulation of SPARCL1 may be related to inactivation of its tumor suppressor functions and might play an important role in the development of gastric adenocarcinoma.
Affiliation(s)
- Aniruddha Jakharia
  - Imgenex India Pvt. Ltd., Bhubaneswar, India
  - Regional Medical Research Centre for NE Region (Indian Council of Medical Research), Assam, India
- Biswajyoti Borkakoty
  - Imgenex India Pvt. Ltd., Bhubaneswar, India
  - Regional Medical Research Centre for NE Region (Indian Council of Medical Research), Assam, India
- Sujay Singh
  - Imgenex India Pvt. Ltd., Bhubaneswar, India
  - Regional Medical Research Centre for NE Region (Indian Council of Medical Research), Assam, India
9. Next Generation Sequencing Data and Proteogenomics. Advances in Experimental Medicine and Biology 2016; 926:11-19. [DOI: 10.1007/978-3-319-42316-6_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]
10. Pawar H, Chavan S, Mahale K, Khobragade S, Kulkarni A, Patil A, Chaphekar D, Varriar P, Sudeep A, Pai K, Prasad T, Gowda H, Patole MS. A proteomic map of the unsequenced kala-azar vector Phlebotomus papatasi using cell line. Acta Trop 2015; 152:80-89. [PMID: 26307495 DOI: 10.1016/j.actatropica.2015.08.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/09/2015] [Revised: 07/16/2015] [Accepted: 08/18/2015] [Indexed: 11/25/2022]
Abstract
The debilitating disease kala-azar or visceral leishmaniasis is caused by the kinetoplastid protozoan parasite Leishmania donovani. The parasite is transmitted by hematophagous sand fly vectors of the genus Phlebotomus in the Old World and Lutzomyia in the New World. The predominant phlebotomine species associated with the transmission of kala-azar are Phlebotomus papatasi and Phlebotomus argentipes. Understanding the molecular interaction of the sand fly and Leishmania during the development of the parasite within the sand fly gut is crucial to the understanding of the parasite life cycle. The complete genome sequences of sand flies (Phlebotomus and Lutzomyia) are currently not available, and this hinders identification of proteins in the sand fly vector. In the absence of genomic sequences, the current study uses three-frame translated transcriptomic data of P. papatasi to analyze the mass spectrometry data of a P. papatasi cell line using a proteogenomic approach. Additionally, we have carried out the proteogenomic analysis of P. papatasi by comparative homology-based searches using protein data from related sequenced dipterans. This study resulted in the identification of 1313 proteins from P. papatasi based on homology. Our study demonstrates the power of proteogenomic approaches in mapping the proteomes of unsequenced organisms.
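To make the idea of a three-frame translated transcript database concrete, here is a minimal sketch of translating a nucleotide sequence in its three forward reading frames. It is not the pipeline used in the study: the function names `translate` and `three_frame_translation` are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are written as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered
# with bases T, C, A, G at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide string from its first base.

    Incomplete trailing codons are ignored; unknown codons become 'X'.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(seq):
    """Return translations of the three forward reading frames."""
    return [translate(seq[frame:]) for frame in range(3)]


print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

Each translated frame would then typically be split at stop codons and written to a FASTA file for the search engine. A six-frame genome search applies the same translation to the reverse complement as well.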
11. Li HD, Menon R, Omenn GS, Guan Y. Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence. Proteomics 2014; 14:2709-18. [PMID: 25265570 DOI: 10.1002/pmic.201400170] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 04/28/2014] [Revised: 08/11/2014] [Accepted: 09/23/2014] [Indexed: 01/08/2023]
Abstract
Canonical isoforms in different databases have been defined as the most prevalent, most conserved, most expressed, longest, or the one with the clearest description of domains or posttranslational modifications. In this article, we revisit these definitions of canonical isoforms based on functional genomics and proteomics evidence, focusing on mouse data. We report a novel functional relationship network-based approach for identifying the highest connected isoforms (HCIs). We show that 46% of these HCIs are not the longest transcripts. In addition, this approach revealed many genes that have more than one highly connected isoform. Averaged across 175 RNA-seq datasets covering diverse tissues and conditions, 65% of the HCIs show higher expression levels than nonhighest connected isoforms at the transcript level. At the protein level, these HCIs highly overlap with the expressed splice variants, based on proteomic data from eight different normal tissues. These results suggest that a more confident definition of canonical isoforms can be made through integration of multiple lines of evidence, including HCIs defined by biological processes and pathways, expression prevalence at the transcript level, and relative or absolute abundance at the protein level. This integrative proteogenomics approach can successfully identify principal isoforms that are responsible for the canonical functions of genes.
Affiliation(s)
- Hong-Dong Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
12. Pawar H, Renuse S, Khobragade SN, Chavan S, Sathe G, Kumar P, Mahale KN, Gore K, Kulkarni A, Dixit T, Raju R, Prasad TSK, Harsha HC, Patole MS, Pandey A. Neglected Tropical Diseases and Omics Science: Proteogenomics Analysis of the Promastigote Stage of Leishmania major Parasite. OMICS: A Journal of Integrative Biology 2014; 18:499-512. [DOI: 10.1089/omi.2013.0159] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Affiliation(s)
- Harsh Pawar
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
  - Rajiv Gandhi University of Health Sciences, Bangalore, India
- Santosh Renuse
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
  - Department of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, India
- Sandip Chavan
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
  - Manipal University, Madhav Nagar, Manipal, India
- Gajanan Sathe
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
  - Manipal University, Madhav Nagar, Manipal, India
- Praveen Kumar
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
- Tanwi Dixit
  - National Centre for Cell Sciences, Pune, India
- Rajesh Raju
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
- H. C. Harsha
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
- Akhilesh Pandey
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
13. Dwivedi SB, Muthusamy B, Kumar P, Kim MS, Nirujogi RS, Getnet D, Ahiakonu P, De G, Nair B, Gowda H, Prasad TSK, Kumar N, Pandey A, Okulate M. Brain proteomics of Anopheles gambiae. OMICS: A Journal of Integrative Biology 2014; 18:421-37. [PMID: 24937107 DOI: 10.1089/omi.2014.0007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Anopheles gambiae has a well-adapted system for host localization, feeding, and mating behavior, which are all governed by neuronal processes in the brain. However, there are no published reports characterizing the brain proteome to elucidate neuronal signaling mechanisms in the vector. To this end, a large-scale mapping of the brain proteome of An. gambiae was carried out using high resolution tandem mass spectrometry, revealing a repertoire of >1800 proteins, of which 15% could not be assigned any function. A large proportion of the identified proteins were predicted to be involved in diverse biological processes including metabolism, transport, protein synthesis, and olfaction. This study also led to the identification of 10 GPCR classes of proteins, which could govern sensory pathways in mosquitoes. Proteins involved in metabolic and neural processes, chromatin modeling, and synaptic vesicle transport associated with neuronal transmission were predominantly expressed in the brain. Proteogenomic analysis expanded our findings with the identification of 15 novel genes and 71 cases of gene refinements, a subset of which were validated by RT-PCR and sequencing. Overall, our study offers valuable insights into the brain physiology of the vector that could possibly open avenues for intervention strategies for malaria in the future.
Affiliation(s)
- Sutopa B Dwivedi
- 1 Institute of Bioinformatics , International Technology Park, Bangalore, Karnataka, India
|
14
|
Nirujogi RS, Pawar H, Renuse S, Kumar P, Chavan S, Sathe G, Sharma J, Khobragade S, Pande J, Modak B, Prasad TSK, Harsha HC, Patole MS, Pandey A. Moving from unsequenced to sequenced genome: reanalysis of the proteome of Leishmania donovani. J Proteomics 2014; 97:48-61. [PMID: 23665000 PMCID: PMC4710096 DOI: 10.1016/j.jprot.2013.04.021] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/29/2012] [Revised: 04/02/2013] [Accepted: 04/11/2013] [Indexed: 10/26/2022]
Abstract
The kinetoplastid protozoan parasite, Leishmania donovani, is the causative agent of kala azar or visceral leishmaniasis. Kala azar is a severe form of leishmaniasis that is fatal in the majority of untreated cases. Studies on proteomic analysis of L. donovani thus far have been carried out using homology-based identification based on related Leishmania species (L. infantum, L. major and L. braziliensis) whose genomes have been sequenced. Recently, the genome of L. donovani was fully sequenced and the data became publicly available. We took advantage of the availability of its genomic sequence to carry out a more accurate proteogenomic analysis of the L. donovani proteome using our previously generated dataset. This resulted in identification of 17,504 unique peptides upon database-dependent search against the annotated proteins in L. donovani. These peptides were assigned to 3999 unique proteins in L. donovani. 2296 proteins were identified in both the life stages of L. donovani, while 613 and 1090 proteins were identified only from amastigote and promastigote stages, respectively. The proteomic data were also searched against the six-frame translated L. donovani genome, which led to 255 genome search-specific peptides (GSSPs) resulting in identification of 20 novel genes and correction of 40 existing gene models in L. donovani. BIOLOGICAL SIGNIFICANCE Leishmania donovani genome sequencing was recently completed, which permitted us to use a proteogenomic approach to map its proteome and to carry out annotation of its genome. This resulted in mapping of 50% (3999 proteins) of the L. donovani proteome. Our study identified 20 novel genes previously not predicted from the L. donovani genome in addition to correcting annotations of 40 existing gene models. The identified proteins may help in better understanding of stage-specific protein expression profiles in L. donovani and to identify novel stage-specific drug targets in L. donovani which could be used in the treatment of leishmaniasis. This article is part of a Special Issue entitled: Trends in Microbial Proteomics.
Affiliation(s)
- Raja Sekhar Nirujogi
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605014, India
- Harsh Pawar
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Rajiv Gandhi University of Health Sciences, Bangalore 560041, India
- Santosh Renuse
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Department of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam 690525, India
- Praveen Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
- Sandip Chavan
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Manipal University, Madhav Nagar, Manipal 576104, India
- Gajanan Sathe
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Manipal University, Madhav Nagar, Manipal 576104, India
- Jyoti Sharma
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Manipal University, Madhav Nagar, Manipal 576104, India
- Bhakti Modak
- National Centre for Cell Sciences, Pune 411007, India
- T S Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India; Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605014, India; Manipal University, Madhav Nagar, Manipal 576104, India
- H C Harsha
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore 21205, MD, USA; Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore 21205, MD, USA; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore 21205, MD, USA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore 21205, MD, USA.
|
15
|
Wu L, Han DK. Overcoming the dynamic range problem in mass spectrometry-based shotgun proteomics. Expert Rev Proteomics 2006; 3:611-9. [PMID: 17181475 DOI: 10.1586/14789450.3.6.611] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 02/05/2023]
Abstract
Protein profiling using mass spectrometry technology has emerged as a powerful method for analyzing large-scale protein-expression patterns in cells and tissues. However, a number of challenges are present in proteomics research, one of the greatest being the high degree of protein complexity and huge dynamic range of proteins expressed in the complex biological mixtures, which exceeds six orders of magnitude in cells and ten orders of magnitude in body fluids. Since many important signaling proteins have low expression levels, methods to detect the low-abundance proteins in a complex sample are required. This review will focus on the fundamental fractionation and mass spectrometry techniques currently used for large-scale shotgun proteomics research.
Affiliation(s)
- Linfeng Wu
- University of Connecticut, School of Medicine, Department of Cell Biology, Farmington, CT 06030, USA.
|
16
|
Krug K, Carpy A, Behrends G, Matic K, Soares NC, Macek B. Deep coverage of the Escherichia coli proteome enables the assessment of false discovery rates in simple proteogenomic experiments. Mol Cell Proteomics 2013; 12:3420-30. [PMID: 23908556 DOI: 10.1074/mcp.m113.029165] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 11/06/2022] Open
Abstract
Recent advances in mass spectrometry (MS) have led to increased applications of shotgun proteomics to the refinement of genome annotation. The typical "proteo-genomic" workflows rely on the mapping of peptide MS/MS spectra onto databases derived via six-frame translation of the genome sequence. These databases contain a large proportion of spurious protein sequences which make the statistical confidence of the resulting peptide spectrum matches difficult to assess. Here we performed a comprehensive analysis of the Escherichia coli proteome using LTQ-Orbitrap MS and mapped the corresponding MS/MS spectra onto a six-frame translation of the E. coli genome. We hypothesized that the protein-coding part of the E. coli genome approaches complete annotation and that the majority of six frame-specific (novel) peptide spectrum matches can be considered as false positive identifications. We confirm our hypothesis by showing that the posterior error probability distribution of novel hits is almost identical to that of reversed (decoy) hits; this enables us to estimate the sensitivity, specificity, accuracy, and false discovery rate in a typical bacterial proteo-genomic dataset. We use two complementary computational frameworks for processing and statistical assessment of MS/MS data: MaxQuant and Trans-Proteomic Pipeline. We show that MaxQuant achieves a more sensitive six-frame database search with an acceptable false discovery rate and is therefore well suited for global genome reannotation applications, whereas the Trans-Proteomic Pipeline achieves higher specificity and is well suited for high-confidence validation. The use of a small and well-annotated bacterial genome enables us to address genome coverage achieved in state-of-the-art bacterial proteomics: identified peptide sequences mapped to all expressed E. coli proteins but covered 31.7% of the protein-coding genome sequence. 
Our results show that false discovery rates can be substantially underestimated even in "simple" proteo-genomic experiments obtained by means of high-accuracy MS and point to the necessity of further improvements concerning the coverage of peptide sequences by MS-based methods.
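The six-frame translation that this abstract refers to can be illustrated with a minimal sketch. It assumes the standard genetic code and an unambiguous A/C/G/T input; the function names are hypothetical, and this is not the database-construction code used by MaxQuant, the Trans-Proteomic Pipeline, or the authors.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to the translated sequence of that frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

Only one of the six frames at a given locus can correspond to a real coding sequence on a given strand, which is why such databases contain the large proportion of spurious sequences the abstract describes.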
Affiliation(s)
- Karsten Krug
- Proteome Center Tuebingen, University of Tuebingen, 72076 Tuebingen, Germany
|
17
|
Costa EP, Menschaert G, Luyten W, De Grave K, Ramon J. PIUS: peptide identification by unbiased search. Bioinformatics 2013; 29:1913-4. [DOI: 10.1093/bioinformatics/btt298] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
|
18
|
Kuhring M, Renard BY. iPiG: integrating peptide spectrum matches into genome browser visualizations. PLoS One 2012; 7:e50246. [PMID: 23226516 PMCID: PMC3514238 DOI: 10.1371/journal.pone.0050246] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 07/11/2012] [Accepted: 10/22/2012] [Indexed: 11/18/2022] Open
Abstract
Proteogenomic approaches have gained increasing popularity; however, it is still difficult to integrate mass spectrometry identifications with genomic data due to differing data formats. To address this difficulty, we introduce iPiG as a tool for the integration of peptide identifications from mass spectrometry experiments into existing genome browser visualizations. Thereby, the concurrent analysis of proteomic and genomic data is simplified and proteomic results can directly be compared to genomic data. iPiG is freely available from https://sourceforge.net/projects/ipig/. It is implemented in Java and can be run as a stand-alone tool with a graphical user-interface or integrated into existing workflows. Supplementary data are available at PLOS ONE online.
Affiliation(s)
- Mathias Kuhring
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Berlin, Germany
- Bernhard Y. Renard
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Berlin, Germany
|
19
|
Blakeley P, Overton IM, Hubbard SJ. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J Proteome Res 2012; 11:5221-34. [PMID: 23025403 PMCID: PMC3703792 DOI: 10.1021/pr300411q] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 11/29/2022]
Abstract
Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exists five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. 
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
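The target:decoy estimate discussed in this abstract can be sketched as follows. This is a simplified illustration assuming each PSM is a hypothetical (score, is_decoy) pair and that FDR is approximated as decoy hits divided by target hits above a score threshold; it is not the statistical model used by the authors or by any particular search engine.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoy PSMs) / (#target PSMs) with score >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def accepted_targets(psms, max_fdr=0.01):
    """Return target scores kept at the lowest threshold whose estimated FDR <= max_fdr."""
    best_threshold = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best_threshold = threshold  # keep lowering while the estimate allows it
    if best_threshold is None:
        return []
    return sorted(
        (s for s, is_decoy in psms if s >= best_threshold and not is_decoy),
        reverse=True,
    )


# Toy PSM list (illustrative values only): (score, is_decoy)
psms = [(9.0, False), (8.0, False), (7.0, False), (6.0, True), (5.0, False), (4.0, True)]
strict = accepted_targets(psms, max_fdr=0.01)
relaxed = accepted_targets(psms, max_fdr=0.25)
```

In this toy list the strict setting keeps three target PSMs and the relaxed setting keeps four, which mirrors the abstract's point that the number of accepted PSMs at a fixed threshold depends on how many decoy-like sequences the database contributes.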
Affiliation(s)
- Paul Blakeley
- Faculty of Life Sciences, The University of Manchester, Manchester M13 9PT, UK
|
20
|
Bocchinfuso DG, Taylor P, Ross E, Ignatchenko A, Ignatchenko V, Kislinger T, Pearson BJ, Moran MF. Proteomic profiling of the planarian Schmidtea mediterranea and its mucous reveals similarities with human secretions and those predicted for parasitic flatworms. Mol Cell Proteomics 2012; 11:681-91. [PMID: 22653920 PMCID: PMC3434776 DOI: 10.1074/mcp.m112.019026] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 03/09/2012] [Revised: 05/17/2012] [Indexed: 11/06/2022] Open
Abstract
The freshwater planarian Schmidtea mediterranea has been used in research for over 100 years, and is an emerging stem cell model because of its capability of regenerating large portions of missing body parts. Exteriorly, planarians are covered in mucous secretions of unknown composition, implicated in locomotion, predation, innate immunity, and substrate adhesion. Although the planarian genome has been sequenced, it remains mostly unannotated, challenging both genomic and proteomic analyses. The goal of the current study was to annotate the proteome of the whole planarian and its mucous fraction. The S. mediterranea proteome was analyzed via mass spectrometry by using multidimensional protein identification technology with whole-worm tryptic digests. By using a proteogenomics approach, MS data were searched against an in silico translated planarian transcript database, and by using the Swiss-Prot BLAST algorithm to identify proteins similar to planarian queries. A total of 1604 proteins were identified. The mucous subproteome was defined through analysis of a mucous trail fraction and an extract obtained by treating whole worms with the mucolytic agent N-acetylcysteine. Gene Ontology analysis confirmed that the mucous fractions were enriched with secreted proteins. The S. mediterranea proteome is highly similar to that predicted for the trematode Schistosoma mansoni associated with intestinal schistosomiasis, with the mucous subproteome particularly highly conserved. Remarkably, orthologs of 119 planarian mucous proteins are present in human mucosal secretions and tear fluid. We suggest planarians have potential to be a model system for the characterization of mucous protein function and relevant to parasitic flatworm infections and diseases underlined by mucous aberrancies, such as cystic fibrosis, asthma, and other lung diseases.
Affiliation(s)
- Donald G. Bocchinfuso
- Molecular Structure and Function Program, The Hospital for Sick Children, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Paul Taylor
- Molecular Structure and Function Program, The Hospital for Sick Children, Toronto, Canada
- Eric Ross
- Stowers Institute for Medical Research, Kansas City, Missouri
- Thomas Kislinger
- Molecular Structure and Function Program, The Hospital for Sick Children, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Bret J. Pearson
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children
- Michael F. Moran
- Molecular Structure and Function Program, The Hospital for Sick Children, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Ontario Cancer Institute, University Health Network
- Banting and Best Department of Medical Research, University of Toronto, MaRS Centre, 101 College Street, Toronto, ON, M5G 1L7, Canada
|
21
|
Pawar H, Sahasrabuddhe NA, Renuse S, Keerthikumar S, Sharma J, Kumar GSS, Venugopal A, Sekhar NR, Kelkar DS, Nemade H, Khobragade SN, Muthusamy B, Kandasamy K, Harsha HC, Chaerkady R, Patole MS, Pandey A. A proteogenomic approach to map the proteome of an unsequenced pathogen - Leishmania donovani. Proteomics 2012; 12:832-44. [DOI: 10.1002/pmic.201100505] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022]
Affiliation(s)
- Harsh Pawar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Rajiv Gandhi University of Health Sciences, Bangalore, Karnataka, India
- Nandini A. Sahasrabuddhe
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Manipal University, Madhav Nagar, Manipal, Karnataka, India
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Santosh Renuse
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, Kerala, India
- Jyoti Sharma
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Manipal University, Madhav Nagar, Manipal, Karnataka, India
- Ghantasala S. Sameer Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Department of Biotechnology, Kuvempu University, Shimoga, Karnataka, India
- Abhilash Venugopal
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Department of Biotechnology, Kuvempu University, Shimoga, Karnataka, India
- Nirujogi Raja Sekhar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry, India
- Dhanashree S. Kelkar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Department of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, Kerala, India
- Harshal Nemade
- National Centre for Cell Sciences, Pune, Maharashtra, India
- Babylakshmi Muthusamy
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry, India
- Kumaran Kandasamy
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- H. C. Harsha
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- Raghothama Chaerkady
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
|
22
|
Prasad TSK, Harsha HC, Keerthikumar S, Sekhar NR, Selvan LDN, Kumar P, Pinto SM, Muthusamy B, Subbannayya Y, Renuse S, Chaerkady R, Mathur PP, Ravikumar R, Pandey A. Proteogenomic Analysis of Candida glabrata using High Resolution Mass Spectrometry. J Proteome Res 2011; 11:247-60. [DOI: 10.1021/pr200827k] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 01/23/2023]
Affiliation(s)
- T. S. Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- H. C. Harsha
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Nirujogi Raja Sekhar
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Lakshmi Dhevi N. Selvan
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- Praveen Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- Sneha M. Pinto
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Babylakshmi Muthusamy
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Yashwanth Subbannayya
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Rajiv Gandhi University of Health Sciences, Jayanagar, Bangalore 560 041, India
- Santosh Renuse
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- Raghothama Chaerkady
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Premendu P. Mathur
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Raju Ravikumar
- Department of Neuromicrobiology, National Institute of Mental Health and Neuro Sciences, Bangalore 560029, India
|
23
|
Zhao L, Liu L, Leng W, Wei C, Jin Q. A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF. BMC Genomics 2011; 12:528. [PMID: 22032405 PMCID: PMC3219829 DOI: 10.1186/1471-2164-12-528] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/06/2011] [Accepted: 10/28/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition. RESULTS We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered, which were not predicted in S. flexneri 2a str.301 by any other annotation approach. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes. CONCLUSIONS Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.
Affiliation(s)
- Lina Zhao
- State Key Laboratory for Molecular Virology and Genetic Engineering, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China
|
24
|
Abstract
The whole-genome sequencing of mosquitoes has facilitated our understanding of fundamental biological processes at their basic molecular levels and holds potential for application to mosquito control and prevention of mosquito-borne disease transmission. Draft genome sequences are available for Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus. Collectively, these represent the major vectors of African malaria, dengue fever and yellow fever viruses, and lymphatic filariasis, respectively. Rapid advances in genome technologies have revealed detailed information on genome architecture as well as phenotype-specific transcriptomics and proteomics. These resources allow for detailed comparative analyses within and across populations as well as species. Next-generation sequencing technologies will likely promote a proliferation of genome sequences for additional mosquito species as well as for individual insects. Here we review the current status of genome research in mosquitoes and identify potential areas for further investigations.
Affiliation(s)
- David W Severson
- Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana 46556, USA.
|
25
|
Chaerkady R, Kelkar DS, Muthusamy B, Kandasamy K, Dwivedi SB, Sahasrabuddhe NA, Kim MS, Renuse S, Pinto SM, Sharma R, Pawar H, Sekhar NR, Mohanty AK, Getnet D, Yang Y, Zhong J, Dash AP, MacCallum RM, Delanghe B, Mlambo G, Kumar A, Keshava Prasad TS, Okulate M, Kumar N, Pandey A. A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry. Genome Res 2011; 21:1872-81. [PMID: 21795387 DOI: 10.1101/gr.127951.111] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 01/10/2023]
Abstract
Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
Affiliation(s)
- Raghothama Chaerkady
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry, Johns Hopkins University, Baltimore, Maryland 21205, USA
|
26
|
Renuse S, Chaerkady R, Pandey A. Proteogenomics. Proteomics 2011; 11:620-30. [DOI: 10.1002/pmic.201000615] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Received: 09/29/2010] [Revised: 11/14/2010] [Accepted: 11/16/2010] [Indexed: 12/13/2022]
|
27
|
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 370] [Impact Index Per Article: 26.4] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
|
28
|
Castellana N, Bafna V. Proteogenomics to discover the full coding content of genomes: a computational perspective. J Proteomics 2010; 73:2124-35. [PMID: 20620248 DOI: 10.1016/j.jprot.2010.06.007] [Citation(s) in RCA: 134] [Impact Index Per Article: 9.6] [Received: 03/24/2010] [Revised: 06/04/2010] [Accepted: 06/21/2010] [Indexed: 11/16/2022]
Abstract
Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow the search of tandem mass spectra against genomic databases to identify and characterize protein-coding genes. Proteogenomic peptides provide invaluable information for gene annotation, which is difficult or impossible to ascertain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice-forms including alternative splicing, and also, prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including speed and accuracy of peptide identification, construction and search of specialized databases, correction of sampling bias, and others. This article reviews the state of the art of the field, focusing on the current successes, and the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts in tackling the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction.
Affiliation(s)
- Natalie Castellana
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA
|
29
|
Proteomic analysis of two Trypanosoma cruzi zymodeme 3 strains. Exp Parasitol 2010; 126:540-51. [PMID: 20566365 DOI: 10.1016/j.exppara.2010.06.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 09/30/2009] [Revised: 05/27/2010] [Accepted: 06/01/2010] [Indexed: 02/02/2023]
Abstract
Two Trypanosoma cruzi Z3 strains, designated as 3663 and 4167, were previously isolated from insect vectors captured in the Brazilian Amazon region. These strains exhibited different infection patterns in Vero, C6/36, RAW 264.7 and HEp-2 cell lineages, in which the 3663 trypomastigote form was much less infective than that of 4167. A proteomic approach was applied to investigate the differences in the global patterns of protein expression in these two Z3 strains. Two-dimensional (2D) protein maps were generated and certain spots were identified by mass spectrometry (MS). Our analyses revealed a significant difference in the expression profile of different proteins between strains 3663 and 4167, among them cruzipain, an important regulator of infectivity. These data were corroborated by flow cytometry analysis using an anti-cruzipain antibody. This difference could contribute to the infectivity profiles observed for each strain by in vitro assay using different cell lines.
|
30
|
Li J, Hosseini Moghaddam SH, Chen X, Chen M, Zhong B. Shotgun strategy-based proteome profiling analysis on the head of silkworm Bombyx mori. Amino Acids 2010; 39:751-61. [PMID: 20198493 DOI: 10.1007/s00726-010-0517-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 08/13/2009] [Accepted: 02/05/2010] [Indexed: 01/09/2023]
Abstract
The insect head comprises important sensory systems to communicate with the internal and external environment, and endocrine organs such as the brain and corpus allatum to regulate insect growth and development. To comprehensively understand how all these components act and interact within the head, it is necessary to investigate their molecular basis at the protein level. Here, the spectra of peptides digested from silkworm larval heads were obtained from liquid chromatography tandem mass spectrometry (LC-MS/MS) and were analyzed by bioinformatics methods. In total, 539 proteins with a low false discovery rate (FDR) were identified by searching against an in-house database with SEQUEST and X!Tandem algorithms followed by trans-proteomic pipeline (TPP) validation. Forty-three proteins had a theoretical isoelectric point (pI) greater than 10, which makes them too difficult to separate by two-dimensional gel electrophoresis (2-DE). Four chemosensory proteins, one odorant-binding protein, two diapause-related proteins, and many cuticle proteins, interestingly including pupal cuticle proteins, were identified. The proteins involved in nervous system development, stress response, apoptosis and so forth were related to the physiological status of the head. Pathway analysis revealed that many proteins were highly homologous with human proteins involved in human neurodegenerative disease pathways, probably implying a symptom of the forthcoming metamorphosis of the silkworm. These data and the analysis methods are expected to be of benefit to the proteomics research of silkworm and other insects.
Affiliation(s)
- Jianying Li
- College of Animal Sciences, Zhejiang University, Hangzhou, 310029, People's Republic of China
|
31
|
Armengaud J. Proteogenomics and systems biology: quest for the ultimate missing parts. Expert Rev Proteomics 2010; 7:65-77. [DOI: 10.1586/epr.09.104] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 08/30/2023]
|
32
|
Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum. BMC Bioinformatics 2009; 10:301. [PMID: 19772613 PMCID: PMC2753851 DOI: 10.1186/1471-2105-10-301] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 07/03/2009] [Accepted: 09/22/2009] [Indexed: 12/23/2022] Open
Abstract
Background Stagonospora nodorum, a fungal ascomycete in the class dothideomycetes, is a damaging pathogen of wheat. It is a model for necrotrophic fungi that cause necrotic symptoms via the interaction of multiple effector proteins with cultivar-specific receptors. A draft genome sequence and annotation was published in 2007. A second-pass gene prediction using a training set of 795 fully EST-supported genes predicted a total of 10762 version 2 nuclear-encoded genes, with an additional 5354 less reliable version 1 genes also retained. Results In this study, we subjected soluble mycelial proteins to proteolysis followed by 2D LC MALDI-MS/MS. Comparison of the detected peptides with the gene models validated 2134 genes. 62% of these genes (1324) were not supported by prior EST evidence. Of the 2134 validated genes, all but 188 were version 2 annotations. Statistical analysis of the validated gene models revealed a preponderance of cytoplasmic and nuclear localised proteins, and proteins with intracellular-associated GO terms. These statistical associations are consistent with the source of the peptides used in the study. Comparison with a 6-frame translation of the S. nodorum genome assembly confirmed 905 existing gene annotations (including 119 not previously confirmed) and provided evidence supporting 144 genes with coding exon frameshift modifications, 604 genes with extensions of coding exons into annotated introns or untranslated regions (UTRs), 3 new gene annotations which were supported by tblastn to NR, and 44 potential new genes residing within un-assembled regions of the genome. Conclusion We conclude that 2D LC MALDI-MS/MS is a powerful, rapid and economical tool to aid in the annotation of fungal genomic assemblies.
|
33
|
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data that is generated these days relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to the scoring of PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state-of-the-art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as proteogenomic search of a six-frame translation of the human genome (in which we achieve a reduction of the running time by a factor of 15 and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+ which is available for download or can be run online at http://bix.ucsd.edu.
Affiliation(s)
- Ari M Frank
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404 La Jolla, California 92093-0404, USA.
|
34
|
Overcoming function annotation errors in the Gram-positive pathogen Streptococcus suis by a proteomics-driven approach. BMC Genomics 2008; 9:588. [PMID: 19061494 PMCID: PMC2613929 DOI: 10.1186/1471-2164-9-588] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 07/31/2008] [Accepted: 12/05/2008] [Indexed: 12/02/2022] Open
Abstract
Background Annotation of protein-coding genes is a key step in sequencing projects. Protein functions are mainly assigned on the basis of the amino acid sequence alone by searching of homologous proteins. However, fully automated annotation processes often lead to wrong prediction of protein functions, and therefore time-intensive manual curation is often essential. Here we describe a fast and reliable way to correct function annotation in sequencing projects, focusing on surface proteomes. We use a proteomics approach, previously proven to be very powerful for identifying new vaccine candidates against Gram-positive pathogens. It consists of shaving the surface of intact cells with two proteases, the specific cleavage-site trypsin and the unspecific proteinase K, followed by LC/MS/MS analysis of the resulting peptides. The identified proteins are contrasted by computational analysis and their sequences are inspected to correct possible errors in function prediction. Results When applied to the zoonotic pathogen Streptococcus suis, of which two strains have been recently sequenced and annotated, we identified a set of surface proteins without cytoplasmic contamination: all the proteins identified had exporting or retention signals towards the outside and/or the cell surface, and viability of protease-treated cells was not affected. The combination of both experimental evidences and computational methods allowed us to determine that two of these proteins are putative extracellular new adhesins that had been previously attributed a wrong cytoplasmic function. One of them is a putative component of the pilus of this bacterium. Conclusion We illustrate the complementary nature of laboratory-based and computational methods to examine in concert the localization of a set of proteins in the cell, and demonstrate the utility of this proteomics-based strategy to experimentally correct function annotation errors in sequencing projects. 
This approach also contributes to provide strong experimental evidences that can be used to annotate those proteins for which a Gene Ontology (GO) term has not been assigned so far. Function annotation correction would then improve the identification of surface-associated proteins in bacterial pathogens, thus accelerating the discovery of new vaccines in infectious disease research.
|
35
|
Kim S, Gupta N, Bandeira N, Pevzner PA. Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra. Mol Cell Proteomics 2008; 8:53-69. [PMID: 18703573 DOI: 10.1074/mcp.m800103-mcp200] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.3] [Indexed: 11/06/2022] Open
Abstract
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach when all plausible de novo interpretations of a spectrum (spectral dictionary) are generated and then quickly matched against the database. We present a new MS-Dictionary algorithm for efficiently generating spectral dictionaries and demonstrate that MS-Dictionary can identify spectra that are missed in the database search. We argue that MS-Dictionary enables proteogenomics searches in six-frame translation of genomic sequences that may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.
Affiliation(s)
- Sangtae Kim
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
|
36
|
Xia D, Sanderson SJ, Jones AR, Prieto JH, Yates JR, Bromley E, Tomley FM, Lal K, Sinden RE, Brunk BP, Roos DS, Wastling JM. The proteome of Toxoplasma gondii: integration with the genome provides novel insights into gene expression and annotation. Genome Biol 2008; 9:R116. [PMID: 18644147 PMCID: PMC2530874 DOI: 10.1186/gb-2008-9-7-r116] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.6] [Received: 04/08/2008] [Revised: 06/17/2008] [Accepted: 07/21/2008] [Indexed: 11/10/2022] Open
Abstract
A proteomics analysis identifies one third of the predicted Toxoplasma gondii proteins and integrates proteomics and genomics data to refine genome annotation. Background Although the genomes of many of the most important human and animal pathogens have now been sequenced, our understanding of the actual proteins expressed by these genomes and how well they predict protein sequence and expression is still deficient. We have used three complementary approaches (two-dimensional electrophoresis, gel-liquid chromatography linked tandem mass spectrometry and MudPIT) to analyze the proteome of Toxoplasma gondii, a parasite of medical and veterinary significance, and have developed a public repository for these data within ToxoDB, making for the first time proteomics data an integral part of this key genome resource. Results The draft genome for Toxoplasma predicts around 8,000 genes with varying degrees of confidence. Our data demonstrate how proteomics can inform these predictions and help discover new genes. We have identified nearly one-third (2,252) of all the predicted proteins, with 2,477 intron-spanning peptides providing supporting evidence for correct splice site annotation. Functional predictions for each protein and key pathways were determined from the proteome. Importantly, we show evidence for many proteins that match alternative gene models, or previously unpredicted genes. For example, approximately 15% of peptides matched more convincingly to alternative gene models. We also compared our data with existing transcriptional data in which we highlight apparent discrepancies between gene transcription and protein expression. Conclusion Our data demonstrate the importance of protein data in expression profiling experiments and highlight the necessity of integrating proteomic with genomic data so that iterative refinements of both annotation and expression models are possible.
Affiliation(s)
- Dong Xia
- Department of Pre-clinical Veterinary Science, Faculty of Veterinary Science, University of Liverpool, Liverpool L69 7ZJ, UK.
|
37
|
Gupta N, Benhamida J, Bhargava V, Goodman D, Kain E, Kerman I, Nguyen N, Ollikainen N, Rodriguez J, Wang J, Lipton MS, Romine M, Bafna V, Smith RD, Pevzner PA. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res 2008; 18:1133-42. [PMID: 18426904 DOI: 10.1101/gr.074344.107] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.9] [Indexed: 11/25/2022]
Abstract
Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional "single proteome" approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of "one-hit-wonders" in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations.
Affiliation(s)
- Nitin Gupta
- Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
|
38
|
Dandass YS, Burgess SC, Lawrence M, Bridges SM. Accelerating string set matching in FPGA hardware for bioinformatics research. BMC Bioinformatics 2008; 9:197. [PMID: 18412963 PMCID: PMC2374783 DOI: 10.1186/1471-2105-9-197] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 01/04/2008] [Accepted: 04/15/2008] [Indexed: 11/16/2022] Open
Abstract
Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
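The string set matching step described in this abstract can be illustrated with a plain software version of the Aho-Corasick algorithm. The following is a minimal sketch only: it is the textbook single-automaton form, not the bit-split FPGA design of the paper, and the function names `build_automaton` and `find_peptides` as well as the example sequences are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto, failure and output tables for a peptide set."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # peptides recognised on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states inherit links and outputs from shallower ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_peptides(text, patterns):
    """Return (start_index, peptide) for every occurrence in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


# Hypothetical translated frame ('*' marks a stop codon) and peptide set.
frame = "MKTAYIAKQR*LSPEPTIDEK"
peptides = ["AKQR", "PEPTIDE", "TIDEK", "KQ"]
hits = find_peptides(frame, peptides)
```

All peptides are matched in a single scan of the translated sequence, which is the property that makes the algorithm attractive for searching a genome translated in six reading frames.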
Affiliation(s)
- Yoginder S Dandass
- Institute of Digital Biology, Mississippi State University, Mississippi 39762, USA.
|
39
|
Ferro M, Tardif M, Reguer E, Cahuzac R, Bruley C, Vermat T, Nugues E, Vigouroux M, Vandenbrouck Y, Garin J, Viari A. PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences. J Proteome Res 2008; 7:1873-83. [PMID: 18348511 DOI: 10.1021/pr070415k] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
PepLine is a fully automated software pipeline which maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped on the six-frame translations of genomic sequences (second module) giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component to allow the whole pipeline to proceed in a fully automated manner using raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana envelope chloroplast samples. Our results demonstrate that PepLine competed with protein database searching software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for the detection of the intron/exon structure of genes.
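The third module, clustering hits to detect potential coding regions, can be pictured with a simple gap-based grouping. This is a sketch under stated assumptions: the abstract does not specify PepLine's clustering criteria, so the single-linkage rule, the `max_gap` threshold, and the function name `cluster_hits` are all hypothetical.

```python
def cluster_hits(hits, max_gap=1000):
    """Group (start, end) genomic hits separated by at most max_gap bases.

    Returns (cluster_start, cluster_end, hit_count) tuples; max_gap is an
    illustrative threshold, not a value taken from the PepLine paper.
    """
    clusters = []
    for start, end in sorted(hits):
        if clusters and start - clusters[-1][1] <= max_gap:
            # Close enough to the current cluster: extend it.
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2] += 1
        else:
            # Too far from the previous hit: open a new cluster.
            clusters.append([start, end, 1])
    return [tuple(c) for c in clusters]


# Hypothetical hit coordinates on one strand of one contig.
example_hits = [(100, 130), (150, 180), (5000, 5030), (900, 930)]
regions = cluster_hits(example_hits, max_gap=1000)
```

Regions supported by several hits are stronger candidates for coding sequence than isolated single hits, which is why the hit count is kept alongside the coordinates.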
Affiliation(s)
- Myriam Ferro
- CEA, DSV, iRTSV, Laboratoire d'Etude de la Dynamique des Protéomes, Grenoble, F-38054, France
|
40
|
Lucitt MB, Price TS, Pizarro A, Wu W, Yocum AK, Seiler C, Pack MA, Blair IA, Fitzgerald GA, Grosser T. Analysis of the zebrafish proteome during embryonic development. Mol Cell Proteomics 2008; 7:981-94. [PMID: 18212345 DOI: 10.1074/mcp.m700382-mcp200] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.3] [Indexed: 01/05/2023] Open
Abstract
The model organism zebrafish (Danio rerio) is particularly amenable to studies deciphering regulatory genetic networks in vertebrate development, biology, and pharmacology. Unraveling the functional dynamics of such networks requires precise quantitation of protein expression during organismal growth, which is incrementally challenging with progressive complexity of the systems. In an approach toward such quantitative studies of dynamic network behavior, we applied mass spectrometric methodology and rigorous statistical analysis to create comprehensive, high quality profiles of proteins expressed at two stages of zebrafish development. Proteins of embryos 72 and 120 h postfertilization (hpf) were isolated and analyzed both by two-dimensional (2D) LC followed by ESI-MS/MS and by 2D PAGE followed by MALDI-TOF/TOF protein identification. We detected 1384 proteins from 327,906 peptide sequence identifications at 72 and 120 hpf with false identification rates of less than 1% using 2D LC-ESI-MS/MS. These included only approximately 30% of proteins that were identified by 2D PAGE-MALDI-TOF/TOF. Roughly 10% of all detected proteins were derived from hypothetical or predicted gene models or were entirely unannotated. Comparison of proteins expression by 2D DIGE revealed that proteins involved in energy production and transcription/translation were relatively more abundant at 72 hpf consistent with faster synthesis of cellular proteins during organismal growth at this time compared with 120 hpf. The data are accessible in a database that links protein identifications to existing resources including the Zebrafish Information Network database. This new resource should facilitate the selection of candidate proteins for targeted quantitation and refine systematic genetic network analysis in vertebrate development and biology.
Affiliation(s)
- Margaret B Lucitt
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|
41
|
Sperança MA, Capurro ML. Perspectives in the control of infectious diseases by transgenic mosquitoes in the post-genomic era--a review. Mem Inst Oswaldo Cruz 2007; 102:425-33. [PMID: 17612761 DOI: 10.1590/s0074-02762007005000054] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 11/16/2006] [Accepted: 04/10/2007] [Indexed: 12/14/2022] Open
Abstract
Arthropod-borne diseases caused by a variety of microorganisms such as dengue virus and malaria parasites afflict billions of people worldwide imposing major economic and social burdens. Despite many efforts, vaccines against diseases transmitted by mosquitoes, with the exception of yellow fever, are not available. Control of such infectious pathogens is mainly performed by vector management and treatment of affected individuals with drugs. However, the numbers of insecticide-resistant insects and drug-resistant parasites are increasing. Therefore, inspired in recent years by a lot of new data produced by genomics and post-genomics research, several scientific groups have been working on different strategies to control infectious arthropod-borne diseases. This review focuses on recent advances and perspectives towards construction of transgenic mosquitoes refractory to malaria parasites and dengue virus transmission.
|
42
|
Choumet V, Carmi-Leroy A, Laurent C, Lenormand P, Rousselle JC, Namane A, Roth C, Brey PT. The salivary glands and saliva of Anopheles gambiae as an essential step in the Plasmodium life cycle: a global proteomic study. Proteomics 2007; 7:3384-94. [PMID: 17849406 DOI: 10.1002/pmic.200700334] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Indexed: 12/28/2022]
Abstract
Proteins synthesized in the salivary glands of the Anopheles gambiae mosquito are thought to be important in the life cycle of the malaria parasite Plasmodium. To describe A. gambiae salivary gland and saliva contents, we combined several techniques: 1-DE, 2-DE and LC MS/MS. This study has identified five saliva proteins and 122 more proteins from the salivary glands, including the first proteomic description for 89 of these salivary gland proteins. Since the invasion and sporozoite maturation take place during the process of salivary glands ageing, the effect of salivary gland age on salivary component composition was examined. LC MS/MS profiling of young versus old salivary gland proteomes suggests that there is an over-representation of proteins involved in signaling and proteins related to the immune response in the proteins from older mosquitoes. The iTRAQ labeling was used for a comparative proteomic analysis of salivary gland samples from infected or Plasmodium berghei-free mosquitoes. The expression levels of five secreted proteins were altered when the parasite was present. These observations will serve as a basis for future work concerning the possible role of these proteins in the interaction between A. gambiae, Plasmodium and the mammalian host.
Affiliation(s)
- Valérie Choumet
- Unité de Biochimie et de Biologie Moléculaire des Insectes, Institut Pasteur, Paris cedex 15, France.
|
43
|
Gupta N, Tanner S, Jaitly N, Adkins JN, Lipton M, Edwards R, Romine M, Osterman A, Bafna V, Smith RD, Pevzner PA. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res 2007; 17:1362-77. [PMID: 17690205 PMCID: PMC1950905 DOI: 10.1101/gr.6427907] [Citation(s) in RCA: 159] [Impact Index Per Article: 9.4] [Received: 02/22/2007] [Accepted: 06/12/2007] [Indexed: 11/24/2022]
Abstract
While bacterial genome annotations have significantly improved in recent years, techniques for bacterial proteome annotation (including post-translational chemical modifications, signal peptides, proteolytic events, etc.) are still in their infancy. At the same time, the number of sequenced bacterial genomes is rising sharply, far outpacing our ability to validate the predicted genes, let alone annotate bacterial proteomes. In this study, we use tandem mass spectrometry (MS/MS) to annotate the proteome of Shewanella oneidensis MR-1, an important microbe for bioremediation. In particular, we provide the first comprehensive map of post-translational modifications in a bacterial genome, including a large number of chemical modifications, signal peptide cleavages, and cleavages of N-terminal methionine residues. We also detect multiple genes that were missed or assigned incorrect start positions by gene prediction programs, and suggest corrections to improve the gene annotation. This study demonstrates that complementing every genome sequencing project by an MS/MS project would significantly improve both genome and proteome annotations for a reasonable cost.
Affiliation(s)
- Nitin Gupta
- Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
|
44
|
Cázares-Raga FE, González-Lázaro M, Montero-Solís C, González-Cerón L, Zamudio F, Martínez-Barnetche J, Torres-Monzón JA, Ovilla-Muñoz M, Aguilar-Fuentes J, Rodríguez MH, de la Cruz Hernández-Hernández F. GP35 ANOAL, an abundant acidic glycoprotein of female Anopheles albimanus saliva. Insect Mol Biol 2007; 16:187-98. [PMID: 17298558 DOI: 10.1111/j.1365-2583.2006.00712.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/13/2023]
Abstract
Salivary glands of female mosquitoes produce proteins, not completely described yet, that participate in carbohydrate and blood feeding. Here, we report an acidic glycoprotein of 35 kDa (GP35 ANOAL) secreted in the saliva of the malaria vector mosquito Anopheles albimanus. GP35 ANOAL is produced exclusively in the distal lateral lobes of adult female salivary glands, it has a pI of 4.45 and is negatively stained by regular silver stain. An 888 bp cDNA clone encoding a predicted product of 240 amino acids has a signal peptide, potential post-translational modification sites, and a disintegrin signature RGD. The GP35 ANOAL sequence depicts high similarities with the 30 kDa saliva allergen of Aedes aegypti, 30 kDa allergen-like hypothetical proteins, and GE-rich proteins present in several Anopheles species, as well as in Ae. albopictus and Culex pipiens quinquefasciatus. The function of this protein family is still unknown.
Affiliation(s)
- F E Cázares-Raga
- Centro de Investigación Sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, Mexico
|
45
|
Savidor A, Donahoo RS, Hurtado-Gonzales O, Verberkmoes NC, Shah MB, Lamour KH, McDonald WH. Expressed peptide tags: an additional layer of data for genome annotation. J Proteome Res 2007; 5:3048-58. [PMID: 17081056 DOI: 10.1021/pr060134x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here, we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well-annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller subdatabases to be searched against, and definition of EPT criteria that accommodates the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While approximately 76% of Phytophthora EPTs supported the current annotation, a portion of them (7.7% and 12.9% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
Affiliation(s)
- Alon Savidor
- Graduate School of Genome Science and Technology, University of Tennessee-Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, USA
|
46
|
Lawson D, Arensburger P, Atkinson P, Besansky NJ, Bruggner RV, Butler R, Campbell KS, Christophides GK, Christley S, Dialynas E, Emmert D, Hammond M, Hill CA, Kennedy RC, Lobo NF, MacCallum MR, Madey G, Megy K, Redmond S, Russo S, Severson DW, Stinson EO, Topalis P, Zdobnov EM, Birney E, Gelbart WM, Kafatos FC, Louis C, Collins FH. VectorBase: a home for invertebrate vectors of human pathogens. Nucleic Acids Res 2006; 35:D503-5. [PMID: 17145709 PMCID: PMC1751530 DOI: 10.1093/nar/gkl960] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Indexed: 12/05/2022] Open
Abstract
VectorBase is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community. Currently, VectorBase contains genome information for two organisms: Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing Yellow fever and Dengue fever.
Affiliation(s)
- Daniel Lawson
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
|
47
|
McCarthy FM, Bridges SM, Wang N, Magee GB, Williams WP, Luthe DS, Burgess SC. AgBase: a unified resource for functional analysis in agriculture. Nucleic Acids Res 2006; 35:D599-603. [PMID: 17135208 PMCID: PMC1751552 DOI: 10.1093/nar/gkl936] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022] Open
Abstract
Analysis of functional genomics (transcriptomics and proteomics) datasets is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation. To facilitate systems biology in these species we have established the curated, web-accessible, public resource 'AgBase' (www.agbase.msstate.edu). We have improved the structural annotation of agriculturally important genomes by experimentally confirming the in vivo expression of electronically predicted proteins and by proteogenomic mapping. Proteogenomic data are available from the AgBase proteogenomics link. We contribute Gene Ontology (GO) annotations and we provide a two tier system of GO annotations for users. The 'GO Consortium' gene association file contains the most rigorous GO annotations based solely on experimental data. The 'Community' gene association file contains GO annotations based on expert community knowledge (annotations based directly from author statements and submitted annotations from the community) and annotations for predicted proteins. We have developed two tools for proteomics analysis and these are freely available on request. A suite of tools for analyzing functional genomics datasets using the GO is available online at the AgBase site. We encourage and publicly acknowledge GO annotations from researchers and provide an online mechanism for agricultural researchers to submit requests for GO annotations.
Affiliation(s)
- Fiona M. McCarthy
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, PO Box 6100, Mississippi, MS 39762, USA
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- To whom correspondence should be addressed. Tel: +1 662 325 5859; Fax: +1 662 325 1031
| | - Susan M. Bridges
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
- To whom correspondence should be addressed. Tel: +1 662 325 5859; Fax: +1 662 325 1031
| | - Nan Wang
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
| | - G. Bryce Magee
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
| | - W. Paul Williams
- USDA ARS Corn Host Plant Resistance Research Unit, Box 5367, Mississippi, MS 39762, USA
| | - Dawn S. Luthe
- Department of Crop and Soil Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Shane C. Burgess
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, PO Box 6100, Mississippi, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
- Mississippi Agricultural and Forestry Experiment Station, Mississippi State University, MS 39762, USA
|
48
|
Boisson B, Jacques JC, Choumet V, Martin E, Xu J, Vernick K, Bourgouin C. Gene silencing in mosquito salivary glands by RNAi. FEBS Lett 2006; 580:1988-92. [PMID: 16530187 DOI: 10.1016/j.febslet.2006.02.069] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.3] [Received: 02/07/2006] [Revised: 02/27/2006] [Accepted: 02/27/2006] [Indexed: 11/21/2022]
Abstract
Salivary glands are the ultimate site of development, within the insect, of mosquito-borne pathogens such as Plasmodium. Mosquito salivary glands also secrete components involved in anti-haemostatic activities and allergic reactions. We investigated the feasibility of RNAi as a tool for functional analysis of genes expressed in Anopheles gambiae salivary glands. We show that specific gene silencing in salivary glands requires the use of large amounts of dsRNA, a condition that differs from those for efficient RNAi in other mosquito tissues. Using this protocol, we demonstrated the role of AgApy, which encodes an apyrase, in the probing behaviour of An. gambiae.
Affiliation(s)
- Bertrand Boisson
- Unité de Biologie et Génétique du Paludisme, Institut Pasteur, 28 Rue du Dr Roux 75724 Paris Cedex, France.
|